Using filters in HBase to match certain columns

HBase is a column-oriented (more precisely, column-family-oriented) database that stores data grouped by column family rather than by row. To limit the output of a scan you can use filters. So far, so good.

But how does this work when you want to filter on more than one column, say two or more specific columns that must all be present?
The trick is to use one SingleColumnValueFilter (SCVF) per column and combine them in a FilterList, which applies a boolean AND across all of them. Each SCVF matches rows where the column exists and its value is NOT equal to DOESNOTEXIST, a sentinel value that never actually occurs in the data. The filter looks like this:


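import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> list = new ArrayList<Filter>(2);
// Keep rows where fam1:VALUE1 exists (its value is never "DOESNOTEXIST")
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(Bytes.toBytes("fam1"),
 Bytes.toBytes("VALUE1"), CompareOp.NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));
filter1.setFilterIfMissing(true); // drop rows that lack the column entirely
list.add(filter1);
// Same check for fam2:VALUE2
SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("fam2"),
 Bytes.toBytes("VALUE2"), CompareOp.NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));
filter2.setFilterIfMissing(true);
list.add(filter2);
// MUST_PASS_ALL (the default) is a logical AND across all filters in the list
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, list);
Scan scan = new Scan();
scan.setFilter(filterList);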



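First define a filter list. Then add a SingleColumnValueFilter for column family fam1 and qualifier VALUE1 that compares the cell value with NOT_EQUAL against DOESNOTEXIST. Because no cell actually contains DOESNOTEXIST, the comparison succeeds for every row that has fam1:VALUE1, and setFilterIfMissing(true) drops every row that lacks the column. The FilterList uses MUST_PASS_ALL (its default operator), so a row is returned only if it contains both fam1:VALUE1 and fam2:VALUE2. You can add more filters to the list in the same way; the scan then returns only the rows that match all of your conditions. Note that CompareOp belongs to the HBase 1.x API; HBase 2.x deprecates it in favor of CompareOperator.NOT_EQUAL.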
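The boolean logic can be checked without a cluster. The following plain-Java sketch is not HBase code: it models a table as a hypothetical TreeMap from row key to a map of "family:qualifier" to value (TreeMap mimics HBase returning rows in row-key order), and the helper names columnPresent and scan are made up for illustration. Each columnPresent predicate mimics one SingleColumnValueFilter with NOT_EQUAL against the sentinel plus setFilterIfMissing(true); scan mimics a FilterList with MUST_PASS_ALL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory model of the filter logic; NOT the HBase client API.
public class ColumnMatchSketch {
    // Sentinel value assumed never to be stored in any cell.
    static final String SENTINEL = "DOESNOTEXIST";

    // Models SingleColumnValueFilter(family, qualifier, NOT_EQUAL, SENTINEL)
    // with setFilterIfMissing(true): a row passes only if the column exists
    // and its value differs from the sentinel.
    static Predicate<Map<String, String>> columnPresent(String family, String qualifier) {
        String column = family + ":" + qualifier;
        return row -> row.containsKey(column) && !SENTINEL.equals(row.get(column));
    }

    // Models FilterList with MUST_PASS_ALL: every filter must accept the row.
    static List<String> scan(Map<String, Map<String, String>> table,
                             List<Predicate<Map<String, String>>> filters) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            boolean allPass = filters.stream().allMatch(f -> f.test(e.getValue()));
            if (allPass) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("fam1:VALUE1", "a", "fam2:VALUE2", "b"));
        table.put("row2", Map.of("fam1:VALUE1", "a"));
        table.put("row3", Map.of("fam2:VALUE2", "b"));
        table.put("row4", Map.of("fam1:OTHER", "x"));

        List<Predicate<Map<String, String>>> filters = List.of(
            columnPresent("fam1", "VALUE1"),
            columnPresent("fam2", "VALUE2"));

        System.out.println(scan(table, filters)); // prints [row1]
    }
}
```

Only row1 carries both columns, so it is the only row returned; row2 and row3 each fail one of the two filters, and row4 fails both. The sketch uses Map.of and List.of, so it assumes Java 9 or later.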

