
Showing posts from May, 2012

Using filters in HBase to match certain columns

HBase is a column-oriented database that stores its content by column rather than by row. To limit the output of a scan you can use filters; so far, so good.

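But how does this work when you want to filter on more than one matching column, say two or more specific columns?
The trick here is to use one SingleColumnValueFilter (SCVF) per column and combine them with a boolean operation (AND/OR via a FilterList). Each SCVF accepts a row when the column "X" holds any value that is NOT the sentinel DOESNOTEXIST (a value that never occurs in the data), and setFilterIfMissing(true) drops rows where the column is absent altogether, so each filter effectively tests that the column exists. The filter would look like: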


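import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> list = new ArrayList<Filter>(2);
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(Bytes.toBytes("fam1"),
 Bytes.toBytes("VALUE1"), CompareOp.NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));
filter1.setFilterIfMissing(true); // drop rows that lack fam1:VALUE1
list.add(filter1);
SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("fam2"),
 Bytes.toBytes("VALUE2"), CompareOp.NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));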
filter2.setFilterIfMissin…
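The boolean logic behind this trick can be checked without a cluster. The plain-Java sketch below is a simulation, not HBase code: rows are modeled as maps from "family:qualifier" strings to values, the hypothetical hasColumn stands in for one SCVF comparing NOT_EQUAL against DOESNOTEXIST with setFilterIfMissing(true), and the hypothetical mustPassAll/mustPassOne mirror the AND/OR semantics of FilterList's MUST_PASS_ALL and MUST_PASS_ONE operators. All class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Plain-Java simulation of the filter logic; no HBase classes are used.
class ScvfSimulation {
    // Sentinel value assumed never to be stored in the table.
    static final String SENTINEL = "DOESNOTEXIST";

    // Models SCVF(column, NOT_EQUAL, SENTINEL) with setFilterIfMissing(true):
    // the row passes only if the column exists and its value is not the sentinel.
    static Predicate<Map<String, String>> hasColumn(String column) {
        return row -> row.containsKey(column) && !SENTINEL.equals(row.get(column));
    }

    // Models MUST_PASS_ALL (logical AND): every filter must accept the row.
    static Predicate<Map<String, String>> mustPassAll(List<Predicate<Map<String, String>>> filters) {
        return row -> filters.stream().allMatch(f -> f.test(row));
    }

    // Models MUST_PASS_ONE (logical OR): at least one filter must accept the row.
    static Predicate<Map<String, String>> mustPassOne(List<Predicate<Map<String, String>>> filters) {
        return row -> filters.stream().anyMatch(f -> f.test(row));
    }

    // Returns the keys of all rows accepted by the filter, in row-key order.
    static List<String> scan(Map<String, Map<String, String>> table,
                             Predicate<Map<String, String>> filter) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (filter.test(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Small sample table: row key -> ("family:qualifier" -> value).
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("fam1:VALUE1", "a", "fam2:VALUE2", "b"));
        table.put("row2", Map.of("fam1:VALUE1", "a"));
        table.put("row3", Map.of("fam2:VALUE2", "b"));
        table.put("row4", Map.of("fam3:OTHER", "c"));
        return table;
    }

    public static void main(String[] args) {
        List<Predicate<Map<String, String>>> filters =
            List.of(hasColumn("fam1:VALUE1"), hasColumn("fam2:VALUE2"));
        System.out.println(scan(sampleTable(), mustPassAll(filters))); // [row1]
        System.out.println(scan(sampleTable(), mustPassOne(filters))); // [row1, row2, row3]
    }
}
```

With AND only row1, which carries both columns, survives; with OR every row holding at least one of the two columns does. The whole approach relies on the sentinel value never actually being stored in those columns.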

Stop accepting new jobs in a Hadoop cluster (ACL)

To stop accepting new MapReduce jobs in a Hadoop cluster, you first have to enable ACLs. Once that is done, you can set the queue's submit ACL to a single space (' '), which allows no one to submit. Since mapred-queue-acls.xml is polled regularly, you can change the queue ACL dynamically on a running system. This is useful for ops work such as putting the cluster into maintenance or adding and decommissioning nodes.
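To see why a single space locks everyone out, it helps to know how Hadoop reads an ACL value: "*" means everyone, otherwise the value is a comma-separated list of users, one space, then a comma-separated list of groups. The class below is a hypothetical, simplified stand-in for that parsing (Hadoop does this internally; QueueAcl and isAllowed are illustrative names, not Hadoop API). With " " both lists come out empty, so nobody matches.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a Hadoop-style ACL value:
// "*" allows everyone; otherwise "user1,user2 group1,group2" (users, one space, groups).
class QueueAcl {
    private final boolean allowAll;
    private final Set<String> users = new HashSet<>();
    private final Set<String> groups = new HashSet<>();

    QueueAcl(String value) {
        allowAll = value.trim().equals("*");
        if (!allowAll) {
            // Split once on the first space: left part = users, right part = groups.
            String[] parts = value.split(" ", 2);
            addNames(users, parts[0]);
            if (parts.length > 1) {
                addNames(groups, parts[1]);
            }
        }
    }

    // Adds every non-empty name from a comma-separated list.
    private static void addNames(Set<String> target, String csv) {
        for (String name : csv.split(",")) {
            if (!name.trim().isEmpty()) {
                target.add(name.trim());
            }
        }
    }

    // Allowed if the ACL is "*", the user is listed, or one of the user's groups is listed.
    boolean isAllowed(String user, List<String> userGroups) {
        if (allowAll || users.contains(user)) {
            return true;
        }
        for (String g : userGroups) {
            if (groups.contains(g)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        QueueAcl open = new QueueAcl("user1,user2 group1,group2,admins");
        QueueAcl closed = new QueueAcl(" "); // a single space: nobody may submit
        System.out.println(open.isAllowed("user1", List.of()));           // true
        System.out.println(open.isAllowed("bob", List.of("admins")));     // true
        System.out.println(closed.isAllowed("user1", List.of("admins"))); // false
    }
}
```

Note that names after the space are treated as groups only; a group name placed before the space would be taken as a user name.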

Enable ACLs

Edit the config file ($HADOOP/conf/mapred-queue-acls.xml) to fit your needs:

<configuration>
 <property>
   <name>mapred.queue.default.acl-submit-job</name>
   <!-- users, then a single space, then groups -->
   <value>user1,user2 group1,group2,admins</value>
 </property>

 <property>
   <name>mapred.queue.default.acl-administer-jobs</name>
   <value>admins</value>
 </property>

</configuration>

Enable an ACL-driven cluster by setting mapred.acls.enabled to true in conf/mapred-site.xml.
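For reference, the entry in conf/mapred-site.xml would look roughly like this (a sketch for the MRv1 property named above; merge the property into your existing configuration element rather than replacing the file):

```xml
<configuration>
  <!-- turn on ACL checks for job submission and administration -->
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>
</configuration>
```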

Now simply edit the value of mapred.queue.default.…