Thursday, November 15, 2012

HBase major compaction via cron job

Sometimes I get asked how an admin can run a major compaction on a particular table at a time when the cluster is usually idle.

This can be done with cron or at. The HBase shell needs a Ruby script, which is very simple:

# cat m_compact.rb
major_compact 't1'
exit
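
If the table name should not be hard-coded, the script file could be generated from a shell function instead. The following is only a sketch: write_compact_script is a hypothetical helper name, not part of HBase, and the output path is an assumption.

```shell
#!/bin/bash
# write_compact_script TABLE FILE
# Hypothetical helper: writes a two-line HBase shell script to FILE
# that major-compacts TABLE and then leaves the shell.
write_compact_script() {
  table=$1
  file=$2
  printf "major_compact '%s'\nexit\n" "$table" > "$file"
}

# Example: recreate the m_compact.rb shown above in the current directory.
write_compact_script t1 ./m_compact.rb
```

The generated file has the same content as the handwritten m_compact.rb, so it can be passed to hbase shell in the same way.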

A working shell script for cron, as an example:

# cat daily_compact
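#!/bin/bash
USER=hbase
# home directory of $USER, where m_compact.rb is stored
# (a plain ~$USER is not expanded by bash, hence the eval)
SCRIPT_DIR=`eval echo ~$USER`
# not used below; the table name is set in m_compact.rb
TABLE=t1
# kerberos enabled
KEYTAB=/etc/hbase/conf/hbase.keytab
HOST=`hostname`
REALM=ALO.ALT
LOG=/var/log/daily_compact

# get a new ticket
sudo -u $USER kinit -k -t $KEYTAB $USER/$HOST@$REALM
# start compaction
sudo -u $USER hbase shell $SCRIPT_DIR/m_compact.rb 2>&1 | tee -a $LOG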

All messages are appended to /var/log/daily_compact:
11/15/13 06:49:26 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12 row(s) in 0.7800 seconds
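
To actually schedule the script, a crontab entry along the following lines should do. This is a sketch under assumptions: the install path /usr/local/bin/daily_compact and the 03:00 start time are placeholders, and the entry is meant for a user that is allowed to run sudo without a password prompt (for example root).

```shell
#!/bin/bash
# Assumed install location of the daily_compact script; adjust as needed.
SCRIPT=/usr/local/bin/daily_compact
# Crontab fields: minute hour day-of-month month day-of-week command.
# This line runs the script every day at 03:00.
CRON_LINE="0 3 * * * $SCRIPT"
echo "$CRON_LINE"
# To append the line to the current user's crontab, something like:
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

For a one-off run, the same script can be handed to at instead, for example with echo /usr/local/bin/daily_compact | at 03:00.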