Saturday, November 19, 2011

All in one HDFS Cluster for your pocket

Update 1 (Nov 21, 2011):
- added a third interface as a host-only adapter (hadoop1)
- enabled eth2 as a trusted device

About one year ago, I created a small Xen environment for my engineering purposes. When I was traveling for hours, it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That is why I switched to VirtualBox [1], which runs on OS X, Linux and Windows. I could play with my servers, and once I had configured them to death, I reimported them into a clean setup. I think this will also be a good start for people who are new to the Hadoop ecosystem and want to see its power without the pain of configuring a multi-node environment.
The appliance is created with VirtualBox because it runs easily on OS X and Windows. The idea is to check new settings quickly in a small environment; the appliance is designed for research, not for development, and definitely not for production. It has four nodes: one master and three slaves. The setup is not perfect, but it matches the environment I created it for. There is no separate secondary namenode, for example. I set up HDFS, Hive with a MySQL metastore, and HBase in distributed mode with ZooKeeper and Stargate.

Before we can play with our own lab, a few prerequisites have to be in place. Please read the page [2] I created for this.

[1] https://www.virtualbox.org/wiki/Downloads
[2] http://mapredit.blogspot.com/p/all-in-one-hadoop-multi-node-appliance.html