All in one HDFS Cluster for your pocket

Update 1 (Nov 21, 2011):
- added 3rd interface as host-only-adapter (hadoop1)
- enabled trusted device eth2

About a year ago, I created a small Xen environment for my engineering purposes. When I was traveling for hours, it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That was the reason I switched to VirtualBox [1], which runs on OSX, Linux and Windows. I could play with my servers, and when I had configured them to death, I simply reimported them into a clean setup. I think the appliance will also be a good start for people who are new to the Hadoop ecosystem and want to see its power without the pain of configuring a multi-node environment.
The appliance is built with VirtualBox because it runs easily on OSX and Windows. The idea is to test new settings quickly in a small environment; the appliance is designed for research, not for development and certainly not for production. It has 4 nodes: one master and 3 slaves. The setup is not perfect, but it matches the environment I created it for. There is no separate secondary namenode, for example. I set up HDFS, Hive with a MySQL metastore, and HBase in distributed mode with ZooKeeper and Stargate.
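
After importing the appliance, a quick way to see whether the services listed above are up is to probe their TCP ports from the host. The following is only a minimal sketch: the host name `hadoop-master` is a placeholder, and the ports are the common defaults for these services at the time (ZooKeeper 2181, MySQL 3306, NameNode web UI 50070, HBase REST/Stargate 8080), not values confirmed for this appliance. Adjust both to your setup.

```python
import socket

# ASSUMPTION: placeholder host name and common default ports,
# not taken from the appliance itself. Adjust before use.
SERVICES = {
    "zookeeper": ("hadoop-master", 2181),
    "mysql-metastore": ("hadoop-master", 3306),
    "namenode-web-ui": ("hadoop-master", 50070),
    "stargate": ("hadoop-master", 8080),
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_services(services, timeout=2.0):
    """Map each service name to True/False depending on whether its port answers."""
    return {
        name: port_open(host, port, timeout)
        for name, (host, port) in services.items()
    }
```

Calling `check_services(SERVICES)` returns a dictionary that maps each service name to `True` or `False`. A `False` only means that nothing accepted a TCP connection on that port; it does not say why.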

Before we can play with our own lab, there are a few prerequisites to take care of. Please read the page [2] I created for that.

[1] https://www.virtualbox.org/wiki/Downloads
[2] http://mapredit.blogspot.com/p/all-in-one-hadoop-multi-node-appliance.html

Comments

  1. Pretty interesting. This is a good way to create a Hadoop test environment, and our team is actually going to use it. I currently use VMware Player to do something similar on one box, to get a full cluster up for testing purposes. I am the lead developer of oceansync.com, a Hadoop management software tool, so it's important to have a test environment that is portable so I can test things quickly.

  2. I tested with VMware Player, but I missed some features that VirtualBox provides. The first is transparency across platforms: I can use the appliance on OSX, Windows (7) and Linux alike.

    For consulting it is really cool: you can demonstrate changes live in seconds.

  3. Thanks a lot for this contribution! Based on this I could prepare my test environment in just a few minutes. I also tested with VMware Player and finally switched to VirtualBox too, which now runs on Windows 7 and openSUSE 12.1.

    What do you think about a GitHub repository to collect useful admin and/or developer scripts, which can then be deployed to a clean, preinstalled DEMO, TEST or TRAINING cluster based on your work?

  4. @Mirko: Sounds like a good idea, especially for ant builds I think.

  5. I created the repository here ...

    https://github.com/kamir/hadoop-admin-and-developer-scripts

  6. Hello Sir,
    I am a student and need your help with the error below.
    Hadoop error while running in a multi-node cluster:

    root@ubuntu:/opt/hadoop-1.0.0# bin/hadoop jar hadoop-examples-1.0.0.jar pi 10 1$

    Number of Maps = 10
    Samples per Map = 10
    12/02/03 09:01:47 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:48 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:49 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:50 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:51 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:52 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:53 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:54 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$
    12/02/03 09:01:55 INFO ipc.Client: Retrying connect to server: ubuntu/192.168.1$

    Please help if anyone has a solution for this error.
    Configuration:
    hadoop 1.0
    Ubuntu 11.10
    jdk 1.7
