
NFS exported HDFS (CDH3)

For some use cases it can be a good idea to make an HDFS filesystem available across the network as an exported share. Here I describe a working scenario with Linux and Hadoop, using only tools both ship with.
I used FUSE and libhdfs to mount an HDFS filesystem. Change namenode.local and <PORT> to fit your environment.

Install:
yum install hadoop-0.20-fuse.x86_64 hadoop-0.20-libhdfs.x86_64

Create a mountpoint:
mkdir /hdfs-mount

Mount your hdfs (testing):
hadoop-fuse-dfs dfs://namenode.local:<PORT> /hdfs-mount -d

You should see output like this:
INFO fuse_options.c:162 Adding FUSE arg /hdfs-mount
INFO fuse_options.c:110 Ignoring option -d
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.10
flags=0x0000000b
max_readahead=0x00020000
INFO fuse_init.c:101 Mounting namenode.local:<PORT>
INIT: 7.8
flags=0x00000001
max_readahead=0x00020000
max_write=0x00020000
unique: 1, error: 0 (Success), outsize: 40

Hit Ctrl-C after you see "Success".
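
To mount without the debug output, drop -d; a minimal sketch with a quick verification, using the same names as above:
hadoop-fuse-dfs dfs://namenode.local:<PORT> /hdfs-mount
# verify that HDFS shows up as a mounted filesystem
df -h /hdfs-mount
ls /hdfs-mount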

Make the mount available at boot time:
ec…
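
A minimal sketch of a matching /etc/fstab entry, assuming the fstab syntax shipped with the CDH3 FUSE package (the mount options are an assumption):
hadoop-fuse-dfs#dfs://namenode.local:<PORT> /hdfs-mount fuse allow_other,usetrash,rw 2 0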

All in one HDFS Cluster for your pocket

Update 1 (Nov 21, 2011):
- added 3rd interface as host-only adapter (hadoop1), see the sketch below
- enabled trusted device eth2
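
A minimal sketch of how the extra interface can be attached from the host, assuming the VM is named hadoop1 and the default host-only network vboxnet0 exists:
VBoxManage modifyvm hadoop1 --nic3 hostonly --hostonlyadapter3 vboxnet0
Inside the guest, eth2 can then be marked trusted, e.g. with lokkit on a CentOS guest (an assumption):
lokkit --trust=eth2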

About one year ago, I created a small XEN environment for my engineering purposes. When I was traveling for hours it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That was the reason I switched to VirtualBox [1], which runs on OSX, Linux and Windows as well. I could play with my servers, and when I had configured them to death, I simply reimported them into a clean setup. I think this will also be a good start for people who are new to the Hadoop ecosystem and want to see its power without the pain of configuring a multi-node environment.
The appliance is created with VirtualBox because it runs on OSX and Windows very easily. The idea behind it is to check new settings in a small environment rather easily; the appliance is designed for research, not for development, and certainly not for production. The applian…

HDFS debugging scenario

The first step in debugging issues in a running Hadoop environment is to take a look at the stack traces. They are easily accessible via jobtracker/stacks and show all running threads in a jstack view. As an example I discuss a lab testing scenario, see below.

http://jobtracker:50030/stacks
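
The same page can be fetched on the command line; a minimal sketch:
curl http://jobtracker:50030/stacks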

Process Thread Dump: 
43 active threads
Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):
  State: TIMED_WAITING
  Blocked count: 6
  Waited count: 7
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:676)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:719)

In this case the RPC connection is in state TIMED_WAITING with nonzero blocked and waited counts. That means the namenode could not answer the RPC request fast enough. The problem lies in the setup, as I often see in production environments.
For demonstration I use an ESX cluster with a VM for the namenode. The ESX abstraction …
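
Besides the web view, the same thread dump can be taken locally with jstack; a minimal sketch, assuming the JobTracker runs under the mapred user:
# find the JobTracker's PID with jps, then dump its stacks
sudo -u mapred jps | grep JobTracker
sudo -u mapred jstack <PID>
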