Wednesday, November 30, 2011

NFS exported HDFS (CDH3)


For some use cases it can be a good idea to make an HDFS filesystem available across the network as an exported share. Here I describe a working scenario with Linux and Hadoop, using only tools both have on board.
I used FUSE and libhdfs to mount an HDFS filesystem. Change namenode.local and <PORT> to fit your environment.

Install:
 yum install hadoop-0.20-fuse.x86_64 hadoop-0.20-libhdfs.x86_64

Create a mountpoint:
 mkdir /hdfs-mount

Mount your HDFS (testing):
 hadoop-fuse-dfs dfs://namenode.local:<PORT> /hdfs-mount -d

You should see output like this:
 INFO fuse_options.c:162 Adding FUSE arg /hdfs-mount
 INFO fuse_options.c:110 Ignoring option -d
 unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
 INIT: 7.10
 flags=0x0000000b
 max_readahead=0x00020000
 INFO fuse_init.c:101 Mounting namenode.local:<PORT>
 INIT: 7.8
 flags=0x00000001
 max_readahead=0x00020000
 max_write=0x00020000
 unique: 1, error: 0 (Success), outsize: 40

Hit Ctrl-C after you see "Success".
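If the mountpoint is still busy after the debug run exits, unmount it with fusermount (standard FUSE tooling):
 fusermount -u /hdfs-mount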

Make the mount available at boot time:
 echo "hadoop-fuse-dfs#dfs://namenode.local:<PORT> /hdfs-mount fuse usetrash,rw 0 0" >> /etc/fstab

Test:
#> mount -a
#> mount
 [..]
 sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 fuse on /hdfs-mount type fuse (rw,nosuid,nodev,allow_other,default_permissions)
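
As a quick sanity check, list the mounted tree (the paths below are examples; your HDFS layout may differ):
 ls -l /hdfs-mount
 ls -l /hdfs-mount/user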

To tune the memory for each JVM process, take a look at /etc/default/hadoop-0.20-fuse and adjust the settings there.
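For example, the heap size can be capped there via the JVM options passed to libhdfs (the variable name and value below are an assumption; check the file shipped with your CDH3 release):
 # /etc/default/hadoop-0.20-fuse -- limit the libhdfs JVM heap (example value)
 export LIBHDFS_OPTS="-Xmx128m"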

Export via NFS (insecure):
First we have to decide which user to use; I assume the user hdfs. Check it with "id hdfs":
 uid=104(hdfs) gid=105(hdfs) groups=105(hdfs),104(hadoop) context=root:staff_r:staff_t:SystemLow-SystemHigh

Create an exports-file:
 cat /etc/exports
 /hdfs-mount/user    (fsid=111,rw,wdelay,anonuid=104,anongid=105,sync,insecure,no_subtree_check,no_root_squash)

Expl.: read-write, fsid=an unused ID (man 5 exports), write-delay, anonymous access mapped to the hdfs user (uid 104, gid 105), sync.
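
Note that the line above exports to all hosts. To restrict the export to a single subnet, prefix a host pattern as described in man 5 exports (the subnet below is an example):
 /hdfs-mount/user    192.168.0.0/24(fsid=111,rw,wdelay,anonuid=104,anongid=105,sync,insecure,no_subtree_check,no_root_squash)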

Exporting only the user directory from HDFS protects you from unwanted changes in system-relevant directories (mapred, for example).
Restart your NFS server (service nfs restart).
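Alternatively, re-read the exports file without a full restart and verify the result (standard nfs-utils commands):
 exportfs -ra
 exportfs -v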

Now you can use your HDFS as a "local" filesystem, which makes some tasks easier. Note that the accessing users are mapped to local users, so using root is a bad idea.
Mount the exported NFS share on your machine and simply create or copy your job definitions or files.
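A client-side mount could look like this (hostname and client mountpoint are examples):
 mkdir /mnt/hdfs-nfs
 mount -t nfs -o vers=3 nfsserver.local:/hdfs-mount/user /mnt/hdfs-nfs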

PS: this works only from kernel 2.6.27 upwards.