Tuesday, July 8, 2014

XAttr are coming to HDFS

HDFS-2006 [1] describes the use of extended attributes (xattrs). Xattrs, known from *NIX operating systems, associate physically stored data with descriptive metadata beyond the attributes strictly defined by the filesystem. They are mostly used to provide additional information such as a hash, checksum or encoding, or security-relevant information such as a signature or the author / creator.
According to the source code [2], the use of xattrs can be limited with dfs.namenode.fs-limits.max-xattrs-per-inode and dfs.namenode.fs-limits.max-xattr-size, both documented in hdfs-default.xml (site-specific overrides belong in hdfs-site.xml). The default for dfs.namenode.fs-limits.max-xattrs-per-inode is 32; for dfs.namenode.fs-limits.max-xattr-size the default is 16384 bytes.
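
To make the two limits concrete, here is a minimal Python sketch of the kind of check they imply. It is not the NameNode code: the function name check_xattr_limits is made up for illustration, and it assumes that the size limit covers the attribute name plus its value, both counted in bytes.

```python
# Defaults quoted in the text above.
MAX_XATTRS_PER_INODE = 32   # dfs.namenode.fs-limits.max-xattrs-per-inode
MAX_XATTR_SIZE = 16384      # dfs.namenode.fs-limits.max-xattr-size (bytes)


def check_xattr_limits(existing, name, value):
    """Raise ValueError if storing name=value would exceed either limit.

    existing: dict mapping attribute names to the bytes values already set.
    Assumption: the size limit applies to name plus value, in bytes.
    """
    size = len(name.encode("utf-8")) + len(value)
    if size > MAX_XATTR_SIZE:
        raise ValueError(f"xattr too large: {size} > {MAX_XATTR_SIZE} bytes")
    # Overwriting an attribute that already exists does not add to the count.
    count = len(existing) + (0 if name in existing else 1)
    if count > MAX_XATTRS_PER_INODE:
        raise ValueError(f"too many xattrs: {count} > {MAX_XATTRS_PER_INODE}")


attrs = {}
check_xattr_limits(attrs, "user.alo.enc_default", b"UTF8")  # within limits
attrs["user.alo.enc_default"] = b"UTF8"
```

A file that already carries 32 attributes would reject a 33rd name in this model, while overwriting one of the existing 32 would still pass.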

Within HDFS, an extended attribute is identified by a name that is prefixed with a namespace. There are four namespaces, as in the Linux kernel implementation: security, system, trusted and user. User-defined attributes are stored in the user namespace. Only the superuser can access the trusted namespace; system and security are reserved for internal HDFS use.
The xattr definitions are free-form and can be interpreted by additional tools such as security frameworks or backup systems, via the API or similar. Additionally, attribute names are case-sensitive and are taken exactly as written, whereas the namespace prefix is case-insensitive.
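
The naming rule can be sketched in a few lines of Python. This is an illustration only, not the HDFS implementation: split_xattr_name is a hypothetical helper, and it assumes that the namespace prefix is matched case-insensitively while the rest of the name is kept exactly as given.

```python
NAMESPACES = ("user", "trusted", "system", "security")


def split_xattr_name(full_name):
    """Split 'namespace.name' into (namespace, name).

    The namespace prefix is matched case-insensitively and normalised to
    lower case; everything after the first dot is kept unchanged.
    """
    prefix, sep, name = full_name.partition(".")
    if not sep or not name or prefix.lower() not in NAMESPACES:
        raise ValueError(f"name must start with one of {NAMESPACES} plus '.'")
    return prefix.lower(), name


print(split_xattr_name("USER.alo.enc_default"))  # ('user', 'alo.enc_default')
print(split_xattr_name("user.alo.Enc_Default"))  # ('user', 'alo.Enc_Default')
```

Under this model, USER.alo.enc_default and user.alo.enc_default refer to the same attribute, while user.alo.Enc_Default is a different one.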

An attribute can be set with the dfs command like this (the name has to carry a namespace prefix, here user.):

hadoop dfs -setfattr -n 'user.alo.enc_default' -v UTF8 /user/alo/definition_table.txt

and can be read with:

hadoop dfs -getfattr -d /user/alo/definition_table.txt

# file: /user/alo/definition_table.txt
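
For scripting, the two commands can be assembled programmatically. The following Python sketch only builds the argument lists; the helper names setfattr_cmd and getfattr_cmd are invented for this example, and the attribute name is assumed to carry the user. namespace prefix.

```python
import shlex


def setfattr_cmd(path, name, value, launcher=("hadoop", "dfs")):
    """Build the argv list for setting one extended attribute."""
    return [*launcher, "-setfattr", "-n", name, "-v", value, path]


def getfattr_cmd(path, name=None, launcher=("hadoop", "dfs")):
    """Build the argv list for reading one attribute, or dumping all (-d)."""
    selector = ["-n", name] if name else ["-d"]
    return [*launcher, "-getfattr", *selector, path]


target = "/user/alo/definition_table.txt"
# Print the commands in shell-quoted form instead of running them.
print(shlex.join(setfattr_cmd(target, "user.alo.enc_default", "UTF8")))
print(shlex.join(getfattr_cmd(target)))
```

On a machine with a Hadoop client installed, the returned lists could be passed to subprocess.run to execute the commands.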

HDFS-2006 is already committed [3] and will be available in Hadoop 2.5.x. The feature is enabled by default and has no impact on performance if you don't use it.

[1] https://issues.apache.org/jira/browse/HDFS-2006