Posts

Showing posts from June, 2016

Encryption in HDFS

Image
Encryption of data was and is the hottest topic in terms of data protection and prevention against theft. Hadoop HDFS supports full transparent encryption in transit and at rest [1], based on Kerberos implementations [2], often used within multiple trusted Kerberos domains.

Technology Hadoop KMS provides a REST-API, which has built-in SPNEGO and HTTPS support, comes mostly bundled with a pre-configured Apache Tomcat within your preferred Hadoop distribution.  To have encryption transparent for the user and the system, each encrypted zone is associated with a SEZK (single encryption zone key), created when the zone is defined as an encryption zone by interaction between NN and KMS. Each file within that zone will have its own DEK (Data Encryption Key). This behavior is fully transparent, since the NN directly asks the KMS for a new EDEK (encrypted data encryption key) encrypted with the zones key and adds them to the file’s metadata when a new file is created.
When a client wants to re…

Open Source based Hyper-Converged Infrastructures and Hadoop

Image
According to a report from Simplivity [1] Hyper-Converged Infrastructures are used by more than 50% of the interviewed businesses, tendentious increasing. But what does this mean for BigData solutions, and Hadoop especially? What tools and technologies can be used, what are the limitations and the gains from such a solution?

To build a production ready and reliable private cloud to support Hadoop clusters as well as on-demand and static I have made great experience with OpenStack, Saltstack and the Sahara plugin for Openstack.
Openstack supports Hadoop-on-demand per Sahara, it's also convenient to use VM's and install a Hadoop Distribution within, especially for static clusters with special setups. The Openstack project provides ready to go images per [2], as example for Vanilla 2.7.1 based Hadoop installations. As an additional benefit, Openstack supports Docker [3], which adds an additional layer of flexibility for additional services, like Kafka [4] or SolR [5].

Costs and I…