
Showing posts from February, 2015

Hadoop and trusted MIT Kerberos v5 with Active Directory

Here is an up-to-date example of how to set up an MIT Kerberos v5 <=> Active Directory trust from scratch. It should work out of the box; just replace these names:
HADOOP1.INTERNAL = local server (KDC)
ALO.LOCAL = local Kerberos realm
AD.REMOTE = AD realm
with your own servers and realms. The KDC should be inside your Hadoop network; the remote AD can be anywhere.
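Replacing the three placeholder names by hand across several files is easy to get wrong. A minimal Python sketch, assuming you keep the configuration as text templates that use the placeholder names above: it substitutes all of them in a single pass. The target values in `PLACEHOLDERS` and the helper name `localize` are hypothetical; put in your own hosts and realms.

```python
import re

# Placeholder names from this post, mapped to HYPOTHETICAL values; use your own.
PLACEHOLDERS = {
    "HADOOP1.INTERNAL": "kdc01.example.internal",  # local KDC host
    "ALO.LOCAL": "HADOOP.EXAMPLE.COM",             # local Kerberos realm
    "AD.REMOTE": "CORP.EXAMPLE.COM",               # Active Directory realm
}


def localize(text: str, mapping: dict) -> str:
    """Replace every placeholder in text in a single pass (case-sensitive)."""
    # Longest keys first, so a shorter placeholder never wins over a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass means a replacement value is never itself re-substituted.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    sample = "default_realm = ALO.LOCAL\nkdc = HADOOP1.INTERNAL\n# trust to AD.REMOTE"
    print(localize(sample, PLACEHOLDERS))
```

Because the match is case-sensitive, a lowercase host name such as hadoop1.internal would need its own entry in the mapping.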
1. Install the bits
On the KDC server (CentOS, RHEL; other operating systems should have nearly the same packages): yum install krb5-server krb5-libs krb5-workstation -y

On the clients (Hadoop nodes): yum install krb5-libs krb5-workstation -y

Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files (see the Oracle documentation) on all Hadoop nodes.
2. Configure your local KDC

/etc/krb5.conf

[libdefaults]
default_realm = ALO.LOCAL
dns_lookup_realm = false
dns_lookup_kdc = false
kdc_timesync = 1
ccache_type = 4
forwardable = true
proxiable = true
fcc-mit-ticketflags = true
max_life = 1d
max_renewable_life = 7d
renew_lifetime = 7d
default_tgs_enctypes = aes128-cts arcfour-hmac
default_tkt_…
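Before copying the file to every Hadoop node, the [libdefaults] values above can be sanity-checked with a short script. The Python sketch below is only an illustration: it assumes the section consists of flat `key = value` lines, as shown, so that the standard-library configparser can read it. A complete krb5.conf with brace-delimited [realms] entries is not INI-compatible and would need a real parser. The helper names `to_seconds` and `summarize` are hypothetical.

```python
import configparser

# The flat [libdefaults] lines shown above (the truncated last line is omitted).
LIBDEFAULTS = """
[libdefaults]
default_realm = ALO.LOCAL
dns_lookup_realm = false
dns_lookup_kdc = false
kdc_timesync = 1
ccache_type = 4
forwardable = true
proxiable = true
fcc-mit-ticketflags = true
max_life = 1d
max_renewable_life = 7d
renew_lifetime = 7d
default_tgs_enctypes = aes128-cts arcfour-hmac
"""

# Assumption: only single-unit durations such as "1d" or "10h" are handled here.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(value: str) -> int:
    """Convert a single-unit duration such as '7d' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def summarize(text: str) -> dict:
    """Parse a flat [libdefaults] section and pull out the values worth checking."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["libdefaults"]
    return {
        "realm": section["default_realm"],
        "tgs_enctypes": section["default_tgs_enctypes"].split(),
        "max_life": to_seconds(section["max_life"]),
        "renew_lifetime": to_seconds(section["renew_lifetime"]),
    }


if __name__ == "__main__":
    info = summarize(LIBDEFAULTS)
    print(info)
    # A renewable lifetime shorter than the ticket lifetime is most likely a typo.
    if info["renew_lifetime"] < info["max_life"]:
        print("warning: renew_lifetime is shorter than max_life")
```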

Hadoop-based SQL engines

Apache Hadoop is increasingly becoming the focus of business-critical architectures and applications. Naturally, SQL-based solutions are the first to be considered, but the market is evolving and new tools are appearing that often go unnoticed.

Below is an overview of the Hadoop-based SQL technologies currently available. The must-haves are:
open source (various contributors), low-latency querying, and support for CRUD (mostly!) with statements such as CREATE, INSERT INTO, SELECT * FROM (LIMIT ...), UPDATE Table SET A1=2 WHERE ..., DELETE, and DROP TABLE.
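The must-have list amounts to a small smoke test that any candidate engine should pass. A minimal Python sketch, assuming the engine offers a DB-API 2.0 (PEP 249) driver: it runs the listed statements in order and returns what the SELECT sees. Python's built-in sqlite3 is used purely as a stand-in so the script runs anywhere; it is not a Hadoop-based engine. The table `t`, its columns, and the helper `run_checklist` are hypothetical, and each engine's SQL dialect and DML support may require adjusting the statements.

```python
import sqlite3

# The must-have statements, against a HYPOTHETICAL table t(id, a1).
CHECKLIST = [
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a1 INTEGER)",
    "INSERT INTO t (id, a1) VALUES (1, 1)",
    "INSERT INTO t (id, a1) VALUES (2, 5)",
    "UPDATE t SET a1 = 2 WHERE id = 1",
    "DELETE FROM t WHERE id = 2",
]


def run_checklist(conn) -> list:
    """Run the CRUD checklist on a DB-API 2.0 connection and return the SELECT result."""
    cur = conn.cursor()
    for statement in CHECKLIST:
        cur.execute(statement)
    cur.execute("SELECT * FROM t LIMIT 10")
    rows = cur.fetchall()
    cur.execute("DROP TABLE t")
    return rows


if __name__ == "__main__":
    # sqlite3 is only a stand-in so the sketch runs without a cluster.
    with sqlite3.connect(":memory:") as conn:
        print(run_checklist(conn))
```

With sqlite3 this yields a single row, (1, 2): the UPDATE changed the first row and the DELETE removed the second.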

Apache Hive (SQL-like, with interactive SQL via Stinger)
Apache Drill (ANSI SQL support)
Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
Apache Phoenix (built atop Apache HBase, lacks full transaction support, relational operators and some built-in functions)
Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. It doesn't seem to be designed for low-latency responses across small clusters, or suppor…