
Showing posts from April, 2015

Hive on Spark at CDH 5.3

Since Hive on Spark is not (yet) officially supported by Cloudera, some manual steps are required to get it working within CDH 5.3. Please note that there are four important requirements in addition to the hands-on work:

- Each Spark Gateway node needs to be a Hive Gateway node as well.
- If the client configurations are redeployed, you need to copy the hive-site.xml again.
- If CDH is upgraded (this includes minor patches, which are often applied without notifying you), you need to adjust the class paths.
- The Hive libraries need to be present on all executors (CM should take care of this automatically).

Log in to your Spark server(s) and copy the running hive-site.xml to the Spark configuration directory:

cp /etc/hive/conf/hive-site.xml /etc/spark/conf/
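
Because the copy has to be repeated whenever the client configurations are redeployed, it may help to wrap it in a small re-runnable check. The following is only a sketch: the function name sync_hive_site is hypothetical, and the default paths are simply the ones from the command above.

```shell
# Hypothetical helper (not part of CDH): copy hive-site.xml into the Spark
# configuration directory only when the Spark copy is missing or differs.
# Defaults are the paths used in the cp command above.
sync_hive_site() {
  src="${1:-/etc/hive/conf/hive-site.xml}"
  dst_dir="${2:-/etc/spark/conf}"
  dst="$dst_dir/hive-site.xml"

  # Fail early if the Hive client configuration is not deployed on this node.
  if [ ! -f "$src" ]; then
    echo "missing source: $src" >&2
    return 1
  fi

  # cmp -s exits 0 only when both files are byte-identical.
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "up to date: $dst"
  else
    cp "$src" "$dst" && echo "copied: $src -> $dst"
  fi
}
```

Calling sync_hive_site without arguments on a gateway node uses the default paths; passing two arguments (source file, target directory) lets you try it against a scratch directory first.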

Start the Spark shell with the following command (replace <CDH_VERSION> with your parcel version, e.g. 5.3.2-1.cdh5.3.2.p0.10), then load the Hive context within spark-shell:

spark-shell --master yarn-client --driver-class-path "/opt/cloudera/parcels/CDH-<CDH_VERSION>/lib/hive/l…
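
Since the class paths have to be adjusted after every CDH upgrade, one option is to derive <CDH_VERSION> from the parcel directory name instead of typing it by hand. This is a rough sketch under one assumption: parcel directories are named CDH-<CDH_VERSION>, as in the path above. The function name cdh_version_from_dir is hypothetical.

```shell
# Hypothetical helper: extract <CDH_VERSION> from a parcel directory path.
# Assumes the directory is named CDH-<CDH_VERSION>, as in the path above.
cdh_version_from_dir() {
  dir="${1%/}"        # drop a trailing slash, if any
  name="${dir##*/}"   # keep only the last path component
  case "$name" in
    CDH-*) printf '%s\n' "${name#CDH-}" ;;  # strip the "CDH-" prefix
    *)     return 1 ;;                      # not a versioned parcel directory
  esac
}

# Print the version of every versioned parcel directory found on this node.
for d in /opt/cloudera/parcels/CDH-*/; do
  if [ -d "$d" ]; then
    cdh_version_from_dir "$d"
  fi
done
```

If more than one parcel version is installed, the loop prints several lines, and you still have to pick the one that is actually activated.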