Posts

Showing posts from November, 2014

Hadoop server performance tuning

To tune a Hadoop cluster from a DevOps perspective needs an understanding of the kernel principles and linux. The following article will describe the most important parameters together with tricks for an optimal tuning.

Memory Typically modern Linux systems (Linux 2.6 +) use swapping to avoid OOM (Out of Memory) to protect the system from kernel freezes. But Hadoop uses Java, and typically Java is configured with MAXHEAPSIZE per service (HDFS, HBase, Zookeeper etc). The configuration has to match the available memory in the system. A common formula for MapReduce1:
TOTAL_MEMORY = (Mappers + Reducers) * CHILD_TASK_HEAP + TT_HEAP + DN_HEAP + RS_HEAP + OTHER_SERVICES_HEAP + 3GB (for OS and caches)

For MapReduce2 YARN takes care about the resources, but only for services which are running as YARN Applications. [1], [2]

Disable swappiness is done one the fly per
echo 0 > /proc/sys/vm/swappiness

and persistent after reboots per sysctl.conf:
echo “vm.swappiness = 0” >> /etc/sysctl.conf…