Hive query shows ERROR "too many counters"

A Hive job can fail with the odd "Too many counters" error, like this:

Ended Job = job_xxxxxx with exception 'org.apache.hadoop.mapreduce.counters.LimitExceededException(Too many counters: 201 max=200)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Intercepting System.exit(1)


This happens when a query uses many operators (Hive Operators). Hive creates 4 counters per operator, up to a maximum of 1000, plus a few additional counters for things like file reads/writes, partitions and tables. Hence the number of counters required depends on the query.
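The arithmetic above can be sketched roughly as follows. This is only an estimate under stated assumptions: the 4 counters per operator and the cap of 1000 come from the text, while the function name and the default of 20 "additional" counters are hypothetical placeholders, since the exact number of extra counters is not specified.

```python
COUNTERS_PER_OPERATOR = 4     # from the text: 4 counters per operator
OPERATOR_COUNTER_CAP = 1000   # from the text: operator counters max out at 1000


def estimate_counters(num_operators, extra_counters=20):
    """Rough estimate of the counters a Hive query may need.

    extra_counters is an assumed placeholder for the "few additional
    counters" (file read/write, partitions, tables).
    """
    operator_counters = min(COUNTERS_PER_OPERATOR * num_operators,
                            OPERATOR_COUNTER_CAP)
    return operator_counters + extra_counters


# 50 operators: 4 * 50 + 20 = 220, already above the max=200 seen in the error
print(estimate_counters(50))
```

With these assumptions, even a query with around 50 operators would exceed the limit of 200 shown in the error message.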

To avoid this exception, set "mapreduce.job.counters.max" in mapred-site.xml to a value above 1000. Hive fails once it hits the 1k counters, whereas other MR jobs usually do not get that far. A value around 1120 should be a good choice.
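As a rough illustration, the property would appear in the Hadoop configuration file as a name/value pair, and a small script can check which limit a given file sets. The sample XML string and the helper name read_counter_limit are made up for this sketch; only the property name and the value 1120 come from the text.

```python
import xml.etree.ElementTree as ET

# Hand-written sample showing the property in Hadoop's configuration format.
SAMPLE_MAPRED_SITE = """\
<configuration>
  <property>
    <name>mapreduce.job.counters.max</name>
    <value>1120</value>
  </property>
</configuration>
"""


def read_counter_limit(xml_text, name="mapreduce.job.counters.max"):
    """Return the configured value of `name` as an int, or None if not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return int(prop.findtext("value"))
    return None


print(read_counter_limit(SAMPLE_MAPRED_SITE))  # 1120
```

If the function returns None, the property is not set in that file and the cluster falls back to its default limit.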

Running the query with "EXPLAIN EXTENDED" and piping the output through "grep -ri operators | wc -l" prints the number of operators used. Use this value to tune the MR setting carefully.
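The same counting step can be sketched in a few lines of script, assuming the EXPLAIN output has been saved as text. The sample plan below is hand-written for illustration and is not real Hive output, and the search word "operator" is an assumption; adjust it to whatever your plan output actually contains. Note that header lines such as "Map Operator Tree:" are counted too, so the result is an upper bound rather than an exact operator count.

```python
# Hand-written, abbreviated plan text for illustration only.
SAMPLE_PLAN = """\
Map Operator Tree:
  TableScan
    Filter Operator
      Select Operator
        Reduce Output Operator
Reduce Operator Tree:
  Group By Operator
    File Output Operator
"""


def count_lines_matching(text, word):
    """Count lines containing `word`, ignoring case.

    Mirrors `grep -i word | wc -l` for a plain (non-regex) search word.
    """
    needle = word.lower()
    return sum(1 for line in text.splitlines() if needle in line.lower())


print(count_lines_matching(SAMPLE_PLAN, "operator"))  # 7
```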

Comments

  1. For more information on Hadoop counters, check the blog post I wrote some time back (http://www.thecloudavenue.com/2011/12/limiting-usage-counters-in-hadoop.html).

    Also, there might be a (performance) reason why the number of counters is restricted in Hadoop. So I suggest not increasing it blindly, and keeping an eye on performance after the change.

  2. Yes, that's a good point. Thanks for sharing this, Praveen :)
