Use hive to catch grabber

Get the logs from the farm via flume & syslog, mapreduce them in hive for IP, how often / second, bytes, item and compare with "human" profiles. Get the data on the fly via sqlstream, processes back into Oracle and from there a loadbalancer could get the IPs for a smooth redirect and I process the data into a graphing system (connection from that IP):

Hourly I check geolocation, whois, provider. Using pig.latin. Ready for first testing in our labs. And, of course, not a really performant task (yet) ;-)


Popular posts from this blog

Export HDFS over CIFS (Samba3)

Hive query shows ERROR "too many counters"

Connect to HiveServer2 with a kerberized JDBC client (Squirrel)