Use Hive to catch grabbers

Get the logs from the farm via Flume and syslog, then MapReduce them in Hive by IP, requests per second, bytes, and item, and compare the results with "human" profiles. The data comes in on the fly via SQLstream, which writes the processed results back into Oracle. From there, a load balancer could fetch the IPs for a smooth redirect, and I feed the data into a graphing system that shows the connections from each IP.
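To make the comparison with a "human" profile a bit more concrete, here is a minimal Python sketch of the aggregation step. It is not the Hive job itself, just the same idea in miniature: count requests per IP per second, sum the bytes, and flag every IP whose peak rate is above what a person would plausibly produce. The record layout, the function name `find_grabbers`, and the threshold of 5 requests per second are all assumptions for illustration.

```python
from collections import defaultdict

# Assumed threshold: more requests per second than a human would plausibly make.
MAX_HUMAN_REQ_PER_SEC = 5


def find_grabbers(records, max_per_sec=MAX_HUMAN_REQ_PER_SEC):
    """Flag IPs whose peak request rate exceeds the "human" threshold.

    records: iterable of (ip, epoch_second, bytes_sent, item) tuples,
    a hypothetical layout for the parsed log lines.
    """
    requests_per_ip_second = defaultdict(int)
    bytes_per_ip = defaultdict(int)

    # Group by (ip, second) for the rate and by ip for the byte total.
    for ip, second, nbytes, _item in records:
        requests_per_ip_second[(ip, second)] += 1
        bytes_per_ip[ip] += nbytes

    # Keep the busiest second seen for each IP.
    peak_per_ip = defaultdict(int)
    for (ip, _second), count in requests_per_ip_second.items():
        peak_per_ip[ip] = max(peak_per_ip[ip], count)

    # Only IPs above the threshold count as suspected grabbers.
    return {
        ip: {"peak_req_per_sec": peak, "bytes": bytes_per_ip[ip]}
        for ip, peak in peak_per_ip.items()
        if peak > max_per_sec
    }
```

The returned dictionary is roughly what would land in Oracle for the load balancer to pick up: one entry per suspicious IP with its peak rate and total bytes.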



Every hour I check geolocation, whois, and provider, using Pig Latin. The setup is ready for first testing in our labs. And, of course, it is not a really performant task (yet) ;-)
