
Sunday, October 03, 2010

Comments

Otis

Why did you develop a log4j appender that sends all Java logs directly to the agent? Why not keep logging with one of the log4j file appenders and simply tail that log file with Flume?

You mention storing structured data in something like Bigtable. At Sematext (http://sematext.com/) we developed a Flume Sink for HBase - see FLUME-247.

Dan Pritchett

We wrote an appender to get better structured content into Flume. If we simply tail the log file, then (unless I'm missing something) we lose content like the level, parsed timestamp, class name, thread name, etc. As written, the appender also picks up location information and properties, if present, and places them into fields. The purpose is to keep as much of the logger's information structured as possible.
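To make that concrete, here is a minimal sketch of that kind of appender (not our actual code; it assumes log4j 1.x, and sendToAgent() is just a placeholder for the hand-off to the local Flume agent):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.log4j.AppenderSkeleton;
    import org.apache.log4j.spi.LocationInfo;
    import org.apache.log4j.spi.LoggingEvent;

    // Sketch only: capture the structured parts of a log4j event
    // instead of flattening them into a single text line.
    public class StructuredFlumeAppender extends AppenderSkeleton {

        @Override
        protected void append(LoggingEvent event) {
            Map<String, String> fields = new HashMap<String, String>();
            fields.put("level", event.getLevel().toString());
            fields.put("timestamp", Long.toString(event.getTimeStamp()));
            fields.put("logger", event.getLoggerName());
            fields.put("thread", event.getThreadName());
            fields.put("message", event.getRenderedMessage());

            // Location information (class, method, line), when available.
            LocationInfo loc = event.getLocationInformation();
            if (loc != null) {
                fields.put("class", loc.getClassName());
                fields.put("method", loc.getMethodName());
                fields.put("line", loc.getLineNumber());
            }

            // Properties/MDC set on the log record, if present.
            for (Object key : event.getProperties().keySet()) {
                fields.put("prop." + key, String.valueOf(event.getProperties().get(key)));
            }

            sendToAgent(fields); // hypothetical hand-off to the local Flume agent
        }

        // Placeholder: the actual transport to the Flume agent is not shown here.
        private void sendToAgent(Map<String, String> fields) {
        }

        @Override
        public void close() {
        }

        @Override
        public boolean requiresLayout() {
            return false;
        }
    }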

The HBase sink sounds great. I'll have a look!

Otis

Hm, I don't fully follow the log4j comment. That is, I feel one could achieve the same thing by tailing the file and using a decorator to create the appropriate structure around the log event to prepare it for HBase, as in FLUME-247.

Another Q: You also said: "The flexibility of Flume allows it to scale from environments with as few as 5 machines to environments with thousands of machines."

Why do you say *5* machines? Couldn't it be as low as 4, 3, 2, or even 1? Thanks.

Dan Pritchett

Yes, you could use a decorator to parse your log file and get back to the structure. I can see how that becomes more problematic if you are setting properties on the log record in log4j, but of course it could be done that way. The appender does save me from having to manage log files on the application server, which is a primary motivator for using Flume to begin with.
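Roughly, the tail-and-parse route would look something like this (again just a sketch; it assumes a conventional layout such as "%d{ISO8601} [%t] %-5p %c - %m%n" and would have to match whatever layout is actually configured):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch only: reconstruct fields from a tailed log line.
    // Assumes the line was written with a layout like "%d{ISO8601} [%t] %-5p %c - %m%n".
    public class LogLineParser {

        private static final Pattern LINE = Pattern.compile(
            "^(\\S+ \\S+) \\[([^\\]]+)\\] (\\w+)\\s+(\\S+) - (.*)$");

        public static Map<String, String> parse(String line) {
            Map<String, String> fields = new HashMap<String, String>();
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                fields.put("timestamp", m.group(1));
                fields.put("thread", m.group(2));
                fields.put("level", m.group(3));
                fields.put("logger", m.group(4));
                fields.put("message", m.group(5));
            } else {
                // Continuation lines (stack traces, wrapped messages) need separate handling,
                // and anything set via properties/MDC is simply not on the line to recover.
                fields.put("message", line);
            }
            return fields;
        }
    }

That last point is the one I was getting at: whatever never made it into the rendered line can't be parsed back out of it.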

And yes, Flume can be used on as few as one machine. :)

Benjamin Manes

This design is similar to how Rearden was going to deploy Splunk. It was meant to replace OpsConsole (log analysis, JMX monitoring, alerting). I think there were performance issues, so we ended up with neither. Flume looks like a nice addition.

As a naming coincidence, you may find Google's paper on FlumeJava interesting. It provides a workflow optimizer for MapReduce.
(http://portal.acm.org/citation.cfm?id=1806596.1806638)

Dan Pritchett

We might consider posting it, but one of the challenges is what is outlined in that ticket: how do you rendezvous with the Flume agent? We've picked a very specific policy for how to do it, which may not be one every organization is willing to adopt.
