jmhsieh edited this page Sep 14, 2010 · 26 revisions

Flume Frequently Asked Questions

These FAQs and answers are relative to the current release unless otherwise noted.

Setup + Installation

Why is ZooKeeper a dependency? Why is it included?

Flume uses ZooKeeper to make the master reliable.

When installed from packages, Flume runs as the flume user, but it can’t read certain files because they belong to root!

For now, we suggest creating a group that the flume user belongs to, and changing those files so that members of that group have read permission.
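After changing the group and permissions, it can help to confirm that the change took effect. The following is a minimal sketch, not part of Flume, assuming a POSIX file system (Linux, for example) and Java 7 or later. The class name GroupReadCheck is hypothetical. Run it as the flume user and pass the file Flume needs to read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;

/**
 * Hypothetical helper (not part of Flume): prints a file's owner and group,
 * whether the group read bit is set, and whether the current user can read it.
 */
public class GroupReadCheck {

    /** Returns true if the file's POSIX permissions grant read access to its group. */
    static boolean groupCanRead(Path file) throws IOException {
        PosixFileAttributes attrs = Files.readAttributes(file, PosixFileAttributes.class);
        return attrs.permissions().contains(PosixFilePermission.GROUP_READ);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GroupReadCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        PosixFileAttributes attrs = Files.readAttributes(file, PosixFileAttributes.class);
        System.out.println("owner:          " + attrs.owner().getName());
        System.out.println("group:          " + attrs.group().getName());
        System.out.println("group can read: " + groupCanRead(file));
        // Checks access for the user running this program (run it as flume).
        System.out.println("readable by me: " + Files.isReadable(file));
    }
}
```

If "group can read" is true but "readable by me" is false, the flume user is probably not yet in that group. Group membership changes usually only apply to new login sessions, so the Flume process may need a restart.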

Configuration

I’ve edited my flume-site.xml but my changes aren’t showing up!

You may have edited a copy of the file that isn’t where Flume expects it. Try starting the Flume node manually by going to the command line and
entering:

flume node

In the first few lines of output there should be something like:

10/07/21 10:25:20 INFO conf.FlumeConfiguration: Loading configurations from /etc/flume/conf

The flume-site.xml file that you edit should be in that directory (in this case, ‘/etc/flume/conf/flume-site.xml’).

How do I change the maximum raw event size?

Set the flume.event.max.size.bytes property in flume-site.xml to the maximum event size, in bytes.
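Here is a minimal sketch of what that setting looks like, assuming flume-site.xml uses the Hadoop-style `<configuration>`/`<property>` layout. It includes a small program that reads a site file and prints the value it finds for the property, which also helps confirm you edited the copy Flume actually loads. The class name SitePropertyLookup and the value 65536 are illustrative only, not a recommended size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Flume): looks up one property in a site file. */
public class SitePropertyLookup {

    // Example flume-site.xml contents; 65536 is an arbitrary illustrative value.
    static final String EXAMPLE_SITE_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>flume.event.max.size.bytes</name>\n"
        + "    <value>65536</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    /** Returns the value of the named property (last occurrence wins), or null if absent. */
    static String lookup(InputStream siteXml, String name) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        String found = null;
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                found = values.item(0).getTextContent().trim();
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String key = "flume.event.max.size.bytes";
        // Read the file given on the command line, or fall back to the example above.
        InputStream in = args.length > 0
            ? Files.newInputStream(Paths.get(args[0]))
            : new ByteArrayInputStream(EXAMPLE_SITE_XML.getBytes(StandardCharsets.UTF_8));
        try (InputStream site = in) {
            System.out.println(key + " = " + lookup(site, key));
        }
    }
}
```

For example, `java SitePropertyLookup /etc/flume/conf/flume-site.xml` prints the value in the directory reported by the "Loading configurations" log line. A result of null means the property is not set in that file.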

Does this version support output file compression?

Not yet; it is on our TODO list!

Flows using autoDFOChain seem to hang!

Yup, it’s a known bug, and fixing one bug has a way of exposing a bunch of new ones. We are working on it. In the meantime, autoBEChain is more likely to work.

Using an agent or auto E2E sink results in many periodic duplicates!

To use the E2E reliability mode, you currently must use collectorSink at the end point! collectorSink contains the code that checks for and responds to the acking and flushing logic injected by the ackedWriteAhead decorator, which the agent and auto E2E sinks use or generate. With any other end-point sink, no acknowledgements come back, so the agent keeps resending the unacknowledged data, and that shows up as periodic duplicates.

Huh? Why isn’t the ack-checking logic in collectorSource?

To guarantee that data gets written, we can only send acknowledgements after the data has been successfully written. Sinks do the writing, so only they can send the acknowledgement signals!
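The ordering argument can be illustrated with a small sketch. It is a hypothetical class, not Flume’s collectorSink or its real ack API, and assumes Java 8 or later. A batch is only acknowledged after every event has been written and flushed. If writing throws, the ack is skipped, so the sender still holds the batch and can retransmit it. A source could not make that promise, because it only knows the data arrived, not that it was stored.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Flume's actual classes): a sink that buffers a batch
 * of events and acknowledges it only after the whole batch has been written.
 */
public class AckAfterWriteSink {
    private final Writer out;
    private final Consumer<String> ackChannel;  // stands in for the path back to the agent
    private final List<String> batch = new ArrayList<>();

    public AckAfterWriteSink(Writer out, Consumer<String> ackChannel) {
        this.out = out;
        this.ackChannel = ackChannel;
    }

    /** Buffers one event of the current batch. */
    public void append(String event) {
        batch.add(event);
    }

    /** Called when the end-of-batch marker for batchId arrives. */
    public void endBatch(String batchId) throws IOException {
        for (String event : batch) {
            out.write(event);
            out.write('\n');
        }
        out.flush();                 // any exception above or here skips the ack below
        batch.clear();
        ackChannel.accept(batchId);  // ack only after a successful write
    }
}
```

Because a failed write means no ack, the sender may resend data that was partially written. That trade-off (possible duplicates, but no silent loss) is the same one behind the duplicates described in the previous answer.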

I get this exception in my logs:

2136264 [pool-1-thread-3] ERROR org.apache.thrift.server.TSaneThreadPoolServer - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:201)
        at com.cloudera.flume.conf.thrift.FlumeClientServer$Processor.process(FlumeClientServer.java:290)
        at org.apache.thrift.server.TSaneThreadPoolServer$WorkerProcess.run(TSaneThreadPoolServer.java:280)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

This happens when the wrong kind of client tries to talk to one of the Thrift services. You may have accidentally added a port to flume.master.servers or to some other setting.
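To see why a "wrong" client produces this particular message, here is a simplified re-implementation of the header check that Thrift’s TBinaryProtocol performs when reading strictly. It is not Thrift’s own code. A strict Thrift client starts each message with a 32-bit big-endian word whose high 16 bits are 0x8001. If that first word is non-negative, the reader reports a missing version. The class name and the HTTP request used as an example of a non-Thrift client are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Simplified sketch (not Thrift's code) of how a strict binary-protocol reader
 * classifies the first four bytes of an incoming message.
 */
public class ThriftHeaderCheck {
    static final int VERSION_MASK = 0xffff0000;
    static final int VERSION_1 = 0x80010000;

    static String classify(byte[] firstBytes) {
        if (firstBytes.length < 4) {
            return "too short";
        }
        int word = ByteBuffer.wrap(firstBytes, 0, 4).getInt(); // big-endian, as on the wire
        if (word < 0) {
            return (word & VERSION_MASK) == VERSION_1 ? "thrift binary v1" : "bad version";
        }
        return "missing version"; // what strict reading reports as "old client?"
    }

    public static void main(String[] args) {
        // An HTTP request sent to a Thrift port: "GET " decodes to a positive int.
        byte[] http = "GET / HTTP/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        // The first bytes of a strict Thrift message header (version 1, message type 1).
        byte[] thrift = {(byte) 0x80, 0x01, 0x00, 0x01};
        System.out.println(classify(http));
        System.out.println(classify(thrift));
    }
}
```

Anything that is not a Thrift client, or a client speaking to a port meant for something else, is likely to hit the "missing version" branch. That is why the error usually points to a misconfigured host or port rather than a bug in the service.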

Dev Stuff

Where can I get a tarball with the source?

http://archive.cloudera.com/cdh/3/

I can start a Flume Master or Flume Node in Eclipse, but I can’t seem to load the web pages! I get a flumemaster.jsp not found error!?

The default setting is to precompile the JSPs into Java code. Currently these are generated when ‘ant’ is run from the command line, and the generated Java servlets are written to ./build/src/. You need to make sure to add ‘build/src’ to your Eclipse build path.

I’ve found a problem: bug with the program, typo in the docs, etc.

Please let us know! We use a system called JIRA for bug reporting, tracking, and resolution. You can go here to tell us what you have found! Please include the version, the component (if you can tell), and ideally a way to reproduce the bug!
