FAQ

Flume Frequently Asked Questions

These FAQs and answers are relative to the current release unless otherwise noted.

Setup + Installation

Why is ZooKeeper a dependency? Why is it included?

We use it to make the master reliable.

When installed from packages, Flume runs as the flume user, but it can’t read certain files because they belong to root!

For now, we suggest adding a group that the flume user is part of, and giving members of that group read access to the files.

Configuration

I’ve edited my flume-site.xml but my changes aren’t showing up!

You may have edited a copy of the file that isn’t where Flume expects it. Try starting the Flume node manually from the command line by entering:

flume node

In the first few lines of output there should be something like:

10/07/21 10:25:20 INFO conf.FlumeConfiguration: Loading configurations from /etc/flume/conf

The flume-site.xml file that you edit should be in that directory.
(in this case ‘/etc/flume/conf/flume-site.xml’)

How do I change the maximum raw event size?

Set the flume.event.max.size.bytes property in the flume-site.xml file to the maximum event size you want, in bytes.
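
A minimal sketch of what that entry might look like, assuming flume-site.xml uses the Hadoop-style configuration/property layout. The value 65536 is purely illustrative, not a default or a recommendation:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>flume.event.max.size.bytes</name>
    <!-- Illustrative value only: 65536 bytes (64 KB) -->
    <value>65536</value>
  </property>
</configuration>
```

If the change does not seem to take effect, check that you edited the flume-site.xml in the directory Flume reports loading its configuration from, as described above.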

Does this version support output file compression?

Not yet. It is on our TODO list!

autoDFOChain stuff seems to hang!

Yup, it’s a bug. Fix one bug and it causes a bunch of new ones to appear. We are working on it. autoBEChain stuff is more likely to work at the moment.

Using an agent or auto E2E sink results in many periodic duplicates!

To use the E2E reliability modes, you currently must use the collectorSink at the endpoint! The collectorSink contains the code that checks and responds to the acking and flushing logic that the ackedWriteAhead decorator injects in the auto/agent E2E sinks.

Huh? Why isn’t the ack checking stuff in the collectorSource?

To guarantee that data gets written, we can only send acknowledgements after we have successfully written it. Sinks do the writing, so only they can send the acknowledgement signals!

I get this exception in my logs:

2136264 [pool-1-thread-3] ERROR org.apache.thrift.server.TSaneThreadPoolServer - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:201)
        at com.cloudera.flume.conf.thrift.FlumeClientServer$Processor.process(FlumeClientServer.java:290)
        at org.apache.thrift.server.TSaneThreadPoolServer$WorkerProcess.run(TSaneThreadPoolServer.java:280)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

This happens when an incorrect client attempts to talk to one of the Thrift services. You may have accidentally added a port to flume.master.servers or some other place.

Dev Stuff

Where can I get a tarball with source?

http://archive.cloudera.com/cdh/3/

I can start a Flume Master or Flume Node in eclipse but I can’t seem to load the web pages! I get a flumemaster.jsp not found error!?

The default setting is to precompile the JSPs into Java code. Currently these are generated when ‘ant’ is run from the command line. The Java servlets are written to ./build/src/. You need to make sure to add ‘build/src’ to your Eclipse build path.

I’ve found a problem: bug with the program, typo in the docs, etc.

Please let us know! We use a system called JIRA for bug reporting, tracking, and resolution, so please file a report there. Include the version, the component (if you can tell), and ideally a way to reproduce the bug!

Flume has weak ordering guarantees.

Flume has weaker guarantees than some other systems (message queues, for example) in the interest of moving data around more quickly and enabling cheaper fault tolerance. The idea is to minimise the amount of state that Flume has to keep: replicated state is what makes fault tolerance hard and makes reasoning about failure conditions difficult. In Flume’s end-to-end reliability mode, events are delivered at least once, but with no ordering guarantees. We’ve found this sufficient for using Flume as a data conduit, since messages can be de-duplicated either at write time or by a post-hoc batch process. However, this means that Flume is harder to use as a message-passing or eventing framework unless your application is set up to be idempotent with respect to duplicate events and does not require any causal relationship between events to be preserved upon delivery.
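
To make the de-duplication idea concrete, here is a minimal Python sketch of dropping duplicates at write time. It assumes every event carries a unique ID attached by the producer; the Event type and its event_id field are hypothetical and are not part of Flume's API.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    event_id: str  # hypothetical unique ID attached by the producer
    body: str


def deduplicate(events: Iterable[Event]) -> Iterator[Event]:
    """Yield each event the first time its ID is seen; skip later copies."""
    seen = set()
    for event in events:
        if event.event_id in seen:
            continue  # duplicate, e.g. caused by a retransmission
        seen.add(event.event_id)
        yield event


delivered = [
    Event("a1", "first"),
    Event("b2", "second"),
    Event("a1", "first"),  # retransmitted copy of the first event
    Event("c3", "third"),
]
unique = list(deduplicate(delivered))
print([e.event_id for e in unique])
```

The set of seen IDs grows without bound in this sketch. A real consumer would probably expire old IDs or leave de-duplication to a batch job over the stored data.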

There are two ways that events may be re-ordered:

1. They are transmitted in DFO or E2E modes, and a failure delays them until after the successful delivery of some chronologically later events. The agent will try to retransmit unacknowledged events, but that could happen after some later events have already been delivered just fine.

2. The network reorders the packets. That can’t happen with current TCP protocols (i.e. there’s buffering and reordering done at the receiver), but I can’t rule out us going to UDP, precisely because we don’t need those guarantees.

You can always reconstruct causal order after all events are delivered by looking at their timestamps, but at the time of delivery you don’t know whether there are events that you missed unless you attach sequence numbers to each. If you are using Flume for alerting, then you just need to track when the last interesting state was. Say you received an ERROR notification with timestamp t: just make sure you save t and silently drop any messages that arrive after it with timestamps < t.
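
The two techniques in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: timestamps and sequence numbers are plain integers supplied by your application, and the names AlertFilter and missing_sequence_numbers are hypothetical, not anything Flume provides.

```python
from typing import Iterable, List, Optional, Set


class AlertFilter:
    """Drop notifications older than the newest one already acted on."""

    def __init__(self) -> None:
        self.last_timestamp: Optional[int] = None

    def accept(self, timestamp: int) -> bool:
        """Return True if the notification should be processed."""
        if self.last_timestamp is not None and timestamp < self.last_timestamp:
            return False  # stale: arrived late, superseded by newer state
        self.last_timestamp = timestamp  # remember the saved timestamp t
        return True


def missing_sequence_numbers(received: Iterable[int]) -> List[int]:
    """List sequence numbers absent between the lowest and highest seen."""
    seen: Set[int] = set(received)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


alerts = AlertFilter()
# The notification stamped 103 arrives after the one stamped 105.
decisions = [alerts.accept(t) for t in (100, 105, 103, 110)]
gaps = missing_sequence_numbers([1, 2, 5, 4, 7])
print(decisions, gaps)
```

Here the late notification stamped 103 is dropped, and the gap check reports 3 and 6 as not yet delivered. Note that a gap only tells you something is missing so far; with retransmission the event may still arrive later.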

In BE mode, currently, events should arrive in order but it’s possible they could be delivered to different collectors, if you have more than one. You have to be aware of the possibility that events could be arbitrarily delayed, as well, although the delay you see for BE should be less than for DFO or E2E (i.e. events are usually delivered quickly, or not at all).
