(otelarrowexporter) README: on batching w/ otel-arrow #35225

Closed
wants to merge 8 commits into from
105 changes: 84 additions & 21 deletions exporter/otelarrowexporter/README.md
@@ -264,44 +264,107 @@ exporters:

### Batching Configuration

This exporter includes a new, experimental `batcher` configuration for
batching in the `exporterhelper` module, but this mode is disabled by
default. This batching support works when combined with
`queue_sender` functionality.
### Option 1: Batching with back-pressure

To configure an OpenTelemetry Collector pipeline for both batching and
back-pressure, you need a custom component, the Concurrent Batch Processor,
which is available in the OTel-Arrow project repository. We have not
included it in the Collector-Contrib repository because equivalent
functionality is being added as a standard exporter-batcher mechanism, and
that mechanism is still experimental.
Comment on lines +269 to +274
Contributor

As a user, how would I configure this to be included in my collector?

Would it be possible to add something like an ocb (collector builder) example? Or a link earlier in this text to follow and learn more there?

My concern is that this is a lot of "expert" mode configuration with no warning that it is.

Contributor Author

This is a response to the reality today, which is that none of the built-in support actually helps with the problem I am trying to solve--which is to have a synchronous pipeline with error transmission that is not limited to one export at a time. I am desperate to fix these problems! See the issue:

open-telemetry/opentelemetry-collector#11308

If you feel that this "expert-mode" is really a problem, then we should address the underlying issue--I ask you to approve open-telemetry/opentelemetry-collector#11324. If you do not feel that there is a problem, then let's merge this PR.


When the [Concurrent Batch Processor](https://github.com/open-telemetry/otel-arrow/blob/main/collector/processor/concurrentbatchprocessor/README.md) is configured, parallel batches of data are
exported with no limit on concurrency. This configuration requires that
receivers apply memory limits or admission control. While this is our preferred
configuration, it is just one of several reasonable setups. As an example
configuration:

```yaml
exporters:
  otelarrow:
    # place gRPC, otel-arrow, retry, and timeout settings here
    batcher:
      enabled: false
    sending_queue:
      enabled: false
receivers:
  otelarrow:
    # otelarrow supports OTLP and OTel-Arrow with admission control
    admission:
      request_limit_mib: 128
processors:
  concurrentbatch:
service:
  pipelines:
    traces:
      exporters: [otelarrow]
      processors: [concurrentbatch]
      receivers: [otelarrow]
```
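
When the receiver does not offer admission control itself, the memory-limit requirement above could be met with the standard `memory_limiter` processor placed ahead of the batch processor. The following is a minimal sketch, assuming both `memory_limiter` and `concurrentbatch` are compiled into the Collector build; the limit values are illustrative placeholders, not recommendations.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
processors:
  # memory_limiter runs first so it can refuse data before it is batched
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024        # illustrative value, tune for your deployment
    spike_limit_mib: 256   # illustrative value, tune for your deployment
  concurrentbatch:
exporters:
  otelarrow:
    # place gRPC, otel-arrow, retry, and timeout settings here
    batcher:
      enabled: false
    sending_queue:
      enabled: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, concurrentbatch]
      exporters: [otelarrow]
```

Processors run in the order listed in the pipeline, so the limiter sees data before any of it is held in a pending batch.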

### Option 2: Batching with a persistent queue

The OpenTelemetry Collector has a built-in persistent queue mechanism, which
applies back-pressure in proportion to disk write speed. In this mode,
batching is done after writing to the persistent queue, and the
`num_consumers` field determines how many parallel batches of data are presented
to the exporter. When the `sending_queue` function is enabled, `num_consumers`
should be set to at least the number of OTel-Arrow streams, or higher to
increase throughput.

As an example configuration:

```yaml
exporters:
  otelarrow:
    # place gRPC, otel-arrow, retry, and timeout settings here.
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 32
      storage: file_storage/otc
extensions:
  file_storage/otc:
    directory: /var/lib/storage/otc
receivers:
  otlp:
    protocols:
      grpc:
service:
  # must match the full extension ID declared under `extensions`
  extensions: [file_storage/otc]
  pipelines:
    traces:
      exporters: [otelarrow]
      receivers: [otlp]
```
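
To make the relationship between stream count and `num_consumers` concrete, the exporter fragment below sets both explicitly. This is a sketch that assumes the exporter's `arrow.num_streams` setting controls the number of OTel-Arrow streams; the value 8 is an arbitrary illustration.

```yaml
exporters:
  otelarrow:
    arrow:
      num_streams: 8        # illustrative value
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 32     # at least num_streams, so no stream sits idle
      storage: file_storage/otc
```

With fewer consumers than streams, some streams would have no batch to send at any given moment.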

The built-in batcher is only recommended with a persistent queue,
otherwise it cannot provide back-pressure to the caller. If building
a custom build of the OpenTelemetry Collector, we recommend using the
[Concurrent Batch
Processor](https://github.com/open-telemetry/otel-arrow/blob/main/collector/processor/concurrentbatchprocessor/README.md)
to provide simultaneous back-pressure, concurrency, and batching
functionality. See [more discussion on this
issue](https://github.com/open-telemetry/opentelemetry-collector/issues/10368).
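
For a custom build, a Collector builder (`ocb`) manifest along these lines might be used to include the processor. This is a hedged sketch: the `dist` values are hypothetical, every `vX.Y.Z` is a placeholder that must be replaced with a real release version, and the otel-arrow module and import paths are assumptions inferred from the location of the README linked above.

```yaml
dist:
  name: otelcol-custom             # hypothetical binary name
  output_path: ./otelcol-custom    # hypothetical output directory
exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter vX.Y.Z
receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/otelarrowreceiver vX.Y.Z
processors:
  # assumed module and import path for the Concurrent Batch Processor
  - gomod: github.com/open-telemetry/otel-arrow/collector vX.Y.Z
    import: github.com/open-telemetry/otel-arrow/collector/processor/concurrentbatchprocessor
```

The resulting binary can then load a configuration that names `concurrentbatch` under `processors`.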
### Option 3: Batching without back-pressure

Instead of applying back-pressure, another option is to return success as
quickly as possible to the caller using an in-memory queue. As long as the
exporter can keep up with the arriving data, none will be dropped in this
configuration; however, this setup is relatively fragile and more likely than
the options above to lose telemetry data.

As an example configuration:

```yaml
exporters:
  otelarrow:
    # place gRPC, otel-arrow, retry, and timeout settings here
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 32
receivers:
  otlp:
    protocols:
      grpc:
service:
  pipelines:
    traces:
      exporters: [otelarrow]
      receivers: [otlp]
```
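
Because the in-memory queue is bounded, its capacity determines how long the exporter can fall behind before data is dropped. The fragment below is a sketch that assumes the standard `exporterhelper` `queue_size` setting applies to this exporter; the value shown is illustrative only.

```yaml
exporters:
  otelarrow:
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 32
      queue_size: 5000   # illustrative capacity; a larger queue uses more memory
```

A larger queue absorbs longer slowdowns at the cost of memory, and anything still queued is lost if the process exits.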