
Conversation

@lalitb
Member

@lalitb lalitb commented Jan 12, 2026

fixes: #1144

Summary

Adds HTTP/1.1 support for the OTLP receiver alongside the existing gRPC server. HTTP is disabled by default—existing gRPC-only deployments are unaffected.

Key Design Decisions

  1. Shared vs. Separate Concurrency
    • When HTTP is enabled, both protocols share a single Arc<Semaphore> per pipeline instance (i.e., per core in thread-per-core deployments, not cross-core), so the total number of in-flight requests respects downstream channel capacity regardless of protocol mix (sketched below).
    • When HTTP is disabled, gRPC retains the original GlobalConcurrencyLimitLayer with no new overhead.

Why: The downstream bounded channel is the true bottleneck. If gRPC and HTTP had separate limits, a burst on one protocol could still overwhelm the shared channel. Sharing ensures backpressure is applied uniformly. Preserving GlobalConcurrencyLimitLayer for gRPC-only avoids introducing Arc overhead for deployments that don't need HTTP.
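
A minimal sketch of this split, assuming hypothetical names (max_concurrent_requests, http_enabled) rather than the PR's exact wiring:

use std::sync::Arc;
use tokio::sync::Semaphore;
use tower::limit::GlobalConcurrencyLimitLayer;

fn build_admission_control(max_concurrent_requests: usize, http_enabled: bool) {
    if http_enabled {
        // One permit pool per pipeline instance; both servers draw from it,
        // so the combined in-flight count stays bounded regardless of protocol mix.
        let permits = Arc::new(Semaphore::new(max_concurrent_requests));
        let grpc_permits = Arc::clone(&permits); // handed to the tonic service wrapper
        let http_permits = Arc::clone(&permits); // handed to the HTTP server
        let _ = (grpc_permits, http_permits);
    } else {
        // gRPC-only: keep the original Tower layer, no extra Arc in the hot path.
        let _layer = GlobalConcurrencyLimitLayer::new(max_concurrent_requests);
    }
}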

  2. Reuse Existing Infrastructure
    • Uses effect_handler.tcp_listener() for socket creation (inherits SO_REUSEPORT, keepalive settings)
    • Shares AckRegistry for wait-for-result ACK/NACK flow
    • Shares OtlpReceiverMetrics for unified observability

Why: Consistency and reduced maintenance. Operators see one set of metrics regardless of protocol; ACK/NACK behavior is identical; socket tuning is centralized in the engine.

  3. Lazy Decode (Zero-Copy Path)

    • HTTP body is wrapped as OtlpProtoBytes without deserialization, matching gRPC's lazy-decode strategy
    • JSON content-type not implemented because it would require deserialization, breaking zero-copy
  4. Send Bounds Trade-off

    • Tonic requires Send futures. HTTP shares Arc<Mutex<...>>-wrapped state with gRPC for metrics (OtlpReceiverMetrics) and ACK slots (AckRegistry); this state must be Arc-wrapped because tonic's service handlers require Send + Sync, so the HTTP path uses tokio::spawn (not spawn_local) and Arc rather than Rc.
    • Trade-off accepted to avoid duplicating metrics/ACK infrastructure

Why: Duplicating metrics/ACK state for a !Send HTTP path would add complexity and divergence. Since tonic already forces Send on the gRPC side, sharing state via Arc is the pragmatic choice. The atomic overhead is acceptable given the I/O-bound workload.

  5. Semaphore-Based Admission Control
    • HTTP uses semaphore.acquire_owned() with a timeout
    • Permit timeout: uses the configured http.timeout if set, otherwise falls back to 5s
    • If permit isn't acquired within timeout, returns 503 Service Unavailable
    • This differs from gRPC's GlobalConcurrencyLimitLayer, which gates readiness at poll_ready and applies backpressure rather than queuing

Why: HTTP doesn't have Tower middleware, so we use a raw semaphore. The timeout allows brief queuing during bursts (fairer than immediate rejection) while still bounding wait time. Immediate rejection would cause more client retries and load amplification.
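
A sketch of the HTTP admission path, assuming a shared Arc<Semaphore> and a permit_timeout derived from http.timeout (5s fallback); not the PR's exact code:

use std::{sync::Arc, time::Duration};
use http::StatusCode;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

async fn admit(
    permits: Arc<Semaphore>,
    permit_timeout: Option<Duration>,
) -> Result<OwnedSemaphorePermit, StatusCode> {
    // http.timeout if configured, otherwise the 5s fallback.
    let wait = permit_timeout.unwrap_or(Duration::from_secs(5));
    match tokio::time::timeout(wait, permits.acquire_owned()).await {
        Ok(Ok(permit)) => Ok(permit),                             // admitted; permit released on drop
        Ok(Err(_closed)) => Err(StatusCode::SERVICE_UNAVAILABLE), // semaphore closed during shutdown
        Err(_elapsed) => Err(StatusCode::SERVICE_UNAVAILABLE),    // queued too long: 503
    }
}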

  6. Body Collection with Size Limits
    • Uses http_body_util::Limited to enforce max_request_body_size during body collection
    • Aborts early with 400 Bad Request if wire size exceeds limit
    • Dual-limit enforcement: limit checked again after decompression to prevent decompression bombs

Why: HTTP/1.1 requires buffering the full body before processing (unlike gRPC streaming). Without limits, a malicious client could exhaust memory. Dual enforcement (wire + decompressed) defends against both large payloads and zip bombs where a small compressed payload expands to gigabytes.
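
A sketch of the wire-size check using http_body_util::Limited (the decompressed-size re-check is sketched separately under Key Changes below); bounds and status codes are illustrative:

use http::StatusCode;
use http_body_util::{BodyExt, Limited};

async fn collect_body<B>(body: B, max_request_body_size: usize) -> Result<bytes::Bytes, StatusCode>
where
    B: http_body::Body,
    B::Error: Into<Box<dyn std::error::Error + Send + Sync>>,
{
    // Limited fails the collection as soon as the wire size crosses the cap,
    // so an oversized request is rejected before it is fully buffered.
    match Limited::new(body, max_request_body_size).collect().await {
        Ok(collected) => Ok(collected.to_bytes()),
        Err(_too_large_or_io) => Err(StatusCode::BAD_REQUEST),
    }
}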

  7. Protobuf Only
    • JSON content-type not implemented (can be added later if needed)
    • Keeps initial scope focused; protobuf is the primary OTLP format

Why: JSON would require deserialization in the receiver, breaking the zero-copy strategy. Protobuf is the canonical OTLP format and what most SDKs use. JSON support can be added as an opt-in path later if there's demand.

  8. TCP Socket Tuning for keep-alive
    • hyper enables HTTP/1.1 keep-alive by default (connections reused across requests)
    • TCP-level keep-alive is configurable via tcp_keepalive settings

Key Changes

  • New Module: crates/otap/src/otlp_http.rs — HTTP/1.1 server with POST /v1/{logs,metrics,traces}
  • Decompression: gzip, deflate, zstd via Content-Encoding
  • Config: Optional http: section; omit to keep gRPC-only behavior
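
For the dual-limit enforcement mentioned above, a sketch of Content-Encoding handling with a decompressed-size cap (gzip/deflate only, deflate treated as zlib-wrapped; zstd omitted for brevity; status-code choices are illustrative, not necessarily the PR's):

use std::io::Read;

use bytes::Bytes;
use flate2::read::{GzDecoder, ZlibDecoder};
use http::StatusCode;

// Re-check the size cap on the *decompressed* output so a small compressed
// payload cannot expand into gigabytes (decompression bomb).
fn decompress(encoding: &str, body: Bytes, max_decompressed: usize) -> Result<Bytes, StatusCode> {
    let mut out = Vec::new();
    let read = match encoding {
        "gzip" => GzDecoder::new(&body[..])
            .take(max_decompressed as u64 + 1)
            .read_to_end(&mut out),
        "deflate" => ZlibDecoder::new(&body[..])
            .take(max_decompressed as u64 + 1)
            .read_to_end(&mut out),
        "" | "identity" => return Ok(body),
        _ => return Err(StatusCode::UNSUPPORTED_MEDIA_TYPE),
    };
    match read {
        // Reading past the cap means the payload expanded too far: reject.
        Ok(_) if out.len() > max_decompressed => Err(StatusCode::BAD_REQUEST),
        Ok(_) => Ok(Bytes::from(out)),
        Err(_) => Err(StatusCode::BAD_REQUEST),
    }
}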

Documentation

  • Updated docs/otlp-receiver.md

Configuration Example:

Nodes:
  receiver:
    plugin_urn: "urn:otel:otlp:receiver"
    config:
      listening_addr: "0.0.0.0:4317"
      max_concurrent_requests: 0  # auto-tune
      
      http:  # Optional - omit to disable HTTP
        listening_addr: "0.0.0.0:4318"
        max_request_body_size: "4MiB"
        accept_compressed_requests: true
        timeout: "30s"

Limitations

  • JSON content-type not supported (protobuf only)
  • HTTP/2 not supported on HTTP server (gRPC uses HTTP/2 separately via tonic)
  • Response compression not implemented
  • HTTP shares Arc-backed metrics/ACK state with tonic; a split !Send HTTP path would require separate state and isn't planned here.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Jan 12, 2026
@lalitb lalitb changed the title Grpc http receiver Add support for OTLP/HTTP Receiver Jan 12, 2026
@codecov

codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 83.30475% with 292 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.38%. Comparing base (bd62852) to head (d9c1fdf).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1765      +/-   ##
==========================================
- Coverage   84.40%   84.38%   -0.02%     
==========================================
  Files         496      499       +3     
  Lines      145393   147090    +1697     
==========================================
+ Hits       122716   124124    +1408     
- Misses      22143    22432     +289     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 85.63% <83.30%> (-0.05%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.52% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
quiver 90.66% <ø> (ø)

@lquerel
Contributor

lquerel commented Jan 13, 2026

@lalitb It's really great to have built-in HTTP support at the OTLP receiver level.

I'm waiting for a more detailed description of the PR and the approach taken before doing a deeper review. Thanks in advance.

@lalitb
Member Author

lalitb commented Jan 13, 2026

I'm waiting for a more detailed description of the PR and the approach taken before doing a deeper review. Thanks in advance.

Thanks @lquerel - This is still a draft while I finish benchmarks and a final review. I’ll add a detailed PR description and update the README with a summary of the changes shortly.

state: settings
    .wait_for_result
    .then(|| AckSlot::new(settings.max_concurrent_requests)),
state,
Member Author

The refactor just moved the AckSlot construction out of ServerCommon::new into the receiver so HTTP and gRPC can share the same slot pool when HTTP wait-for-result is enabled.

let std_stream: std::net::TcpStream = socket.into();
std_stream.set_nonblocking(true)?;
TcpStream::from_std(std_stream)
socket_options::apply_socket_options(
Member Author

@lalitb lalitb Jan 14, 2026

socket tuning is now centralized in socket_options.rs because both the proxy and the OTLP/HTTP server need the same keepalive/nodelay configuration (tokio -> std -> socket2 -> std -> tokio dance). This keeps the settings consistent across listeners and avoids duplicating the conversion code.
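
A sketch of that conversion round-trip (keepalive value is a placeholder; not the exact contents of socket_options.rs):

use tokio::net::TcpStream;

fn apply_socket_options(stream: TcpStream) -> std::io::Result<TcpStream> {
    let std_stream = stream.into_std()?;                 // tokio -> std
    let socket = socket2::Socket::from(std_stream);      // std -> socket2
    socket.set_nodelay(true)?;
    let keepalive = socket2::TcpKeepalive::new()
        .with_time(std::time::Duration::from_secs(30));
    socket.set_tcp_keepalive(&keepalive)?;
    let std_stream: std::net::TcpStream = socket.into(); // socket2 -> std
    std_stream.set_nonblocking(true)?;                   // required before handing back to tokio
    TcpStream::from_std(std_stream)                      // std -> tokio
}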

{
    if let Some(acceptor) = maybe_tls_acceptor.clone() {
        let shutdown = shutdown.clone();
        let _ = tracker.spawn(async move {
Member Author

This uses TaskTracker/tokio::spawn, which forces these per-connection handlers to be Send. If we want !Send HTTP handlers, serve (the HTTP server function) would need to run on a LocalSet and use spawn_local with a local tracker for draining. Note that serve is currently treated as a Send future and shares Arc-backed metrics/semaphore with the tonic gRPC path (Send + Sync), so a !Send HTTP path would also require decoupling that shared state. This is documented in the otlp_receiver.md too.
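
For reference, a rough sketch of what that !Send variant would look like (not implemented in this PR, and draining via a local tracker is elided):

use tokio::task::LocalSet;

// Hypothetical !Send variant: run the accept loop on a LocalSet so
// per-connection handlers can be spawned with spawn_local and hold
// non-Send (Rc/RefCell) state. Not done here because the handlers
// share Send + Sync state with the tonic path.
async fn serve_local() {
    let local = LocalSet::new();
    local
        .run_until(async {
            let handle = tokio::task::spawn_local(async {
                // handle_connection(...).await
            });
            let _ = handle.await;
        })
        .await;
}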

}

let shutdown = shutdown.clone();
let _ = tracker.spawn(async move {
Member Author

see comment.

@lalitb
Member Author

lalitb commented Jan 14, 2026

I'm waiting for a more detailed description of the PR and the approach taken before doing a deeper review. Thanks in advance.
Thanks @lquerel - This is still a draft while I finish benchmarks and a final review. I’ll add a detailed PR description and update the README with a summary of the changes shortly.

@lquerel - I've updated the PR description and docs/otlp_receiver.md with the approach and design details.

@lalitb lalitb marked this pull request as ready for review January 14, 2026 21:11
@lalitb lalitb requested a review from a team as a code owner January 14, 2026 21:11
Contributor

@lquerel lquerel left a comment

I have not finished the review yet, but there is one point that is bothering me and that I would like to discuss before continuing. Maybe I do not yet have the full picture, so please take the following with a grain of salt.

If I understand correctly, when HTTP is enabled, gRPC switches from GlobalConcurrencyLimitLayer to SharedConcurrencyLayer. The difference lies in where the semaphore is enforced:

  • GlobalConcurrencyLimitLayer gates in poll_ready. When at capacity, the service is not ready, so HTTP/2 stops accepting new streams and applies backpressure. This effectively bounds the number of in-flight requests to the configured limit.
  • SharedConcurrencyLayer forwards poll_ready directly to the inner service and only waits on the semaphore inside call. This means the server can accept an unbounded number of new streams and spawn futures that then sit parked waiting for a permit. Those parked futures still own decoded request payloads and metadata, so memory usage grows with the number of queued requests.

Is my analysis accurate? If so, I think this is a real problem, because it means memory is effectively unbounded, which is something we want to avoid as much as possible. There is no fixed cap on the number of pending requests once the semaphore is saturated. With a default 4 MiB max message size, even a few thousand queued requests could turn into multiple gigabytes of memory.

I think we need to find a way to reintroduce backpressure at the poll_ready level while still using your shared semaphore.

To avoid any OOM risk, a possible fix would be to reintroduce backpressure in poll_ready while still sharing the semaphore. For example, SharedConcurrencyLayer could acquire a permit in poll_ready like GlobalConcurrencyLimitLayer does and stash it for call, so that a not-ready state propagates back to tonic and limits stream acceptance.

@lalitb
Member Author

lalitb commented Jan 15, 2026

I have not finished the review yet, but there is one point that is bothering me and that I would like to discuss before continuing. Maybe I do not yet have the full picture, so please take the following with a grain of salt.

If I understand correctly, when HTTP is enabled, gRPC switches from GlobalConcurrencyLimitLayer to SharedConcurrencyLayer. The difference lies in where the semaphore is enforced:

  • GlobalConcurrencyLimitLayer gates in poll_ready. When at capacity, the service is not ready, so HTTP/2 stops accepting new streams and applies backpressure. This effectively bounds the number of in-flight requests to the configured limit.
  • SharedConcurrencyLayer forwards poll_ready directly to the inner service and only waits on the semaphore inside call. This means the server can accept an unbounded number of new streams and spawn futures that then sit parked waiting for a permit. Those parked futures still own decoded request payloads and metadata, so memory usage grows with the number of queued requests.

Is my analysis accurate? If so, I think this is a real problem, because it means memory is effectively unbounded, which is something we want to avoid as much as possible. There is no fixed cap on the number of pending requests once the semaphore is saturated. With a default 4 MiB max message size, even a few thousand queued requests could turn into multiple gigabytes of memory.

I think we need to find a way to reintroduce backpressure at the poll_ready level while still using your shared semaphore.

To avoid any OOM risk, a possible fix would be to reintroduce backpressure in poll_ready while still sharing the semaphore. For example, SharedConcurrencyLayer could acquire a permit in poll_ready like GlobalConcurrencyLimitLayer does and stash it for call, so that a not-ready state propagates back to tonic and limits stream acceptance.

@lquerel - Thanks for flagging this. You're right: with the shared layer acquiring the permit inside call, gRPC can accept an unbounded number of HTTP/2 streams that park waiting for permits, each holding its request payload in memory. I'll fix this as you suggested.

On the HTTP side, we acquire the permit before collecting the body, with a timeout (default 5s or http.timeout). Pending connections can still accumulate if the timeout is set high, but they hold only connection state and headers - the body isn't collected until after the permit is granted. The MB-scale memory concern is specific to gRPC. Happy to add stricter connection gating for HTTP if needed.
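
For illustration, the suggested shape (acquire the permit in poll_ready, stash it for call) could look roughly like this, following the pattern tower's ConcurrencyLimit uses with a polled semaphore stored on the service; names and details are illustrative, not the final code:

use std::sync::Arc;
use std::task::{ready, Context, Poll};
use tokio::sync::{OwnedSemaphorePermit, Semaphore};
use tokio_util::sync::PollSemaphore;
use tower::Service;

struct SharedConcurrency<S> {
    inner: S,
    semaphore: PollSemaphore, // wraps the Arc<Semaphore> shared with the HTTP server
    permit: Option<OwnedSemaphorePermit>,
}

impl<S> SharedConcurrency<S> {
    fn new(inner: S, shared: Arc<Semaphore>) -> Self {
        Self { inner, semaphore: PollSemaphore::new(shared), permit: None }
    }
}

impl<S, Req> Service<Req> for SharedConcurrency<S>
where
    S: Service<Req>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        if self.permit.is_none() {
            // Backpressure point: stay not-ready until a shared permit frees up,
            // so tonic/h2 stops accepting new streams instead of queuing them.
            let permit = ready!(self.semaphore.poll_acquire(cx))
                .expect("semaphore is not closed while the receiver is running");
            self.permit = Some(permit);
        }
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        // A full implementation moves the permit into the response future so it
        // is released when the request completes; taken (and dropped) here for brevity.
        let _permit = self.permit.take().expect("poll_ready must succeed before call");
        self.inner.call(req)
    }
}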

Contributor

@lquerel lquerel left a comment

Thanks for fixing the unbounded memory issue.
