-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Dedicated Threads for BatchExportProcessor and PeriodicReader #2386
Comments
This is an interesting one! Some thoughts:
Feature Flags
What if this was the other way around - Networking Libraries
Hyper is async-first, and tonic rests on hyper for gRPC. We'd have to change libraries here. Concurrent runtimes as an alternative? An alternative but potentially fraught option would be allowing the user to request a separate Tokio or async-std runtime. This would let us fence ourselves off from the "application" runtime, but keep all the async niceties and libraries, and not have to worry about thread management for e.g. parallel export. The change would then be essentially an override to specify a different runtime on the builders. However, this'd prevent us from providing a "no-async" runtime, if that was a goal. |
Excellent point! As of now it'll be just one thread per PeriodicReader/BatchExportProcessor. If the throughput is not sufficient, we'll have to resort to some techniques to improve. If the solution is to use more threads or do some multiplexing, then it should be capped based on user preference. See more in my next reply below.
OTel .NET works exactly as proposed in this issue. It spins up own thread and makes blocking calls from it. So far, I am not aware of any issue with throughput. (IIUC, other languages are also working this way, but I need to check more to get confirmation)
I don't think this will be measurably high, as its one thread per signal, so at most 3 threads.
Correct. We currently mandates an async runtime for OTLP scenarios.
This was intentional! The current runtime abstractions and implementations for tokio, async-std exposes a lot of public API ( The dedicated background thread was already implemented for Metrics (Logs is open PR), and I have been testing it for last several weeks. Additional note about high throughput scenarios: For extremely high throughput scenarios, especially when running on powerful machines with 100s of CPU cores - BatchExportProcessor is really a bottleneck. Within Microsoft, we leverage exporters like ETW for such high throughput requirements. They have no batching need and no contention, and hence scale extremely well. Unfortunately, OTel Collector currently does not support accept ETW (or Linux counterpart user_events), but this is something I expect to change in next several months.
We can probably keep the same ones, but make some tweaks to make this work. Based on my own testing, OTLP/gRPC with tonic works fine when called from the background thread, as long the channels/clients were initialized in a tokio context. i.e if user's main application is tokio::main, that is good enough. Of course, we have to test and make sure this is robust enough!
Agree with the merits of this. I believe this can be an additive change post 1.0, if we see demand for it and prove it is better than own thread. The issue with relying on tokio/runtime thread pool for telemetry is that telemetry flow can be affected by thread pool issues when user makes mistakes... This is a well known problem, and I think it'd be good to make a statement like "OTel is a telemetry library that people often use to diagnose issues within their application. OTel's default design ensures that it can continue to operate even when the application's own thread pool misbehaves and OTel can report metrics/logs about this misbehavior, rather than OTel also stuck without the ability to report telemetry." We never wrote these down explicitly, but a lot of the implementations (esp. Metrics) is already ensuring this (and still improving) |
Hey @cijothomas thanks for taking the time 💪
If we want to support no-async, then this bit is a bit more nuanced, I reckon? :
... but the work @lalitb 's done using futures_executor on #2390 seems relevant. To put a line under it, we have:
What's next - can we unbundle the work in here a bit? |
I'll open separate issues for each of the tasks to be done to close this issue soon. There will be many items that we can work in parallel. |
Yes. Its is post 1.0 release. (I am not even sure if it will be a minimal implementation of a runtime, or something simpler than that or something else. As mentioned earlier, lot of public APIs exposed, we need time to make sure they are all needed.) |
For the 1.0 release:
Use dedicated threads for BatchExportProcessor and PeriodicReader. This is done to improve usability, telemetry reliability, simplify debugging, and eliminate issues stemming from asynchronous runtime dependencies and/or improper usage. Existing runtime integrations (eg:
rt-tokio
etc. will remain available under an experimental feature flag to ensure backward compatibility and allow time for further refinement.Dedicated Background Threads as Default:
BatchExportProcessor
andPeriodicReader
will use dedicated background threads by default.Existing Runtime - Offer under Experimental Feature flag:
Networking Library Issues:
reqwest
,hyper
, andtonic
used by our core Exporter (OTLP) present challenges when operating outside async runtimes:reqwest
: Replace withreqwest-blocking
for dedicated threads. Given that the thread being blocked is OpenTelemetry's own background thread, this should not be a concern.hyper
andtonic
: Evaluate their compatibility with dedicated threads and find appropriate solutions. If certain clients cannot be supported, users requiring those clients can revert to the experimental runtime feature.Advantages of Dedicated Threads
Improved Reliability:
Eliminate a class of bugs:
Simplified Debugging:
Ease of onboarding
Implementation Plan
Start with Metrics:
PeriodicReaderWithOwnThread
, which already spawns its own thread, as a starting point.BatchExportProcessor:
Migration for users
rt
argument to BatchExportProcessor and PeriodicReader. The defaults are sufficient for most scenarios.The text was updated successfully, but these errors were encountered: