[Bug]: Shutdown doesn't flush when used with global subscriber #1961
That's expected behavior. The global `shutdown_tracer_provider()` only performs the actual shutdown when the global registry holds the last reference to the provider.

It's kind of misleading, though. If the current behavior (no shutdown if there are other references) is intentional and should remain, then I think it'd be helpful to change the documentation: explicitly state that it's not going to do anything if there are more references to the tracer provider, and suggest calling `shutdown()` on the provider directly instead.
In previous versions the `Tracer` held only a weak reference to the `TracerProvider`, so dropping the global reference was enough to trigger the flush.
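A minimal sketch of the reference counting involved (illustrative only, not code from this thread; the comments reflect my reading of the 0.24 `global` API and the strong `Tracer` -> `TracerProvider` reference introduced in #1625):

```rust
use opentelemetry::global;
use opentelemetry_sdk::trace::TracerProvider;

fn main() {
    // Imagine a batch span exporter configured here; omitted for brevity.
    let provider = TracerProvider::builder().build();

    // The global registry takes a strong reference to the provider.
    global::set_tracer_provider(provider);

    // A tracing-opentelemetry layer holds a Tracer like this one, and since
    // #1625 a Tracer keeps its own strong reference to the provider.
    let tracer_held_by_layer = global::tracer("app");

    // This swaps in a no-op provider and drops the registry's reference, but
    // the real shutdown (and batch flush) only runs when the *last* reference
    // is dropped -- and the layer's Tracer is still alive at this point.
    global::shutdown_tracer_provider();

    // Only once the Tracer goes away does the provider actually get dropped.
    drop(tracer_held_by_layer);
}
```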
Here's a test against the global tracer provider:

```rust
#![cfg(feature = "rt-tokio")]

use futures_util::future::BoxFuture;
use opentelemetry::global as otel_global;
use opentelemetry::trace::{TracerProvider as _, Tracer as _};
use opentelemetry_sdk::{
    export::trace::{ExportResult, SpanData, SpanExporter},
    runtime,
    trace::TracerProvider,
};
use tokio::runtime::Runtime;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, Default)]
struct TestExporter(Arc<Mutex<Vec<SpanData>>>);

impl SpanExporter for TestExporter {
    fn export(&mut self, mut batch: Vec<SpanData>) -> BoxFuture<'static, ExportResult> {
        let spans = self.0.clone();
        Box::pin(async move {
            if let Ok(mut inner) = spans.lock() {
                inner.append(&mut batch);
            }
            Ok(())
        })
    }
}

fn test_tracer(runtime: &Runtime) -> (TracerProvider, TestExporter) {
    let _guard = runtime.enter();
    let exporter = TestExporter::default();
    let provider = TracerProvider::builder()
        .with_batch_exporter(exporter.clone(), runtime::Tokio)
        .build();
    (provider, exporter)
}

#[test]
fn shutdown_global() {
    let rt = Runtime::new().unwrap();
    let (provider, exporter) = test_tracer(&rt);
    otel_global::set_tracer_provider(provider);

    let tracer = otel_global::tracer("test");
    for _ in 0..1000 {
        tracer.start("test_span");
    }
    // drop(tracer);

    // Should flush all batched telemetry spans
    otel_global::shutdown_tracer_provider();

    let spans = exporter.0.lock().unwrap();
    assert_eq!(spans.len(), 1000);
}
```

Dropping the tracer before the call to `shutdown_tracer_provider()` changes the outcome. As currently implemented, the behavior of the global shutdown depends on whether any other strong references to the provider are still alive when it runs.
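If the reference counting above is the whole story, a variant that drops the tracer first should flush. This is a sketch reusing `test_tracer` from the snippet above, not a test from the repo:

```rust
#[test]
fn shutdown_global_after_dropping_tracer() {
    let rt = Runtime::new().unwrap();
    let (provider, exporter) = test_tracer(&rt);
    otel_global::set_tracer_provider(provider);

    let tracer = otel_global::tracer("test");
    for _ in 0..1000 {
        tracer.start("test_span");
    }

    // With the tracer gone, the global registry holds the last strong
    // reference, so shutting it down drops the provider and flushes the
    // batch processor.
    drop(tracer);
    otel_global::shutdown_tracer_provider();

    let spans = exporter.0.lock().unwrap();
    assert_eq!(spans.len(), 1000);
}
```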
Here's a test to exercise this. However, it does produce an error printout when the tracer instance is kept alive past the shutdown.

```rust
#[test]
fn shutdown_in_scope() {
    let rt = Runtime::new().unwrap();
    let (provider, exporter) = test_tracer(&rt);

    let tracer = provider.tracer("test");
    for _ in 0..1000 {
        tracer.start("test_span");
    }
    // drop(tracer);

    // Should flush all batched telemetry spans
    provider.shutdown().unwrap();

    let spans = exporter.0.lock().unwrap();
    assert_eq!(spans.len(), 1000);
}
```

There is also a case of a lockup in the
So it looks like we're not safe with the
I just got bit by this too, and it took me a while to find a workaround. I think at least the examples / docs need to be updated. This was my workaround:

```rust
// Note: SpanExporter here is opentelemetry_otlp::SpanExporter, OpenTelemetryLayer
// comes from tracing-opentelemetry, and TRACER_NAME is a constant defined
// elsewhere in this crate.
pub struct OtelGuard {
    tracer_provider: opentelemetry_sdk::trace::TracerProvider,
}

impl Drop for OtelGuard {
    fn drop(&mut self) {
        println!("Dropping OtelGuard!");
        println!("Shutting down TracerProvider!");
        self.tracer_provider
            .shutdown()
            .expect("TracerProvider should shutdown properly");
    }
}

pub fn setup_tracing_subscriber() -> anyhow::Result<OtelGuard> {
    let tracer_provider = opentelemetry_sdk::trace::TracerProvider::builder()
        .with_batch_exporter(
            SpanExporter::builder()
                .with_tonic()
                .with_endpoint("grpc://localhost:4317")
                .build()?,
            runtime::Tokio,
        )
        .build();

    global::set_tracer_provider(tracer_provider.clone());

    tracing::subscriber::set_global_default(
        tracing_subscriber::registry()
            .with(tracing_subscriber::fmt::layer())
            .with(OpenTelemetryLayer::new(tracer_provider.tracer(TRACER_NAME))),
    )
    .unwrap();

    Ok(OtelGuard { tracer_provider })
}
```

main:

```rust
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let _otel_guard = otel::setup_tracing_subscriber()?;

    let root_span = info_span!("test");
    let _span_guard = root_span.enter();

    let fruit = "watermelon";
    tracing::debug!("Fruit was {fruit}");
    tracing::info!("Fruit was {fruit}");
    tracing::warn!("Fruit was {fruit}");
    tracing::error!("Fruit was {fruit}");

    Ok(())
}
```
To clarify, the recommended pattern is to hold onto the provider yourself and call `shutdown()` on it explicitly, rather than relying on `global::shutdown_tracer_provider()` alone.
@lalitb, @cijothomas - Would it be welcome to update some of the tracing examples to show this more clearly? I could see this being very frustrating to newcomers to the crate who try to get it working from the examples and don't see traces being sent.

Just updated my example snippet to be more accurate.
@pitoniak32 I'd suggest taking a look at the Metrics example and seeing whether it makes sense and is easy to use. If yes, let's replicate the same for Traces. This was always the plan; Metrics (and Logs) made progress and we left Tracing out. (Sorry, we just didn't have the bandwidth to tackle everything, but I realize now it would have been better not to leave a high-level inconsistency like this.) The overall idea is: Metrics and Logs are already working like this. (Small difference: Logs does not need to set a global.) @pitoniak32 Is this something you can help make happen?
Yeah, I can take a look at this! Makes sense, thank you for the outline! Totally understand, there's a ton of stuff to maintain in here!

I think this issue can be closed now. @pitoniak32 do you see anything remaining to be tackled for this?
Because we're using the batch provider, and span information is sent when the span *exits*, if we just let the process exit immediately we might lose some tracing data. The [recommended pattern](open-telemetry/opentelemetry-rust#1961 (comment)) is to hold onto the providers and shut them down manually as the process exits. This will wait for any spans to finish shipping and avoid losing data. Note that we might want another pass at this in the future:
- integrate it into the panic handler that I added in another branch
- integrate something like [Tokio Graceful Shutdown](https://docs.rs/tokio-graceful-shutdown/latest/tokio_graceful_shutdown/) to intercept Ctrl+C and the like
- add a timeout, so that a stalled metrics writer doesn't wait forever

I kept it simple for this PR, but it's something we should keep in mind.
Closing this issue. All examples have been updated to show the right way to perform `shutdown()` now.
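For anyone landing here later, the recommended shutdown pattern is roughly the following. This is a condensed sketch of the `OtelGuard` workaround above, not the exact updated example code; `init_traces`, the OTLP endpoint, and the anyhow error handling are illustrative:

```rust
use opentelemetry::global;
use opentelemetry_otlp::SpanExporter;
use opentelemetry_sdk::{runtime, trace::TracerProvider};

fn init_traces() -> anyhow::Result<TracerProvider> {
    let provider = TracerProvider::builder()
        .with_batch_exporter(
            SpanExporter::builder()
                .with_tonic()
                .with_endpoint("grpc://localhost:4317")
                .build()?,
            runtime::Tokio,
        )
        .build();

    // The global registry keeps a clone; we keep our own handle for shutdown.
    global::set_tracer_provider(provider.clone());
    Ok(provider)
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let provider = init_traces()?;

    // ... run the application and emit spans ...

    // Shut down through the retained handle; this flushes the batch processor
    // even though the global registry still holds its own reference.
    provider.shutdown()?;
    Ok(())
}
```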
What happened?
As mentioned in #1625, `Tracer` now holds a strong reference to `TracerProvider`. When opentelemetry is used as a layer with a global tracing subscriber, it is now impossible to shut down properly: `shutdown_tracer_provider()` only decrements a reference count but doesn't execute `Drop`. As a result, some spans are missing, flattened, etc.
EDIT: A possible workaround is to flush manually:
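(The original snippet is not preserved here; the following is a sketch of what manual flushing can look like, assuming a retained handle to the SDK `TracerProvider` and its `force_flush` method.)

```rust
// Sketch of the "flush manually" workaround (a reconstruction, not the original
// snippet): keep a handle to the SDK TracerProvider and flush it explicitly
// before the process exits.
use opentelemetry_sdk::trace::TracerProvider;

fn flush_traces(provider: &TracerProvider) {
    // force_flush pushes everything buffered in the batch span processor to
    // the exporter and returns one result per span processor.
    for result in provider.force_flush() {
        if let Err(err) = result {
            eprintln!("failed to flush spans: {err:?}");
        }
    }
}
```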
API Version
0.24.0
SDK Version
0.24.1
What Exporter(s) are you seeing the problem on?
OTLP
Relevant log output
No response