Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tracing on process exit; instrument Import command #47

Closed
wants to merge 3 commits into from

Conversation

Quantumplation
Copy link
Contributor

I was playing around with instrumentation to get a feel for it, and test out my fix to the Grafana / tempo setup, and I wanted something that was a bit more nested than what we had in the sync, so I instrumented the import command.

This command is likely temporary, but served as a good test-bed / example for tracing. Indeed, it allowed me to discover a bug in capturing trace information when the process exits before the tracing provider has had a chance to export it from the process.

Now, Grafana consistently reports the full traces for an import, including in the presence of errors, etc.

image

See the commit messages for more details; in particular, I'm not attached to this specific instrumentation, so feel free to suggest changes if you think it's overboard.

This branch is based on top of my pi/fix-tempo branch, so also includes that fix.

Because we're using the batch provider, and span information is sent
when the span *exits*, if we just let the process exit immediately, we
might lose some tracing data;

The [recommended pattern](open-telemetry/opentelemetry-rust#1961 (comment)) is to hold onto the providers and shut them down manually
as the process exits. This will wait for any spans to finish shipping
and avoid losing data.

Note, that we might want another pass at this in the future:
 - integrate it into the panic handler that I added in another branch
 - integrate something like [Tokio Graceful Shutdown](https://docs.rs/tokio-graceful-shutdown/latest/tokio_graceful_shutdown/) to intercept ctrl+C and the like
 - add a timeout, so that a stalled metrics writer doesn't wait forever

I kept it simple for this PR, but just something we should keep in mind
This command is short lived, and @KtorZ mentioned it might go away;
still, to experiment with structured spans that go a bit deeper than
elsewhere in the app, I went ahead and matched the natural structure of
the code that was present in the comments / serialization format.

Using {} blocks allows us to avoid calling .exit manually on spans, and
highlights the natural structure of the code.

I'm not super attached to the specifics of these commands, so feel free
to suggest changes or even strip them out entirely, but I wanted
something substantial to experiment with when fixing the grafana
integration, and a bug around missing traces on exit.
}
})
.into_diagnostic()?;
}

Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the effort in this file but I still want to cry a little because I greatly reworked that file in #42.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah that's fine we can discard the changes; like I said, it was mostly an interesting test before for me to make sure tracing was working etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still keep the shutdown fix though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, cherry-picked already 🫡

@KtorZ
Copy link
Contributor

KtorZ commented Jan 4, 2025

(I've cherry-picked your first commit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants