Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentry Native delays app termination if device is not online #929

Open
1 of 3 tasks
csk-ableton opened this issue Dec 21, 2023 · 4 comments
Open
1 of 3 tasks

Sentry Native delays app termination if device is not online #929

csk-ableton opened this issue Dec 21, 2023 · 4 comments

Comments

@csk-ableton
Copy link

Description

Sentry Native slows down the termination of my executables in case the device is not online. The termination is delayed by the time defined with sentry_options_set_shutdown_timeout (2 seconds by default).

By default Sentry has Session Tracking enabled which slows down the process a little bit. However in case the device is not connected (the user disabled Wifi), sentry_close will run into the shutdown timeout, delaying the termination of the executable by 2 seconds by default.

I thought I can disable Session Tracking by using sentry_options_set_auto_session_tracking if I want to avoid communication when there is no error but that is not enough.

sentry_init will attempt to upload old runs in sentry__process_old_runs and the device will still run into the shutdown timeout when calling sentry_close. This happens in every run as long as the device is offline.

In my case the problem is quite severe, as multiple executables are started / terminated, some of which will terminate immediately if there is nothing to do.

Proposal:
Introduce an option sentry_options_set_auto_process_old_runs which stops uploading and pruning old runs completely. The app will only start communicating with a server when there is an actual error.
Introduce sentry_process_old_runs to manually process old runs at an appropriate point in time.

Alternatives considered:

  • Call sentry_init only once the device is online. This seems not to be in the spirit of the library. Crash reporting would be completely disabled while offline and I might miss interesting breadcrumbs until the point the device is registered as online.
  • Introduce sentry_set_uploading_enabled to disable uploading while the device is clearly offline. A basic version is easy to implement (same as user consent just without being persistent). A more complicated version would need to delay sentry__process_old_runs or even continue storing data until uploading is enabled again.

When does the problem happen

  • During build
  • During run-time
  • When capturing a hard crash

Environment

  • OS: Linux
  • Compiler: Clang 14
  • Backend: Crashpad
  • Transport: Curl
@kahest
Copy link
Member

kahest commented Jan 4, 2024

Hey @csk-ableton, thanks for the detailed report and your patience during holidays. We'll discuss the use case and your suggested solutions and follow up here.

@supervacuus
Copy link
Collaborator

Sorry for not getting back to you sooner, @csk-ableton. Thanks for the detailed analysis, but I might need to bug you with further questions so that I can understand the problem better. The reason for this is that you connect a couple of dots that could be causing your issue as part of their interaction or on their own.

It is very surprising that you run into a timeout if the network interface is disabled. I would expect that such an attempt is in the low 100 microseconds range, so even if you have many queued up requests it should not easily reach a 2 second timeout.

If you enable the debug logs, do you see how curl fails (curl_easy_perfom should provide a detailed message like "Could not resolve host", etc.)? In that case you can also see if the background thread for sending envelopes is actually reaching that timeout (it would log "background thread failed to shut down cleanly within timeout").

Sorry, that I am asking this, but before we consider adding an API i want to be absolutely sure that this is the root-cause.

In any case, considering your solution proposals: I think i would prefer a generic approach that would pause uploading in the worker thread (what you aptly named sentry_set_uploading_enabled(), because that would cover all uploads (also the ones issued by sentry__process_old_runs() provided we expose an initial state to the options), besides the ones done by the crashpad_handler upload thread.

How would you compare it given you suggested the above "only" as an alternative to deferring the execution of sentry__process_old_runs() to after sentry_init() and exposing an public interface that would allow to run it later on?

With all deferred deliveries we have to be very careful as to no affect user-consent, but envelopes that reached the transport background worker, must have been given consent and I currently see no way how consent could be applied by accident retroactively.

@csk-ableton
Copy link
Author

Sorry for not getting back to you sooner, @csk-ableton. Thanks for the detailed analysis, but I might need to bug you with further questions so that I can understand the problem better. The reason for this is that you connect a couple of dots that could be causing your issue as part of their interaction or on their own.

No problem for the delay, that sounds very reasonable and thank you for your reply!

It is very surprising that you run into a timeout if the network interface is disabled. I would expect that such an attempt is in the low 100 microseconds range, so even if you have many queued up requests it should not easily reach a 2 second timeout.

In my case the device has multiple network interfaces, one of which is Ethernet via USB which is usually not connected to the internet but just a user's computer. That might explain why it doesn't show your expected behavior? I'm not familiar with the exact behavior of Curl here. I noticed that no timeouts are provided for the call to curl_easy_perfom so the request will run forever in my case. I was a bit surprised by that as it might create a potential large background worker queue. I did use debug logs and actual debugging to verify. I run into your mentioned message "background thread failed to shut down cleanly within timeout" on shutdown.

In any case, considering your solution proposals: I think i would prefer a generic approach that would pause uploading in the worker thread (what you aptly named sentry_set_uploading_enabled(), because that would cover all uploads (also the ones issued by sentry__process_old_runs() provided we expose an initial state to the options), besides the ones done by the crashpad_handler upload thread.

How would you compare it given you suggested the above "only" as an alternative to deferring the execution of sentry__process_old_runs() to after sentry_init() and exposing an public interface that would allow to run it later on?

I would prefer such a generic API as well. It would mean that all my executables are guaranteed to terminate fast while uploading is disabled.

However there is still a remaining reason for me to be in control of processing old runs manually. I should now explain that I'm sharing the same Sentry database across all my executables in order for them to share the user consent setting. This however means that each executable is trying to process old runs even from other executables. It again means that my short-lived executables will potentially start uploading data even though they didn't run into any error. Maybe my setup is a bit too special with the shared database, but I would prefer if my executables would not all try to upload each others last run.

Another thought I had was the need for processing old runs at all in my case. At least when using the Crashpad handler it seems that uploading data when the device is online and dropping all data when it's offline would be simple and sufficient. But I'm probably missing the complete picture.

With all deferred deliveries we have to be very careful as to no affect user-consent, but envelopes that reached the transport background worker, must have been given consent and I currently see no way how consent could be applied by accident retroactively.

That makes sense, I just want to point out that currently due to the missing timeout in the Curl transport, a request can be queued for a very long time and still be sent even if the user revoked their consent in the meantime.

I hope even though my setup is more special it's a useful request.

@supervacuus
Copy link
Collaborator

You're right, both approaches can make sense in independent scenarios. We'll talk about how to prioritize this internally.

cc @kahest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Status: Needs Discussion
Development

No branches or pull requests

3 participants