Sentry Native delays app termination if device is not online #929

csk-ableton · 2023-12-21T14:06:48Z

Description

Sentry Native slows down the termination of my executables in case the device is not online. The termination is delayed by the time defined with sentry_options_set_shutdown_timeout (2 seconds by default).

By default Sentry has Session Tracking enabled, which slows down the process a little. However, if the device is not connected (for example, the user disabled Wi-Fi), sentry_close will run into the shutdown timeout, delaying the termination of the executable by 2 seconds by default.

I thought I could disable Session Tracking with sentry_options_set_auto_session_tracking to avoid any communication when there is no error, but that is not enough.

sentry_init will attempt to upload old runs in sentry__process_old_runs and the device will still run into the shutdown timeout when calling sentry_close. This happens in every run as long as the device is offline.
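For reference, the two options mentioned above are set on the options object before sentry_init is called. The following is a minimal sketch, assuming sentry-native is installed and linked; the DSN, the database path, and the 500 ms value are placeholders chosen for illustration. As described above, neither setting stops sentry_init from processing old runs.

```c
#include <sentry.h>

int main(void)
{
    sentry_options_t *options = sentry_options_new();

    /* Placeholder values: substitute your own DSN and database path. */
    sentry_options_set_dsn(
        options, "https://examplePublicKey@o0.ingest.sentry.io/0");
    sentry_options_set_database_path(options, ".sentry-native");

    /* Turn off automatic session tracking so that a clean start and exit
       does not produce a session envelope. */
    sentry_options_set_auto_session_tracking(options, 0);

    /* Upper bound, in milliseconds, that sentry_close() waits for the
       transport to finish. The report above cites 2 seconds as the
       default; 500 is an illustrative value only. */
    sentry_options_set_shutdown_timeout(options, 500);

    /* sentry_init takes ownership of the options object. */
    if (sentry_init(options) != 0) {
        return 1;
    }

    /* ... application work ... */

    sentry_close();
    return 0;
}
```

Lowering the shutdown timeout only shortens the delay; it does not address the uploads that trigger it.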

In my case the problem is quite severe, as multiple executables are started / terminated, some of which will terminate immediately if there is nothing to do.

Proposal:
Introduce an option sentry_options_set_auto_process_old_runs which stops uploading and pruning old runs completely. The app will only start communicating with a server when there is an actual error.
Introduce sentry_process_old_runs to manually process old runs at an appropriate point in time.

Alternatives considered:

Call sentry_init only once the device is online. This seems not to be in the spirit of the library. Crash reporting would be completely disabled while offline and I might miss interesting breadcrumbs until the point the device is registered as online.
Introduce sentry_set_uploading_enabled to disable uploading while the device is clearly offline. A basic version is easy to implement (same as user consent just without being persistent). A more complicated version would need to delay sentry__process_old_runs or even continue storing data until uploading is enabled again.
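To make the second alternative more concrete, here is a small stand-alone model of a non-persistent upload switch. It is a sketch only: the type and function names are hypothetical and none of this is sentry-native code. It models the "more complicated" variant, in which envelopes stay queued while uploading is disabled and shutdown never waits on the network in that state.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

/* Hypothetical model of a transport queue; not sentry-native code. */
typedef struct {
    int pending[QUEUE_CAPACITY]; /* envelope ids waiting for upload */
    size_t count;                /* number of pending envelopes */
    size_t sent;                 /* envelopes handed to the network so far */
    bool uploading_enabled;      /* in-memory only, never persisted */
} upload_queue;

void queue_init(upload_queue *q, bool initially_enabled)
{
    q->count = 0;
    q->sent = 0;
    q->uploading_enabled = initially_enabled;
}

/* Queues an envelope. Returns false if the queue is full. */
bool queue_push(upload_queue *q, int envelope_id)
{
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->pending[q->count++] = envelope_id;
    return true;
}

void queue_set_uploading_enabled(upload_queue *q, bool enabled)
{
    q->uploading_enabled = enabled;
}

/* Sends everything that is pending, but only while uploading is enabled.
   Returns the number of envelopes sent by this call. */
size_t queue_flush(upload_queue *q)
{
    if (!q->uploading_enabled) {
        return 0;
    }
    size_t n = q->count;
    q->sent += n;
    q->count = 0;
    return n;
}

/* Shutdown flushes if allowed and otherwise returns immediately.
   Returns the number of envelopes left over for a later run. */
size_t queue_shutdown(upload_queue *q)
{
    queue_flush(q);
    return q->count;
}
```

In this model a process that starts with uploading disabled can queue envelopes and still shut down without any network wait; whatever is left over would have to be stored and picked up later.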

When does the problem happen

During build
During run-time
When capturing a hard crash

Environment

OS: Linux
Compiler: Clang 14
Backend: Crashpad
Transport: Curl


kahest · 2024-01-04T09:49:58Z

Hey @csk-ableton, thanks for the detailed report and your patience during holidays. We'll discuss the use case and your suggested solutions and follow up here.

supervacuus · 2024-01-10T14:14:34Z

Sorry for not getting back to you sooner, @csk-ableton. Thanks for the detailed analysis, but I might need to bug you with further questions so that I can understand the problem better. The reason for this is that you connect a couple of dots that could be causing your issue as part of their interaction or on their own.

It is very surprising that you run into a timeout if the network interface is disabled. I would expect that such an attempt is in the low 100 microseconds range, so even if you have many queued up requests it should not easily reach a 2 second timeout.
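To put a rough number on that expectation, here is a back-of-the-envelope sketch. It assumes each failed attempt costs about 100 microseconds (an illustrative figure taken from the estimate above) and that attempts run strictly one after another; the helper name is made up for this illustration.

```c
#include <stdint.h>

/* Number of sequential failed attempts that fit into the shutdown timeout,
   given an assumed cost per failed attempt in microseconds. */
uint64_t attempts_to_exhaust_timeout(uint64_t timeout_ms, uint64_t attempt_us)
{
    if (attempt_us == 0) {
        /* Attempts that cost nothing never exhaust the timeout. */
        return UINT64_MAX;
    }
    return (timeout_ms * 1000u) / attempt_us;
}
```

With a 2000 ms timeout and 100 µs per attempt, this gives 20000 attempts, which is why a fast-failing network stack should not normally hit the timeout.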

If you enable the debug logs, do you see how curl fails (curl_easy_perform should provide a detailed message like "Could not resolve host", etc.)? In that case you can also see if the background thread for sending envelopes is actually reaching that timeout (it would log "background thread failed to shut down cleanly within timeout").

Sorry for asking, but before we consider adding an API, I want to be absolutely sure that this is the root cause.

In any case, considering your solution proposals: I think I would prefer a generic approach that pauses uploading in the worker thread (what you aptly named sentry_set_uploading_enabled()), because that would cover all uploads (including the ones issued by sentry__process_old_runs(), provided we expose an initial state in the options), except for the ones done by the crashpad_handler upload thread.

How would you compare it, given that you suggested the above "only" as an alternative to deferring the execution of sentry__process_old_runs() until after sentry_init() and exposing a public interface that would allow running it later on?

With all deferred deliveries we have to be very careful not to affect user consent, but envelopes that reached the transport background worker must already have been given consent, and I currently see no way that consent could accidentally be applied retroactively.

csk-ableton · 2024-01-10T14:58:01Z

> Sorry for not getting back to you sooner, @csk-ableton. Thanks for the detailed analysis, but I might need to bug you with further questions so that I can understand the problem better. The reason for this is that you connect a couple of dots that could be causing your issue as part of their interaction or on their own.

No problem for the delay, that sounds very reasonable and thank you for your reply!

> It is very surprising that you run into a timeout if the network interface is disabled. I would expect that such an attempt is in the low 100 microseconds range, so even if you have many queued up requests it should not easily reach a 2 second timeout.

In my case the device has multiple network interfaces, one of which is Ethernet via USB, which is usually not connected to the internet but just to a user's computer. That might explain why it doesn't show the behavior you expected? I'm not familiar with the exact behavior of Curl here. I noticed that no timeouts are provided for the call to curl_easy_perform, so the request will run forever in my case. I was a bit surprised by that, as it might create a potentially large background worker queue. I did use debug logs and actual debugging to verify. I do run into the message you mentioned, "background thread failed to shut down cleanly within timeout", on shutdown.

> In any case, considering your solution proposals: I think I would prefer a generic approach that pauses uploading in the worker thread (what you aptly named sentry_set_uploading_enabled()), because that would cover all uploads (including the ones issued by sentry__process_old_runs(), provided we expose an initial state in the options), except for the ones done by the crashpad_handler upload thread.

> How would you compare it, given that you suggested the above "only" as an alternative to deferring the execution of sentry__process_old_runs() until after sentry_init() and exposing a public interface that would allow running it later on?

I would prefer such a generic API as well. It would mean that all my executables are guaranteed to terminate fast while uploading is disabled.

However, there is still a reason for me to want manual control over processing old runs. I should explain that I'm sharing the same Sentry database across all my executables so that they share the user consent setting. This means that each executable tries to process old runs, even those from other executables. In turn, my short-lived executables will potentially start uploading data even though they didn't run into any error. Maybe my setup with the shared database is a bit too special, but I would prefer that my executables not all try to upload each other's last runs.

Another thought I had was the need for processing old runs at all in my case. At least when using the Crashpad handler it seems that uploading data when the device is online and dropping all data when it's offline would be simple and sufficient. But I'm probably missing the complete picture.

> With all deferred deliveries we have to be very careful not to affect user consent, but envelopes that reached the transport background worker must already have been given consent, and I currently see no way that consent could accidentally be applied retroactively.

That makes sense, I just want to point out that currently due to the missing timeout in the Curl transport, a request can be queued for a very long time and still be sent even if the user revoked their consent in the meantime.

I hope even though my setup is more special it's a useful request.

supervacuus · 2024-01-22T14:06:03Z

You're right, both approaches can make sense in independent scenarios. We'll talk about how to prioritize this internally.

cc @kahest

github-project-automation bot added this to Mobile & Cross Platform SDK Dec 21, 2023

getsantry bot added the Waiting for: Product Owner label Dec 21, 2023

getsantry bot added this to GitHub Issues with 👀 2 Dec 21, 2023

github-project-automation bot moved this to Needs Discussion in Mobile & Cross Platform SDK Dec 21, 2023

getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Dec 21, 2023

getsantry bot removed the Waiting for: Product Owner label Jan 4, 2024

getsantry bot removed the status in GitHub Issues with 👀 2 Jan 4, 2024

supervacuus added enhancement New feature or request area: api area: core labels Jan 8, 2024

getsantry bot added the Waiting for: Product Owner label Jan 10, 2024

getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Jan 10, 2024

kahest removed the Waiting for: Product Owner label Jan 12, 2024

getsantry bot removed the status in GitHub Issues with 👀 2 Jan 12, 2024

kahest added the Platform: Native label Aug 29, 2024
