Panicking tasks should abort process if not handled #519

MichaelGG · 2014-12-13T01:52:19Z

RFC for issue opened here: rust-lang/rust#19610

bstrie · 2014-12-13T16:02:17Z

I must have been out of the loop here for a long time, because I could have sworn that a panic in a child task used to panic the parent task unless you explicitly spawned an unlinked task. When did this change?

SimonSapin · 2014-12-13T16:17:58Z

@bstrie This was called linked failure and was removed at some point.

sfackler · 2014-12-13T18:11:18Z

I think linked failure was axed when newrt was added.

thestinger · 2014-12-13T18:15:13Z

It's not possible to have robust linked failure in Rust. The best it can do is add a hook in a bunch of standard library functions to check for a panic and propagate it, and that would have significant overhead.

thestinger · 2014-12-13T18:16:13Z

It also won't map to the new thread model, since it's not going to have a strict tree hierarchy.

thestinger · 2014-12-13T18:19:40Z

0000-panic-default-abort.md

+
+# Summary
+
+Currently, a spawned task that panics does not abort the process. A panic on a task is simply silently discarded. This is inappropriate behaviour, as the user's program may not be in a desired working state (a sort of zombie process). Instead, failing tasks should fail the entire process unless explicitly opted out of.


Silencing logic errors is certainly not a good thing. The current hard-wired support for printing to a stream at the panic site instead of leaving this up to the code handling the failure is also a bad design.

thestinger · 2014-12-13T18:36:48Z

I agree with avoiding implicit silencing of logic errors. However, I'm strongly against merging it in the current form where it argues for unrestricted exceptions (no memory boundaries for panics) and against aborting on logic errors. Those issues have a tenuous connection to this one at best and tying it to those makes it far more controversial than it would be otherwise.

I think there's an overwhelming consensus against exceptions as a mechanism for handling runtime errors, and even the existing unwinding support does not have community consensus behind it.

MichaelGG · 2014-12-14T05:41:38Z

What should Rust do when a task panics, then? Because the current system just ignores errors. What's so bad about aborting unless the user specified the task is non-critical? spawn versus "spawn_bg" or "spawn_try" or something along those lines. What else can you do when there's an unhandled problem in your code?

I'll delete references to unwinding and poisoning to make things clearer, but I wasn't proposing any actual changes, just including them as flavour text to provide context.

thestinger · 2014-12-14T05:54:13Z

I do think it should abort on logic errors. The option of opting into exceptions can exist, but silencing errors by default doesn't make sense and paying the compile-time / performance cost by default for a feature most applications don't need doesn't make sense either.

MichaelGG · 2014-12-14T06:19:52Z

Sounds like we're in agreement? The whole unwinding thing is out of scope of this RFC. I only mentioned it half tongue-in-cheek because I cannot see any downsides to not silently ignoring errors.

What should I change to get this PR merged?

Ericson2314 · 2014-12-14T22:03:45Z

I have always been a little partial to @thestinger's position that we should just scrap dynamic unwinding altogether and do a mandatory process abort. What Rust code currently relies on unwinding?

ben0x539 · 2014-12-16T16:31:27Z

@Ericson2314 Skylight, afaik :P

reem · 2014-12-16T17:31:33Z

@Ericson2314 skylight, servo, rustc, iron, hyper, the list goes on.

MichaelGG · 2014-12-17T00:05:27Z

I removed the distracting comments about poisoning and unwinding in general. What is missing from this PR?

alexchandel · 2014-12-17T04:25:35Z

Unwinding is evil. All panics should abort the process. This is a step in the right direction.

@Ericson2314 All those things could be modified to not rely on unwinding, just as they are all modified every time a breaking change lands.

reem · 2014-12-17T07:18:28Z

@alexchandel that simply isn't true. There are several problem domains where aborting the process on something as simple as a bounds check is simply not acceptable. For instance, skylight is meant to be embedded into long-running production Rails applications, and having it abort a running server because of an internal error is completely unacceptable.

Additionally, all of today's web servers rely on unwinding and panic isolation to ensure that they aren't taken down by spurious failures like a request failing. Even though those things should use Result, in practice there are many cases where propagating the error just generates a lot of boilerplate to just "fail later", a place where panic! currently shines.

pcwalton · 2014-12-17T07:37:58Z

Let's not turn this into the "remove unwinding from Rust" debate. If you want that debate, please file an RFC to remove unwinding from Rust.

MichaelGG · 2014-12-19T20:44:00Z

Does anyone have any arguments against aborting on "unhandled" panics? Should I attempt to write a patch to add this functionality to stdlib? With 1.0 coming, and this being a noticeable breaking change, I want to make sure I've done all needed to warrant proper inclusion.

thestinger · 2014-12-19T20:57:21Z

Aborting on unhandled panics and only printing an error in that case would make it a lot nicer. You wouldn't get errors printed to stderr if you were catching the panics. I don't know if other people feel the same way though.

aturon · 2014-12-19T21:10:00Z

@MichaelGG Just a heads-up: I've volunteered to shepherd this RFC, and have already talked some with other core team members about the issues you're raising. I will try to get you some detailed feedback in the near future, but it may be delayed by a week due to holidays.

BTW, you may be interested in the PR I just landed, which revamps Rust's notion of threads (removing tasks) and makes joining a child thread and extracting its result the default -- you have to explicitly detach if you don't want to join.
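To illustrate what joining a child thread and extracting its result looks like, here is a minimal sketch. It assumes today's `std::thread::spawn` and `JoinHandle::join` rather than the 2014-era API under discussion, and `run_child` is a hypothetical helper name, not something from the PR.

```rust
use std::thread;

/// Runs `work` on a child thread and reports how it ended.
/// `run_child` is a hypothetical helper, not part of the standard library.
fn run_child<F>(work: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    // `join` returns `Err(payload)` if the child panicked, so the parent
    // has to look at the outcome to get the value out.
    match thread::spawn(work).join() {
        Ok(value) => Ok(value),
        Err(payload) => {
            // A panic with a string literal carries `&'static str`;
            // a formatted panic carries `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            Err(msg)
        }
    }
}
```

Calling `run_child(|| 40 + 2)` yields the child's value, while a closure that panics yields the panic message as an `Err`, so the failure is visible to the parent instead of being silently discarded.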

Anyway, more feedback soon, just wanted to let you know we're listening and thinking about this.

MichaelGG · 2014-12-20T01:04:03Z

Hi @aturon, thank you very much for the response. I didn't want to nag, just make sure this hadn't been forgotten. Would it be worthwhile for me to attempt creating a patch for the threading system to support this RFC, or is that code that you or another involved team member would write?

aturon · 2014-12-30T00:37:03Z

@MichaelGG No, I don't think a patch is needed at this point, though if we decided to go this direction you'd be more than welcome to be involved in the implementation!

(I'm digging myself out of my inbox and should be able to respond more substantially to this RFC in the next couple of days.)

aturon · 2015-02-04T19:13:25Z

@MichaelGG I'm catching up on RFC shepherding and wanted to leave some thoughts on this RFC.

Exception safety

One of the basic issues raised here is dealing with exception safety when data is shared between threads. High-level communication between threads -- locks, semaphores, channels, and so on -- all have built-in support for "handling" a panic from other threads by propagation. While this does not force the other threads to panic, it makes clear that a panic has occurred and forces you to opt-in to continuing.
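As a concrete case of a lock "handling" a panic by propagation, the sketch below uses mutex poisoning as it exists in today's `std::sync::Mutex`. The function name `child_panic_poisons_lock` is made up for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns whether the lock was poisoned, plus the value found inside it.
/// Hypothetical demo function, not taken from the RFC.
fn child_panic_poisons_lock() -> (bool, u32) {
    let shared = Arc::new(Mutex::new(0u32));
    let for_child = Arc::clone(&shared);

    let child = thread::spawn(move || {
        let mut guard = for_child.lock().unwrap();
        *guard += 1;
        // Panicking while the guard is alive poisons the mutex.
        panic!("child failed while holding the lock");
    });
    // The join result is ignored on purpose; the lock itself carries the news.
    let _ = child.join();

    let (poisoned, value) = match shared.lock() {
        Ok(guard) => (false, *guard),
        // The parent is not forced to panic, but it must opt in to
        // continuing by calling `into_inner` on the poison error.
        Err(err) => (true, *err.into_inner()),
    };
    (poisoned, value)
}
```

The parent learns about the child's panic the next time it takes the lock, and has to decide explicitly whether the data is still usable.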

On the other hand, low-level atomics do not provide any such built-in measures. But data manipulated directly via low-level atomics is generally kept in an invariant-maintaining state at all times (i.e., with each atomic update), because other threads can observe the state at all times. That's one of the basic principles behind lock-free data structures. So the exception safety concern is much more limited in such cases.

What does it mean for a panic on a child thread to be caught?

Connected to the above, to make this idea work in practice we'd need a fuller picture of what it means to "handle" a panic from a child thread. The RFC talks about try_future (which is now basically Thread::scoped), but this is by no means the only way to detect and deal with a panic from another thread. As mentioned above, channels, mutexes, and other synchronization constructs also propagate panics.
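Channels give another way to notice a child's panic. The sketch below assumes today's `std::sync::mpsc`: when the child unwinds, its `Sender` is dropped, and the parent's `recv` returns an error. `observe_child_over_channel` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Waits for one message from a child that may panic before sending.
/// Hypothetical demo function, not taken from the RFC.
fn observe_child_over_channel(child_panics: bool) -> Result<u32, mpsc::RecvError> {
    let (tx, rx) = mpsc::channel::<u32>();

    let child = thread::spawn(move || {
        if child_panics {
            // Unwinding drops `tx`, which disconnects the channel.
            panic!("child failed before sending");
        }
        tx.send(7).unwrap();
    });

    // `recv` returns `Err(RecvError)` once every sender is gone and the
    // queue is empty, which is how the parent detects the failure.
    let received = rx.recv();
    let _ = child.join();
    received
}
```

Here the parent has arguably "handled" the panic without ever touching the join handle, which is the case that makes a precise definition of "caught" hard to pin down.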

It seems very unfortunate for a panic to be detected by another thread via a channel, but for the child thread to still abort the process. To get around that, all of the above synchronization constructs would have to set some kind of thread-local "panic handled" flag, which itself seems somewhat hokey and error-prone (consider that all future synchronization constructs would need to do this as well).

Layering on the `abort`

On the other hand, if abort is desired, it's very easy to add via a guard wrapping the body of spawned threads. One can even create a simple wrapper API around spawn to do this. So this is really just a question of defaults.
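A guard of this kind might look roughly like the following. This is a sketch under assumptions: it uses today's `std::thread::panicking` and `std::process::abort`, and the names `AbortOnPanic` and `spawn_or_abort` are invented here, not proposed in the thread.

```rust
use std::process;
use std::thread;

/// Guard that aborts the whole process if it is dropped during unwinding.
/// Hypothetical type, named for illustration only.
struct AbortOnPanic;

impl Drop for AbortOnPanic {
    fn drop(&mut self) {
        // `panicking()` is true only while this thread is unwinding.
        if thread::panicking() {
            process::abort();
        }
    }
}

/// Wrapper around `thread::spawn` that opts the child in to abort-on-panic.
fn spawn_or_abort<F, T>(body: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        // Lives for the whole body; dropped normally on success.
        let _guard = AbortOnPanic;
        body()
    })
}
```

A thread started with `spawn_or_abort` behaves like any other thread when it returns normally, and takes the process down if its body panics, which is the opt-in version of the default this RFC asks for.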

To me, asynchronously aborting the process is a somewhat aggressive and opinionated default, and I think it may be reasonable to ask people to opt in to it. (There's precedent for that in some other languages as well.)

Future-proofing

Above I'm arguing that we shouldn't take this RFC, but what happens if the above arguments turn out to be wrong for some reason or another?

I think in the long run, we may well want to explore something like C++'s std::terminate, i.e. a customizable action that's fired whenever an exception is uncaught (though again, "catching" is a bit harder to define in Rust). That would make it even easier for people to choose their desired behavior in a global way, rather than trying to find a one-size-fits-all solution.
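For comparison, Rust's standard library today offers a process-wide panic hook via `std::panic::set_hook`, which postdates this discussion. The sketch below shows a customizable global action in that style; the counter and the function names are hypothetical, and whether this is the mechanism envisioned here is an assumption.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical global counter used only for this illustration.
static PANICS_SEEN: AtomicUsize = AtomicUsize::new(0);

/// Installs a hook that counts panics and then defers to the previous hook.
fn install_counting_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        PANICS_SEEN.fetch_add(1, Ordering::SeqCst);
        // Keep the default behaviour of printing the message to stderr.
        previous(info);
        // A stricter policy could call `std::process::abort()` here.
    }));
}

/// Installs the hook, lets a child thread panic, and reports the count.
fn panics_seen_after_child_failure() -> usize {
    install_counting_hook();
    let _ = thread::spawn(|| panic!("uncaught in child")).join();
    PANICS_SEEN.load(Ordering::SeqCst)
}
```

Note that the hook fires for every panic, including ones that are later detected through `join` or a poisoned lock, so it does not by itself settle what "uncaught" means.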

nikomatsakis · 2015-02-04T21:15:57Z

I am feeling a bit torn here. On the one hand, I think that the "propagate failure by default" perspective that we have "quasi"-inherited from Erlang is a sensible one. The default behavior probably ought not be to suppress failure (though precedent definitely is all over the map here, from what I can tell).

However, if we made spawn abort, then I think the current API as it stands leaves a gap. In particular, we currently have the option to spawn asynchronously or synchronously. When the synchronous pattern fits your code, that's great, and it makes the error handling strategy relatively obvious: errors in the child bubble up to the parent, all is hunky dory.

However, there are numerous patterns for which an asynchronous spawn is more appropriate. For example, a CPS-style of execution, where each thread will spawn its successors, or perhaps just send messages that may trigger more work. Now if you are writing code that works in this fashion, and a panic occurs, you may indeed have a plan for how to handle it -- you can e.g. have an RAII guard at the top that catches a panic and sends a message to some central handler. Unfortunately, that doesn't mix well with this RFC, because that panic will still propagate and result in your program being aborted. I think this is partly what @aturon was getting at.
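The RAII pattern described above could be sketched as follows under the current (non-aborting) behaviour. The names `ReportOnPanic` and `run_workers` are hypothetical, and the channel stands in for whatever "central handler" a program uses.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Guard that reports its worker's id to a central handler on panic.
/// Hypothetical type, named for illustration only.
struct ReportOnPanic {
    worker_id: usize,
    failures: Sender<usize>,
}

impl Drop for ReportOnPanic {
    fn drop(&mut self) {
        if thread::panicking() {
            // Ignore send errors: the handler may already have gone away.
            let _ = self.failures.send(self.worker_id);
        }
    }
}

/// Spawns `count` detached workers, one of which may panic, and returns
/// the ids of the workers that reported a failure.
fn run_workers(count: usize, failing: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for worker_id in 0..count {
        let guard = ReportOnPanic { worker_id, failures: tx.clone() };
        // The join handle is dropped, so the thread runs detached.
        thread::spawn(move || {
            let _guard = guard;
            if worker_id == failing {
                panic!("worker {} failed", worker_id);
            }
        });
    }
    // Drop the original sender so the loop below ends once every
    // worker has finished and dropped its clone.
    drop(tx);
    rx.iter().collect()
}
```

Under the RFC as written, the panicking worker would still bring the process down after the guard had sent its report, which is the gap being pointed out.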

So I feel like this RFC in isolation doesn't feel complete. I'd feel more amenable to this proposal if we altered the API so that there is some way for a thread to indicate that a panic has been handled. Some possibilities are a more general std::terminate-like mechanism (basically codifying the RAII pattern I described above), a recover mechanism, or at least the ability to specify a handler for panic when you launch the thread.
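One shape the "handler at launch" option could take is sketched below, using `std::panic::catch_unwind` from today's standard library (it did not exist in this form at the time). `spawn_with_handler` is a hypothetical API, and the sketch assumes the program is built with unwinding panics rather than `panic=abort`.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::thread;

/// Spawns a thread whose panic, if any, is passed to `on_panic` instead of
/// escaping the thread. Hypothetical wrapper, not a standard library API.
fn spawn_with_handler<F, H>(body: F, on_panic: H) -> thread::JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
    H: FnOnce(Box<dyn Any + Send>) + Send + 'static,
{
    thread::spawn(move || {
        // `AssertUnwindSafe` is our promise that state captured by `body`
        // is not relied upon after a panic.
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(body)) {
            // The panic counts as handled: the thread exits normally.
            on_panic(payload);
        }
    })
}
```

With a wrapper like this, an abort-by-default `spawn` could coexist with threads that have declared how their panics are dealt with, since the panic never reaches the top of the thread.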

Absent that, the current behavior seems to make sense, because it gives people the ability to build up and choose their error handling strategy of choice (which can easily include an abort).

aturon · 2015-03-05T18:45:05Z

The discussion seems to've stalled out here, but there does seem to be a consensus that there should be some way of customizing the behavior of uncaught panics. I'm going to go ahead and close this PR for now (I'm trying to clean up the repo a bit) and open an issue for addressing this in the future.

The main concern in terms of 1.0 commitment is, of course, the default behavior, and I do think we should consider changing that, although abort-by-default is likely to be too strong given the above discussion.

Thanks for the PR!

aturon · 2015-03-05T18:47:20Z

RFC issue

MichaelGG · 2015-03-05T20:05:14Z

Yeah I went over Niko's issue, basically it's not fair to double panic. I've been thinking about it, and I don't think there's a real elegant solution, as Niko points out. Personally I'd just default to panicking and then allow a per-thread handler to decide if it's fatal or not. Even just a global boolean "when-you-die-in-a-thread-you-die-in-real-life" setting would work for me. But a simple hack like that isn't perhaps something we want to commit to at 1.0.

Michael Giagnocavo added 2 commits December 12, 2014 18:47

Panic should default to abort

a5dca19

Fix date

6a2397d

thestinger reviewed Dec 13, 2014

Remove mentioning of unrelated topics

fb8e42b

nrc assigned aturon Dec 18, 2014

aturon closed this Mar 5, 2015

aturon mentioned this pull request Mar 5, 2015

More robust treatment of uncaught panics for child threads #946

Open

aturon mentioned this pull request Mar 5, 2015

Finalze behavior for uncaught panics in child threads rust-lang/rust#23078

Closed
