Skip to content

Panicking tasks should abort process if not handled #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

MichaelGG
Copy link

RFC for issue opened here: rust-lang/rust#19610

@bstrie
Copy link
Contributor

bstrie commented Dec 13, 2014

I must have been out of the loop here for a long time, because I could have sworn that a panic in a child task used to panic the parent task unless you explicitly spawned an unlinked task. When did this change?

@SimonSapin
Copy link
Contributor

@bstrie This was called linked failure and was removed at some point.

@sfackler
Copy link
Member

I think linked failure was axed when newrt was added.

@thestinger
Copy link

It's not possible to have robust linked failure in Rust. The best it can do is add a hook in a bunch of standard libraries functions to check for a panic and propagate it, and that would have a significant overhead.

@thestinger
Copy link

It also won't map to the new thread model, since it's not going to have a strict tree hierarchy.


# Summary

Currently, a spawned task that panics does not abort the process. A panic on a task is simply silently discarded. This is inappropriate behaviour, as the user's program may not be in a desired working state (a sort of zombie process). Instead, failing tasks should fail the entire process unless explicitly opted out of.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silencing logic errors is certainly not a good thing. The current hard-wired support for printing to a stream at the panic site instead of leaving this up to the code handling the failure is also a bad design.

@thestinger
Copy link

I agree with avoiding implicit silencing of logic errors. However, I'm strongly against merging it in the current form where it argues for unrestricted exceptions (no memory boundaries for panics) and against aborting on logic errors. Those issues have a tenuous connection to this one at best and tying it to those makes it far more controversial than it would be otherwise.

I think there's an overwhelming consensus against exceptions as a mechanism for handling runtime errors, and even the existing unwinding support does not have community consensus behind it.

@MichaelGG
Copy link
Author

What should Rust do when a task panics, then? Because the current system just ignores errors. What's so bad about aborting unless the user specified the task is non-critical? spawn versus "spawn_bg" or "spawn_try" or something along those lines. What else can you do when there's an unhandled problem in your code?

I'll delete references to unwinding and poisoning to make things clearer, but I wasn't proposing any actual changes, just including them as flavour text to provide context.

@thestinger
Copy link

I do think it should abort on logic errors. The option of opting into exceptions can exist, but silencing errors by default doesn't make sense and paying the compile-time / performance cost by default for a feature most applications don't need doesn't make sense either.

@MichaelGG
Copy link
Author

Sounds like we're in agreement? The whole unwinding thing is out of scope of this RFC. I only mentioned it half tongue-in-cheek because I cannot see any downsides to not silently ignoring errors.

What should I change to get this PR merged?

@Ericson2314
Copy link
Contributor

I have always been a little partial to @thestinger's position that we should just scrap dynamic unwinding altogether and do a mandatory process abort. What rust code relies currently relies on unwinding?

@ben0x539
Copy link

@Ericson2314 Skylight, afaik :P

@reem
Copy link

reem commented Dec 16, 2014

@Ericson2314 skylight, servo, rustc, iron, hyper, the list goes on.

@MichaelGG
Copy link
Author

I removed the distracting comments about poisoning and unwinding in general. What is missing from this PR?

@alexchandel
Copy link

Unwinding is evil. All panics should abort the process. This is a step in the right direction.

@Ericson2314 All those things could be modified to not rely on unwinding, just as they are all modified every time a breaking change lands.

@reem
Copy link

reem commented Dec 17, 2014

@alexchandel that simply isn't true. There are several problem domains were aborting the process on something as simple as a bounds check is simply not acceptable. For instance, skylight is meant to be embedded into long-running production rails applications, and having it abort a running server because of an internal error is completely unacceptable.

Additionally, all of todays web servers rely on unwinding and panic isolation to ensure that they aren't taken down by spurious failures like a request failing etc. Even though those things should use Result, in practice there are many cases were propagating the error just generates a lot of boilerplate to just "fail later", a place where panic! currently shines.

@pcwalton
Copy link
Contributor

Let's not turn this into the "remove unwinding from Rust" debate. If you want that debate, please file an RFC to remove unwinding from Rust.

@MichaelGG
Copy link
Author

Does anyone have any arguments against aborting on "unhandled" panics? Should I attempt to write a patch to add this functionality to stdlib? With 1.0 coming, and this being a noticeable breaking change, I want to make sure I've done all needed to warrant proper inclusion.

@thestinger
Copy link

Aborting on unhandled panics and only printing an error in that case would make it a lot nicer. You wouldn't get errors printed to stderr if you were catching the panics. I don't know if other people feel the same way though.

@aturon
Copy link
Member

aturon commented Dec 19, 2014

@MichaelGG Just a heads-up: I've volunteered to shepherd this RFC, and have already talked some with other core team members about the issues you're raising. I will try to get you some detailed feedback in the near future, but it may be delayed by a week due to holidays.

BTW, you may be interested in the PR I just landed, which revamps Rust's notion of threads (removing tasks) and makes joining a child thread and extracting its result the default -- you have to explicitly detach if you don't want to join.

Anyway, more feedback soon, just wanted to let you know we're listening and thinking about this.

@MichaelGG
Copy link
Author

Hi @aturon , thank you very much for the response. I didn't want to nag, just make sure I hadn't gotten this forgotten. Would it be worthwhile for me to attempt creating a patch for the threading system to support this RFC, or is that code that you or another involved team member would write?

@aturon
Copy link
Member

aturon commented Dec 30, 2014

@MichaelGG No, I don't think a patch is needed at this point, though if we decided to go this direction you'd be more than welcome to be involved in the implementation!

(I'm digging myself out of my inbox and should be able to respond more substantially to this RFC in the next couple of days.)

@aturon
Copy link
Member

aturon commented Feb 4, 2015

@MichaelGG I'm catching up on RFC shepherding and wanted to leave some thoughts on this RFC.

Exception safety

One of the basic issues raised here is dealing with exception safety when data is shared between threads. High-level communication between threads -- locks, semaphores, channels, and so on -- all have built-in support for "handling" a panic from other threads by propagation. While this does not force the other threads to panic, it makes clear that a panic has occurred and forces you to opt-in to continuing.

On the other hand, low-level atomics do not provide any such built-in measures. But data manipulated directly via low-level atomics is generally kept in an invariant-maintaining state at all times (i.e., with each atomic update), because other threads can observe the state at all times. That's one of the basic principles behind lock-free data structures. So the exception safety concern is much more limited in such cases.

What does it mean for a panic on a child thread to be caught?

Connected to the above, to make this idea work in practice we'd need a fuller picture of what it means to "handle" a panic from a child thread. The RFC talks about try_future (which is now basically Thread::scoped), but this is by no means the only way to detect and deal with a panic from another thread. As mentioned above, channels, mutexes, and other synchronization constructs also propagate panics.

It seems very unfortunate for a panic to be detected by another thread via a channel, but for the child thread to still abort the process. To get around that, all of the above synchronization constructs would have to set some kind of thread-local "panic handled" thread, which itself seems somewhat hokey and error-prone (consider that all future synchronization constructs would need to do this as well).

Layering on the abort

On the other hand, if abort is desired, it's very easy to add via a guard wrapping the body of spawned threads. Once can even create a simple wrapper API around spawn to do this. So this is really just a question of defaults.

To me, asynchronously aborting the process is a somewhat aggressive and opinionated default, and I think it may be reasonable to ask people to opt in to it. (There's precedent for that in some other languages as well.)

Future-proofing

Above I'm arguing that we shouldn't take this RFC, but what happens if the above arguments turn out to be wrong for some reason or another?

I think in the long run, we may well want to explore something like C++'s std::terminate, i.e. a customizable action that's fired whenever an exception is uncaught (though again, "catching" is a bit harder to define in Rust). That would make it even easier for people to choose their desired behavior in a global way, rather than trying to find a one-size-fits-all solution.

@nikomatsakis
Copy link
Contributor

I am feeling a bit torn here. On the one hand, I think that the "propagate failure by default" perspective that we have "quasi"-inherited from Erlang is a sensible one. The default behavior probably ought not be to suppress failure (though precedent definitely is all over the map here, from what I can tell).

However, if we made spawn abort, then I think the current API as it stands leaves a gap. In particular, we currently have the option to spawn asynchronously or synchronously. When the synchronous pattern fits your code, that's great, and it makes the error handling strategy relatively obvious: errors in the child bubble up to the parent, all is hunky dory.

However, there are numerous patterns for which an asynchronous spawn is more appropriate. For example, a CPS-style of execution, where each thread will spawn its successors, or perhaps just send messages that may trigger more work. Now if you are writing code that works in this fashion, and a panic occurs, you may indeed have a plan for how to handle it -- you can e.g. have an RAII guard at the top that catches a panic and sends a message to some central handler. Unfortunately, that doesn't mix well with this RFC, because that panic will still propagate and result in your program being aborted. I think this is partly what @aturon was getting at.

So I feel like this RFC in isolation doesn't feel complete. I'd feel more amenable to this proposal if we altered the API so that there is some way for a thread to indicate that a panic has been handled. Some possibilities are a more general std::terminate-like mechanism (basically codifying the RAII pattern I described above), a recover mechanism, or at least the ability to specify a handler for panic when you launch the thread.

Absent that, the current behavior seems to make sense, because it gives people the ability to build up and choose their error handling strategy of choice (which can easily include an abort).

@aturon
Copy link
Member

aturon commented Mar 5, 2015

The discussion seems to've stalled out here, but there does seem to be a consensus that there should be some way of customizing the behavior of uncaught panics. I'm going to go ahead and close this PR for now (I'm trying to clean up the repo a bit) and open an issue for addressing this in the future.

The main concern in terms of 1.0 commitment is, of course, the default behavior, and I do think we should consider changing that, although abort-by-default is likely to be too strong given the above discussion.

Thanks for the PR!

@aturon
Copy link
Member

aturon commented Mar 5, 2015

RFC issue

@MichaelGG
Copy link
Author

Yeah I went over Niko's issue, basically it's not fair to double panic. I've been thinking about it, and I don't think there's a real elegant solution, as Niko points out. Personally I'd just default to panicking and then allow a per-thread handler to decide if it's fatal or not. Even just a global boolean "when-you-die-in-a-thread-you-die-in-real-life" setting would work for me. But a simple hack like that isn't perhaps something we want to commit to at 1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.