-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Panicking tasks should abort process if not handled #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I must have been out of the loop here for a long time, because I could have sworn that a panic in a child task used to panic the parent task unless you explicitly spawned an unlinked task. When did this change? |
@bstrie This was called linked failure and was removed at some point. |
I think linked failure was axed when newrt was added. |
It's not possible to have robust linked failure in Rust. The best it can do is add a hook in a bunch of standard libraries functions to check for a panic and propagate it, and that would have a significant overhead. |
It also won't map to the new thread model, since it's not going to have a strict tree hierarchy. |
|
||
# Summary | ||
|
||
Currently, a spawned task that panics does not abort the process. A panic on a task is simply silently discarded. This is inappropriate behaviour, as the user's program may not be in a desired working state (a sort of zombie process). Instead, failing tasks should fail the entire process unless explicitly opted out of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silencing logic errors is certainly not a good thing. The current hard-wired support for printing to a stream at the panic site instead of leaving this up to the code handling the failure is also a bad design.
I agree with avoiding implicit silencing of logic errors. However, I'm strongly against merging it in the current form where it argues for unrestricted exceptions (no memory boundaries for panics) and against aborting on logic errors. Those issues have a tenuous connection to this one at best and tying it to those makes it far more controversial than it would be otherwise. I think there's an overwhelming consensus against exceptions as a mechanism for handling runtime errors, and even the existing unwinding support does not have community consensus behind it. |
What should Rust do when a task panics, then? Because the current system just ignores errors. What's so bad about aborting unless the user specified the task is non-critical? spawn versus "spawn_bg" or "spawn_try" or something along those lines. What else can you do when there's an unhandled problem in your code? I'll delete references to unwinding and poisoning to make things clearer, but I wasn't proposing any actual changes, just including them as flavour text to provide context. |
I do think it should abort on logic errors. The option of opting into exceptions can exist, but silencing errors by default doesn't make sense and paying the compile-time / performance cost by default for a feature most applications don't need doesn't make sense either. |
Sounds like we're in agreement? The whole unwinding thing is out of scope of this RFC. I only mentioned it half tongue-in-cheek because I cannot see any downsides to not silently ignoring errors. What should I change to get this PR merged? |
I have always been a little partial to @thestinger's position that we should just scrap dynamic unwinding altogether and do a mandatory process abort. What rust code relies currently relies on unwinding? |
@Ericson2314 Skylight, afaik :P |
@Ericson2314 skylight, servo, rustc, iron, hyper, the list goes on. |
I removed the distracting comments about poisoning and unwinding in general. What is missing from this PR? |
Unwinding is evil. All panics should abort the process. This is a step in the right direction. @Ericson2314 All those things could be modified to not rely on unwinding, just as they are all modified every time a breaking change lands. |
@alexchandel that simply isn't true. There are several problem domains were aborting the process on something as simple as a bounds check is simply not acceptable. For instance, skylight is meant to be embedded into long-running production rails applications, and having it abort a running server because of an internal error is completely unacceptable. Additionally, all of todays web servers rely on unwinding and panic isolation to ensure that they aren't taken down by spurious failures like a request failing etc. Even though those things should use Result, in practice there are many cases were propagating the error just generates a lot of boilerplate to just "fail later", a place where |
Let's not turn this into the "remove unwinding from Rust" debate. If you want that debate, please file an RFC to remove unwinding from Rust. |
Does anyone have any arguments against aborting on "unhandled" panics? Should I attempt to write a patch to add this functionality to stdlib? With 1.0 coming, and this being a noticeable breaking change, I want to make sure I've done all needed to warrant proper inclusion. |
Aborting on unhandled panics and only printing an error in that case would make it a lot nicer. You wouldn't get errors printed to stderr if you were catching the panics. I don't know if other people feel the same way though. |
@MichaelGG Just a heads-up: I've volunteered to shepherd this RFC, and have already talked some with other core team members about the issues you're raising. I will try to get you some detailed feedback in the near future, but it may be delayed by a week due to holidays. BTW, you may be interested in the PR I just landed, which revamps Rust's notion of threads (removing Anyway, more feedback soon, just wanted to let you know we're listening and thinking about this. |
Hi @aturon , thank you very much for the response. I didn't want to nag, just make sure I hadn't gotten this forgotten. Would it be worthwhile for me to attempt creating a patch for the threading system to support this RFC, or is that code that you or another involved team member would write? |
@MichaelGG No, I don't think a patch is needed at this point, though if we decided to go this direction you'd be more than welcome to be involved in the implementation! (I'm digging myself out of my inbox and should be able to respond more substantially to this RFC in the next couple of days.) |
@MichaelGG I'm catching up on RFC shepherding and wanted to leave some thoughts on this RFC. Exception safetyOne of the basic issues raised here is dealing with exception safety when data is shared between threads. High-level communication between threads -- locks, semaphores, channels, and so on -- all have built-in support for "handling" a panic from other threads by propagation. While this does not force the other threads to panic, it makes clear that a panic has occurred and forces you to opt-in to continuing. On the other hand, low-level atomics do not provide any such built-in measures. But data manipulated directly via low-level atomics is generally kept in an invariant-maintaining state at all times (i.e., with each atomic update), because other threads can observe the state at all times. That's one of the basic principles behind lock-free data structures. So the exception safety concern is much more limited in such cases. What does it mean for a panic on a child thread to be caught?Connected to the above, to make this idea work in practice we'd need a fuller picture of what it means to "handle" a panic from a child thread. The RFC talks about It seems very unfortunate for a panic to be detected by another thread via a channel, but for the child thread to still abort the process. To get around that, all of the above synchronization constructs would have to set some kind of thread-local "panic handled" thread, which itself seems somewhat hokey and error-prone (consider that all future synchronization constructs would need to do this as well). Layering on the
|
I am feeling a bit torn here. On the one hand, I think that the "propagate failure by default" perspective that we have "quasi"-inherited from Erlang is a sensible one. The default behavior probably ought not be to suppress failure (though precedent definitely is all over the map here, from what I can tell). However, if we made However, there are numerous patterns for which an asynchronous spawn is more appropriate. For example, a CPS-style of execution, where each thread will spawn its successors, or perhaps just send messages that may trigger more work. Now if you are writing code that works in this fashion, and a panic occurs, you may indeed have a plan for how to handle it -- you can e.g. have an RAII guard at the top that catches a panic and sends a message to some central handler. Unfortunately, that doesn't mix well with this RFC, because that panic will still propagate and result in your program being aborted. I think this is partly what @aturon was getting at. So I feel like this RFC in isolation doesn't feel complete. I'd feel more amenable to this proposal if we altered the API so that there is some way for a thread to indicate that a panic has been handled. Some possibilities are a more general Absent that, the current behavior seems to make sense, because it gives people the ability to build up and choose their error handling strategy of choice (which can easily include an abort). |
The discussion seems to've stalled out here, but there does seem to be a consensus that there should be some way of customizing the behavior of uncaught panics. I'm going to go ahead and close this PR for now (I'm trying to clean up the repo a bit) and open an issue for addressing this in the future. The main concern in terms of 1.0 commitment is, of course, the default behavior, and I do think we should consider changing that, although abort-by-default is likely to be too strong given the above discussion. Thanks for the PR! |
Yeah I went over Niko's issue, basically it's not fair to double panic. I've been thinking about it, and I don't think there's a real elegant solution, as Niko points out. Personally I'd just default to panicking and then allow a per-thread handler to decide if it's fatal or not. Even just a global boolean "when-you-die-in-a-thread-you-die-in-real-life" setting would work for me. But a simple hack like that isn't perhaps something we want to commit to at 1.0. |
RFC for issue opened here: rust-lang/rust#19610