Commit 91ca792

Merge branch 'stabilize-catch-panic'
2 parents 50057bc + 9ebc369 commit 91ca792

1 file changed

+342
-0
lines changed

text/0000-stabilize-catch-panic.md

Lines changed: 342 additions & 0 deletions
@@ -0,0 +1,342 @@
- Feature Name: `catch_panic`
- Start Date: 2015-07-24
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Move `std::thread::catch_panic` to `std::panic::recover` after removing the
`Send` bound from the closure parameter.

# Motivation

In today's stable Rust it's not possible to catch a panic on the thread that
caused it. There are a number of situations, however, where this is
either required for correctness or necessary for building a useful abstraction:

* It is currently defined as undefined behavior to have a Rust program panic
  across an FFI boundary. For example if C calls into Rust and Rust panics, then
  this is undefined behavior. Being able to catch a panic will allow writing
  C APIs in Rust that do not risk aborting the process they are embedded into.

* Abstractions like thread pools want to catch the panics of tasks being run
  instead of having the thread torn down (and having to spawn a new thread).

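
The FFI case above can be sketched roughly as follows. This is an illustration
only: it uses `std::panic::catch_unwind`, the name under which this
functionality is available in current stable Rust (an assumption relative to
this RFC, which calls the function `recover`). The function `checked_div_ffi`
and its error-code convention are hypothetical.

```rust
use std::panic;

/// Hypothetical C-callable entry point. Returns 0 on success and -1 if the
/// Rust code panicked or `out` is null, so no unwind crosses into C.
pub extern "C" fn checked_div_ffi(a: i32, b: i32, out: *mut i32) -> i32 {
    // `a / b` panics when `b == 0` (and on `i32::MIN / -1`).
    let result = panic::catch_unwind(|| a / b);
    match result {
        Ok(quotient) => {
            if out.is_null() {
                return -1;
            }
            // The caller promises that a non-null `out` is valid for writes.
            unsafe { *out = quotient };
            0
        }
        // The panic stops here and is reported as a plain error code.
        Err(_) => -1,
    }
}
```

A C caller would check the return value before reading `out`; how the error is
reported is a design choice of the binding, not something this RFC prescribes.
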

Stabilizing the `catch_panic` function would enable these two use cases, but
let's also take a look at the current signature of the function:

```rust
fn catch_panic<F, R>(f: F) -> thread::Result<R>
    where F: FnOnce() -> R + Send + 'static
```

This function will run the closure `f` and if it panics return `Err(Box<Any>)`.
If the closure doesn't panic it will return `Ok(val)` where `val` is the
returned value of the closure. The closure, however, is restricted to only close
over `Send` and `'static` data. These bounds can be overly restrictive, and due
to thread-local storage [they can be subverted][tls-subvert], making it unclear
what purpose they serve. This RFC proposes to remove the `Send` bound as well,
while keeping `'static` for now.

[tls-subvert]: https://github.com/rust-lang/rust/issues/25662
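
The `Ok(val)` / `Err(payload)` behavior described above can be tried out with a
small sketch. It is written against `std::panic::catch_unwind`, which is how
this functionality is exposed in current stable Rust (an assumption relative to
this RFC; its bounds differ from the signature shown here). The helper
`describe` is a hypothetical name.

```rust
use std::panic;

/// Runs `f` and converts a panic into an `Err` holding a readable message.
pub fn describe<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        // The closure returned normally: pass its value through.
        Ok(val) => Ok(val),
        // The closure panicked: the payload is a boxed `Any`, usually a
        // `&'static str` or a `String` depending on how `panic!` was invoked.
        Err(payload) => {
            if let Some(msg) = payload.downcast_ref::<&str>() {
                Err(msg.to_string())
            } else if let Some(msg) = payload.downcast_ref::<String>() {
                Err(msg.clone())
            } else {
                Err("non-string panic payload".to_string())
            }
        }
    }
}
```
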

Historically Rust has purposefully avoided catching panics, largely because of
a problem typically referred to as "exception safety". To further understand
the motivation of stabilization and relaxing the bounds, let's review what
exception safety is and what it means for Rust.

# Background: What is exception safety?

Languages with exceptions have the property that a function can "return" early
if an exception is thrown. While exceptions aren't too hard to reason about when
thrown explicitly, they can be problematic when they are thrown by code being
called -- especially when that code isn't known in advance. Code is **exception
safe** if it works correctly even when the functions it calls into throw
exceptions.

The idea of throwing an exception causing bugs may sound a bit alien, so it's
helpful to drill down into exactly why this is the case. Bugs related to
exception safety have two critical components:

1. An invariant of a data structure is broken.
2. This broken invariant is then later observed.

Exceptional control flow often exacerbates this first component of breaking
invariants. For example many data structures have a number of invariants that
are dynamically upheld for correctness, and the type's routines can temporarily
break these invariants to be fixed up before the function returns. If, however,
an exception is thrown in this interim period the broken invariant could be
accidentally exposed.

The second component, observing a broken invariant, can sometimes be difficult
in the face of exceptions, but languages often have constructs to enable these
sorts of witnesses. Two primary methods of doing so are something akin to
finally blocks (code run on a normal or exceptional return) or just catching the
exception. In both cases code which later runs that has access to the original
data structure will see the broken invariants.

Now that we've got a better understanding of how an exception might cause a bug
(i.e. how code can be "exception unsafe"), let's take a look at how we can make
code exception safe. To be exception safe, code needs to be prepared for an
exception to be thrown whenever an invariant it relies on is broken, for
example:

* Code can be audited to ensure it only calls functions which are statically
  known to not throw an exception.
* Local "cleanup" handlers can be placed on the stack to restore invariants
  whenever a function returns, either normally or exceptionally. This can be
  done through finally blocks in some languages or via destructors in others.
* Exceptions can be caught locally to perform cleanup before possibly re-raising
  the exception.

With all that in mind, we've now identified problems that can arise via
exceptions (an invariant is broken and then observed) as well as methods to
prevent this from happening. In languages like C++ this means that
we can be memory safe in the face of exceptions and in languages like Java we
can ensure that our logical invariants are upheld. Given this background let's
take a look at how any of this applies to Rust.

# Background: What is exception safety in Rust?

> Note: This section describes the current state of Rust today without this RFC
> implemented

Up to now we've been talking about exceptions and exception safety, but from a
Rust perspective we can just replace this with panics and panic safety. Panics
in Rust are currently implemented essentially as a C++ exception under the hood.
As a result, **exception safety is something that needs to be handled in Rust
code today**.

One of the primary examples where panics need to be handled in Rust is unsafe
code. Let's take a look at an example where this matters:

```rust
use std::ptr;

// Deliberately NOT panic safe; see the discussion below.
pub fn push_ten_more<T: Clone>(v: &mut Vec<T>, t: T) {
    unsafe {
        v.reserve(10);
        let len = v.len();
        // Breaks `Vec`'s invariant: the next 10 elements are uninitialized.
        v.set_len(len + 10);
        for i in 0..10 {
            // `add` takes a `usize` element offset (`offset` needs an `isize`).
            ptr::write(v.as_mut_ptr().add(len + i), t.clone());
        }
    }
}
```

While this code may look correct, it's actually not memory safe.
`Vec` has an internal invariant that its first `len` elements are safe to drop
at any time. Our function above has temporarily broken this invariant with the
call to `set_len` (the next 10 elements are uninitialized). If the type `T`'s
`clone` method panics then this broken invariant will escape the function. The
broken `Vec` is then observed during its destructor, leading to the eventual
memory unsafety.
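
One possible repair, sketched here as an illustration rather than as part of
this RFC, is to never let `len` cover uninitialized elements: write one element
into the reserved capacity, then bump the length. The name
`push_ten_more_safe` is hypothetical.

```rust
use std::ptr;

/// Panic-safe variant: `len` only ever covers initialized elements.
pub fn push_ten_more_safe<T: Clone>(v: &mut Vec<T>, t: T) {
    v.reserve(10);
    for _ in 0..10 {
        let len = v.len();
        unsafe {
            // If `t.clone()` panics, nothing has been written and `len` is
            // unchanged, so the `Vec` destructor only sees valid elements.
            ptr::write(v.as_mut_ptr().add(len), t.clone());
            // Extend the length only after the slot is initialized.
            v.set_len(len + 1);
        }
    }
}
```

The invariant is now broken for no longer than a single write, and never while
foreign code (`clone`) is running.
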

It's important to keep in mind that panic safety in Rust is not solely limited
to memory safety. *Logical invariants* are often just as critical to keep correct
during execution and no `unsafe` code in Rust is needed to break a logical
invariant. In practice, however, these sorts of bugs are rarely observed due to
Rust's design:

* Rust doesn't expose uninitialized memory
* Panics cannot be caught in a thread
* Across threads data is poisoned by default on panics
* Idiomatic Rust must opt in to extra sharing across boundaries (e.g. `RefCell`)
* Destructors are relatively rare and uninteresting in safe code

These mitigations all address the *second* aspect of exception unsafety:
observation of broken invariants. With the tactics in place, it ends up being
the case that **safe Rust code can largely ignore exception safety
concerns**. That being said, it does not mean that safe Rust code can *always*
ignore exception safety issues. There are a number of methods to subvert the
mitigation strategies listed above:

1. When poisoning data across threads, antidotes are available to access
   poisoned data. Namely the [`PoisonError` type][pet] allows safe access to the
   poisoned information.
2. Single-threaded types with interior mutability, such as `RefCell`, allow for
   sharing data across stack frames such that a broken invariant could
   eventually be observed.
3. Whenever a thread panics, the destructors for its stack variables will be run
   as the thread unwinds. Destructors may have access to data which was also
   accessible lower on the stack (such as through `RefCell` or `Rc`) which has a
   broken invariant, and the destructor may then witness this.

[pet]: http://doc.rust-lang.org/std/sync/struct.PoisonError.html
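
The first subversion in the list above, reaching poisoned data through
`PoisonError`, can be sketched as follows. The function name `read_poisoned`
and the data it uses are made up for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons a mutex by panicking while its lock is held, then reads the data
/// anyway. Returns whether the mutex was poisoned and the observed contents.
pub fn read_poisoned() -> (bool, Vec<i32>) {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker = {
        let data = Arc::clone(&data);
        thread::spawn(move || {
            let mut guard = data.lock().unwrap();
            guard.push(4); // first half of a hypothetical two-step update
            // The length is now 4, so this assertion panics mid-update.
            assert!(guard.len() < 4, "worker failed mid-update");
        })
    };
    let _ = worker.join(); // `Err`: the worker thread panicked

    let poisoned = data.is_poisoned();
    let contents = match data.lock() {
        Ok(guard) => guard.clone(),
        // The "antidote": explicitly opt in to seeing possibly broken data.
        Err(poison_err) => poison_err.into_inner().clone(),
    };
    (poisoned, contents)
}
```

The half-finished update (`4` was pushed) is visible only because the caller
asked for it via `into_inner`; a plain `lock().unwrap()` would have propagated
the panic instead.
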

But all of these "subversions" fall outside the realm of normal, idiomatic, safe
Rust code, and so they all serve as a "heads up" that panic safety might be an
issue. Thus, in practice, Rust programmers worry about exception safety far less
than in languages with full-blown exceptions.

Despite these methods to subvert the mitigations placed by default in Rust, a
key part of exception safety in Rust is that **safe code can never lead to
memory unsafety**, regardless of whether it panics or not. Memory unsafety
triggered as part of a panic can always be traced back to an `unsafe` block.

With all that background out of the way now, let's take a look at the guts of
this RFC.

# Detailed design

At its heart, the change this RFC is proposing is to move
`std::thread::catch_panic` to a new `std::panic` module and rename the function
to `recover`. Additionally, the `Send` bound from the closure parameter will be
removed (`'static` will stay), modifying the signature to be:

```rust
fn recover<F: FnOnce() -> R + 'static, R>(f: F) -> thread::Result<R>
```

More generally, however, this RFC also claims that this stable function does
not radically alter Rust's exception safety story (explained above).

## Will Rust have exceptions?

In a technical sense this RFC is not "adding exceptions to Rust" as they already
exist in the form of panics. What this RFC is adding, however, is a construct
via which to catch these exceptions within a thread, bringing the standard
library closer to the exception support in other languages.

Catching a panic makes it easier to observe broken invariants of data structures
shared across the `catch_panic` boundary, which can possibly increase the
likelihood of exception safety issues arising.

The risk of this step is that catching panics becomes an idiomatic way to deal
with error-handling, thereby making exception safety much more of a headache
than it is today (as it's more likely that a broken invariant is later
witnessed). The `catch_panic` function is intended to only be used
where it's absolutely necessary, e.g. for FFI boundaries, but how can it be
ensured that `catch_panic` isn't overused?

There are two key reasons `catch_panic` likely won't become idiomatic:

1. There are already strong and established conventions around error handling,
   and in particular around the use of panic and `Result` with stabilized usage
   of them in the standard library. There is little chance these conventions
   would change overnight.

2. There has long been a desire to treat every use of `panic!` as an abort
   which is motivated by portability, compile time, binary size, and a number of
   other factors. Assuming this step is taken, it would be extremely unwise for
   a library to signal expected errors via panics and rely on consumers using
   `catch_panic` to handle them.

For reference, here's a summary of the conventions around `Result` and `panic`,
which still hold good after this RFC:

### Result vs Panic

There are two primary strategies for signaling that a function can fail in Rust
today:

* `Result`s represent errors/edge-cases that the author of the library knew
  about, and expects the consumer of the library to handle.

* `panic`s represent errors that the author of the library did not expect to
  occur, such as a contract violation, and therefore does not expect the
  consumer to handle in any particular way.

Another way to put this division is that:

* `Result`s represent errors that carry additional contextual information. This
  information allows them to be handled by the caller of the function producing
  the error, modified with additional contextual information, and eventually
  converted into an error message fit for a top-level program.

* `panic`s represent errors that carry no contextual information (except,
  perhaps, debug information). Because they represent an unexpected error,
  they cannot be easily handled by the caller of the function or presented to
  the top-level program (except to say "something unexpected has gone wrong").

Some pros of `Result` are that it signals specific edge cases that you as a
consumer should think about handling and it allows the caller to decide
precisely how to handle the error. A con with `Result` is that defining errors
and writing down `Result` + `try!` is not always the most ergonomic.

The pros and cons of `panic` are essentially the opposite of `Result`, being
easy to use (nothing to write down other than the panic) but difficult to
determine when a panic can happen or handle it in a custom fashion, even with
`catch_panic`.

These divisions justify the use of `panic`s for things like out-of-bounds
indexing: such an error represents a programming mistake that (1) the author of
the library was not aware of, by definition, and (2) cannot be meaningfully
handled by the caller.

In terms of heuristics for use, `panic`s should rarely if ever be used to report
routine errors, for example those arising from communication with the system or
from IO. If a Rust program shells out to `rustc`, and `rustc` is not found, it
might be tempting to use a panic because the error is unexpected and hard to
recover from. A user of the program, however, would benefit from intermediate
code adding contextual information about the in-progress operation, and the
program could report the error in terms they can understand. While the error is
rare, **when it happens it is not a programmer error**. In short, panics are
roughly analogous to an opaque "an unexpected error has occurred" message.
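
The division above can be illustrated with a minimal sketch. Both functions and
their names (`parse_port`, `nth_port`) are hypothetical examples, not part of
any library discussed in this RFC.

```rust
use std::num::ParseIntError;

/// Expected failure: input may not be a valid port number, so the error is
/// returned as a `Result` carrying context the caller can report or handle.
pub fn parse_port(input: &str) -> Result<u16, String> {
    input
        .trim()
        .parse::<u16>()
        .map_err(|e: ParseIntError| format!("invalid port {:?}: {}", input, e))
}

/// Contract violation: callers promise `index < ports.len()`. Breaking that
/// promise is a programming mistake, so indexing is allowed to panic.
pub fn nth_port(ports: &[u16], index: usize) -> u16 {
    ports[index]
}
```

A caller of `parse_port` can add its own context and pass the error upward,
whereas a panic from `nth_port` says only that something unexpected went wrong.
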

Stabilizing `catch_panic` does little to change the tradeoffs around `Result`
and `panic` that led to these conventions.

## Why remove `Send`?

One of the primary use cases of `recover` is in an FFI context, where lots
of `*mut` and `*const` pointers are flying around. These two types aren't
`Send` by default, so having their values cross the `catch_panic` boundary
would be highly un-ergonomic (albeit still possible). As a result, this RFC
proposes removing the `Send` bound from the function.
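
As a rough sketch of the ergonomics at stake, the closure below captures a raw
pointer, which is not `Send`. It is written against
`std::panic::catch_unwind` from current stable Rust (an assumption relative to
this RFC), which has no `Send` bound. The helper `guarded_increment` is a
hypothetical name.

```rust
use std::panic;

/// Increments the integer behind a raw pointer inside a panic boundary.
/// Returns the new value, or `None` if the closure panicked.
pub fn guarded_increment(target: &mut i32) -> Option<i32> {
    // Stand-in for a pointer received from C. `*mut i32` is not `Send`, so a
    // `Send` bound on the closure would reject the call below.
    let p: *mut i32 = target;
    panic::catch_unwind(move || unsafe {
        // Sound here because `target` is exclusively borrowed for this call.
        *p += 1;
        *p
    })
    .ok()
}
```
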

## Why keep `'static`?

This RFC proposes leaving the `'static` bound on the closure parameter for now.
There isn't a clearly strong case (such as for `Send`) to remove this bound
just yet, and it helps mitigate exception safety issues related to shared
references across the `recover` boundary.

There is conversely also not a clearly strong case for *keeping* this bound, but
as it's the more conservative route (and backwards compatible to remove) it will
remain for now.

# Drawbacks

A drawback of this RFC is that it can water down Rust's error handling story.
With the addition of a "catch" construct for exceptions, it may be unclear to
library authors whether to use panics or `Result` for their error types. As we
discussed above, however, Rust's design around error handling has always had to
deal with these two strategies, and our conventions don't materially change by
stabilizing `catch_panic`.

# Alternatives

One alternative, which is somewhat more of an addition, is to have the standard
library entirely abandon all exception safety mitigation tactics. As explained
in the background sections, exception safety will not lead to memory unsafety
unless paired with unsafe code, so it is perhaps within the realm of possibility
to remove the tactics of poisoning from mutexes and simply require that
consumers deal with exception safety 100% of the time.

This alternative is often motivated by saying that there are enough methods to
subvert the default mitigation tactics that it's not worth trying to plug some
holes and not others. Upon closer inspection, however, the areas where safe code
needs to worry about exception safety are isolated to the single-threaded
situations. For example `RefCell`, destructors, and `catch_panic` all only
expose data possibly broken through a panic in a single thread.

Once a thread boundary is crossed, the only current way to share data mutably is
via `Mutex` or `RwLock`, both of which are poisoned by default. This sort of
sharing is fundamental to threaded code, and poisoning by default allows safe
code to freely use many threads without having to consider exception safety
across threads (as poisoned data will tear down all connected threads).

This property of multithreaded programming in Rust is seen as strong enough that
poisoning should not be removed by default, and in fact a new hypothetical
`thread::scoped` API (a rough counterpart of `catch_panic`) could also propagate
panics by default (like poisoning) with an ability to opt out (like
`PoisonError`).

# Unresolved questions

- Is it worth keeping the `'static` and `Send` bounds as a mitigation measure in
  practice, even if they aren't enforceable in theory? That would require thread
  pools to use unsafe code, but that could be acceptable.

- Should `catch_panic` be stabilized within `std::thread` where it lives today,
  or somewhere else?
