| 1 | +- Feature Name: `catch_panic` |
| 2 | +- Start Date: 2015-07-24 |
| 3 | +- RFC PR: (leave this empty) |
| 4 | +- Rust Issue: (leave this empty) |
| 5 | + |
| 6 | +# Summary |
| 7 | + |
| 8 | +Move `std::thread::catch_panic` to `std::panic::recover` after removing the |
| 9 | +`Send` bound from the closure parameter. |
| 10 | + |
| 11 | +# Motivation |
| 12 | + |
| 13 | +In today's stable Rust it's not possible to catch a panic on the thread that |
| 14 | +caused it. There are a number of situations, however, where this is |
| 15 | +either required for correctness or necessary for building a useful abstraction: |
| 16 | + |
| 17 | +* It is currently defined as undefined behavior to have a Rust program panic |
| 18 | + across an FFI boundary. For example if C calls into Rust and Rust panics, then |
| 19 | + this is undefined behavior. Being able to catch a panic will allow writing |
| 20 | + C APIs in Rust that do not risk aborting the process they are embedded into. |
| 21 | + |
| 22 | +* Abstractions like thread pools want to catch the panics of tasks being run |
| 23 | + instead of having the thread torn down (and having to spawn a new thread). |
| 24 | + |
| 25 | +Stabilizing the `catch_panic` function would enable these two use cases, but |
| 26 | +let's also take a look at the current signature of the function: |
| 27 | + |
| 28 | +```rust |
| 29 | +fn catch_panic<F, R>(f: F) -> thread::Result<R> |
| 30 | + where F: FnOnce() -> R + Send + 'static |
| 31 | +``` |
| 32 | + |
| 33 | +This function runs the closure `f` and, if it panics, returns `Err(Box<Any + Send>)`. |
| 34 | +If the closure doesn't panic it will return `Ok(val)` where `val` is the |
| 35 | +returned value of the closure. The closure, however, is restricted to only close |
| 36 | +over `Send` and `'static` data. These bounds can be overly restrictive, and due |
| 37 | +to thread-local storage [they can be subverted][tls-subvert], making it unclear |
| 38 | +what purpose they serve. This RFC proposes to remove the `Send` bound as well. |
| 39 | + |
| 40 | +[tls-subvert]: https://github.com/rust-lang/rust/issues/25662 |
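
To make the `Ok`/`Err` behavior described above concrete, here is a minimal sketch. It assumes a current stable toolchain, where the function that eventually shipped is `std::panic::catch_unwind`; its bounds (`UnwindSafe` rather than `Send + 'static`) differ from the signature shown in this RFC, so treat it as a stand-in rather than the proposed API. The helper `checked_half` is hypothetical and exists only for illustration.

```rust
use std::panic;

// Hypothetical helper for illustration: halves an even number and panics on
// an odd one, standing in for any closure body that might panic.
fn checked_half(n: i32) -> i32 {
    if n % 2 != 0 {
        panic!("odd input: {}", n);
    }
    n / 2
}

fn main() {
    // The closure returns normally, so the result is `Ok(val)`.
    let ok = panic::catch_unwind(|| checked_half(10));
    assert_eq!(ok.ok(), Some(5));

    // The closure panics, so the result is `Err(payload)`, where the payload
    // is a boxed `Any + Send` value carrying the panic message.
    let err = panic::catch_unwind(|| checked_half(3));
    assert!(err.is_err());
}
```

The default panic hook still prints the panic message to stderr; catching the panic only stops the unwind from continuing past the call.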
| 41 | + |
| 42 | +Historically, Rust has purposefully avoided offering a way to catch |
| 43 | +panics, largely because of a problem typically referred to as |
| 44 | +"exception safety". To better understand the motivation for stabilizing this |
| 45 | +function and relaxing its bounds, let's review what exception safety is and |
| 46 | +what it means for Rust. |
| 47 | + |
| 48 | +# Background: What is exception safety? |
| 49 | + |
| 50 | +Languages with exceptions have the property that a function can "return" early |
| 51 | +if an exception is thrown. While exceptions aren't too hard to reason about when |
| 52 | +thrown explicitly, they can be problematic when they are thrown by code being |
| 53 | +called -- especially when that code isn't known in advance. Code is **exception |
| 54 | +safe** if it works correctly even when the functions it calls into throw |
| 55 | +exceptions. |
| 56 | + |
| 57 | +The idea of throwing an exception causing bugs may sound a bit alien, so it's |
| 58 | +helpful to drill down into exactly why this is the case. Bugs related to |
| 59 | +exception safety have two critical components: |
| 60 | + |
| 61 | +1. An invariant of a data structure is broken. |
| 62 | +2. This broken invariant is then later observed. |
| 63 | + |
| 64 | +Exceptional control flow often exacerbates this first component of breaking |
| 65 | +invariants. For example many data structures have a number of invariants that |
| 66 | +are dynamically upheld for correctness, and the type's routines can temporarily |
| 67 | +break these invariants to be fixed up before the function returns. If, however, |
| 68 | +an exception is thrown in this interim period the broken invariant could be |
| 69 | +accidentally exposed. |
| 70 | + |
| 71 | +The second component, observing a broken invariant, can sometimes be difficult |
| 72 | +in the face of exceptions, but languages often have constructs to enable these |
| 73 | +sorts of witnesses. Two primary methods of doing so are something akin to |
| 74 | +finally blocks (code run on a normal or exceptional return) or just catching the |
| 75 | +exception. In both cases code which later runs that has access to the original |
| 76 | +data structure will see the broken invariants. |
| 77 | + |
| 78 | +Now that we've got a better understanding of how an exception might cause a bug |
| 79 | +(i.e. how code can be "exception unsafe"), let's take a look at how we can make |
| 80 | +code exception safe. To be exception safe, code needs to be prepared for an |
| 81 | +exception to be thrown whenever an invariant it relies on is broken, for |
| 82 | +example: |
| 83 | + |
| 84 | +* Code can be audited to ensure it only calls functions which are statically |
| 85 | + known to not throw an exception. |
| 86 | +* Local "cleanup" handlers can be placed on the stack to restore invariants |
| 87 | + whenever a function returns, either normally or exceptionally. This can be |
| 88 | + done through finally blocks in some languages or via destructors in others. |
| 89 | +* Exceptions can be caught locally to perform cleanup before possibly re-raising |
| 90 | + the exception. |
| 91 | + |
| 92 | +With all that in mind, we've now identified problems that can arise via |
| 93 | +exceptions (an invariant is broken and then observed) as well as methods to |
| 94 | +prevent this from happening. In languages like C++ this means that |
| 95 | +we can be memory safe in the face of exceptions and in languages like Java we |
| 96 | +can ensure that our logical invariants are upheld. Given this background let's |
| 97 | +take a look at how any of this applies to Rust. |
| 98 | + |
| 99 | +# Background: What is exception safety in Rust? |
| 100 | + |
| 101 | +> Note: This section describes the current state of Rust today without this RFC |
| 102 | +> implemented |
| 103 | + |
| 104 | +Up to now we've been talking about exceptions and exception safety, but from a |
| 105 | +Rust perspective we can just replace this with panics and panic safety. Panics |
| 106 | +in Rust are currently implemented essentially as a C++ exception under the hood. |
| 107 | +As a result, **exception safety is something that needs to be handled in Rust |
| 108 | +code today**. |
| 109 | + |
| 110 | +One of the primary examples where panics need to be handled in Rust is unsafe |
| 111 | +code. Let's take a look at an example where this matters: |
| 112 | + |
| 113 | +```rust |
| 114 | +pub fn push_ten_more<T: Clone>(v: &mut Vec<T>, t: T) { |
| 115 | + unsafe { |
| 116 | + v.reserve(10); |
| 117 | + let len = v.len(); |
| 118 | + v.set_len(len + 10); |
| 119 | + for i in 0..10 { |
| 120 | + std::ptr::write(v.as_mut_ptr().add(len + i), t.clone()); |
| 121 | + } |
| 122 | + } |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +While this code may look correct, it's actually not memory safe. |
| 127 | +`Vec` has an internal invariant that its first `len` elements are safe to drop |
| 128 | +at any time. Our function above has temporarily broken this invariant with the |
| 129 | +call to `set_len` (the next 10 elements are uninitialized). If the type `T`'s |
| 130 | +`clone` method panics then this broken invariant will escape the function. The |
| 131 | +broken `Vec` is then observed during its destructor, leading to the eventual |
| 132 | +memory unsafety. |
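
One way to repair the function above is to never let `len` run ahead of the initialized elements. The following is a sketch of that idea, not necessarily the fix the author had in mind: each slot is written first, and the length is bumped only after the write succeeds, so a panicking `clone` leaves the `Vec` with a valid, fully initialized prefix.

```rust
pub fn push_ten_more<T: Clone>(v: &mut Vec<T>, t: T) {
    // Reserving capacity does not change `len`, so no invariant is broken yet.
    v.reserve(10);
    for _ in 0..10 {
        let len = v.len();
        unsafe {
            // `t.clone()` is evaluated before the write. If it panics here,
            // `len` still covers only initialized elements.
            std::ptr::write(v.as_mut_ptr().add(len), t.clone());
            // Only now does the new element become part of the vector.
            v.set_len(len + 1);
        }
    }
}

fn main() {
    let mut v = vec![String::from("first")];
    push_ten_more(&mut v, String::from("more"));
    assert_eq!(v.len(), 11);
    assert_eq!(v[10], "more");
}
```

The invariant is restored after every iteration instead of once at the end, which corresponds to auditing the code so that no potentially panicking call happens while the invariant is broken.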
| 133 | + |
| 134 | +It's important to keep in mind that panic safety in Rust is not solely limited |
| 135 | +to memory safety. *Logical invariants* are often just as critical to keep correct |
| 136 | +during execution and no `unsafe` code in Rust is needed to break a logical |
| 137 | +invariant. In practice, however, these sorts of bugs are rarely observed due to |
| 138 | +Rust's design: |
| 139 | + |
| 140 | +* Rust doesn't expose uninitialized memory |
| 141 | +* Panics cannot be caught in a thread |
| 142 | +* Across threads data is poisoned by default on panics |
| 143 | +* Idiomatic Rust must opt in to extra sharing across boundaries (e.g. `RefCell`) |
| 144 | +* Destructors are relatively rare and uninteresting in safe code |
| 145 | + |
| 146 | +These mitigations all address the *second* aspect of exception unsafety: |
| 147 | +observation of broken invariants. With these tactics in place, it ends up being |
| 148 | +the case that **safe Rust code can largely ignore exception safety |
| 149 | +concerns**. That being said, it does not mean that safe Rust code can *always* |
| 150 | +ignore exception safety issues. There are a number of methods to subvert the |
| 151 | +mitigation strategies listed above: |
| 152 | + |
| 153 | +1. When poisoning data across threads, antidotes are available to access |
| 154 | + poisoned data. Namely the [`PoisonError` type][pet] allows safe access to the |
| 155 | + poisoned information. |
| 156 | +2. Single-threaded types with interior mutability, such as `RefCell`, allow for |
| 157 | + sharing data across stack frames such that a broken invariant could |
| 158 | + eventually be observed. |
| 159 | +3. Whenever a thread panics, the destructors for its stack variables will be run |
| 160 | + as the thread unwinds. Destructors may have access to data which was also |
| 161 | + accessible lower on the stack (such as through `RefCell` or `Rc`) which has a |
| 162 | + broken invariant, and the destructor may then witness this. |
| 163 | + |
| 164 | +[pet]: http://doc.rust-lang.org/std/sync/struct.PoisonError.html |
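
As an illustration of the first subversion in the list above, the sketch below poisons a `Mutex` by panicking while the lock is held and then uses `PoisonError::into_inner` to read the data anyway. The function name `poisoned_value` and the stored values are hypothetical; only the standard library calls are real.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical demonstration: returns the value left behind by a thread that
// panicked while holding the lock.
fn poisoned_value() -> i32 {
    let data = Arc::new(Mutex::new(0));
    let data2 = Arc::clone(&data);

    // The spawned thread modifies the data and panics before "fixing it up",
    // which poisons the mutex.
    let result = thread::spawn(move || {
        let mut guard = data2.lock().unwrap();
        *guard = 1;
        panic!("panic while holding the lock");
    })
    .join();
    assert!(result.is_err());
    assert!(data.is_poisoned());

    // `lock` now returns `Err(PoisonError)`, but the error still hands out
    // the guard, so safe code can observe the possibly broken state.
    let value = match data.lock() {
        Ok(guard) => *guard,
        Err(poisoned) => *poisoned.into_inner(),
    };
    value
}

fn main() {
    assert_eq!(poisoned_value(), 1);
}
```

The explicit `Err` arm is the "heads up": reaching the data requires deliberately opting out of the default poisoning behavior.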
| 165 | + |
| 166 | +But all of these "subversions" fall outside the realm of normal, idiomatic, safe |
| 167 | +Rust code, and so they all serve as a "heads up" that panic safety might be an |
| 168 | +issue. Thus, in practice, Rust programmers worry about exception safety far less |
| 169 | +than in languages with full-blown exceptions. |
| 170 | + |
| 171 | +Despite these methods to subvert the mitigations placed by default in Rust, a |
| 172 | +key part of exception safety in Rust is that **safe code can never lead to |
| 173 | +memory unsafety**, regardless of whether it panics or not. Memory unsafety |
| 174 | +triggered as part of a panic can always be traced back to an `unsafe` block. |
| 175 | + |
| 176 | +With all that background out of the way now, let's take a look at the guts of |
| 177 | +this RFC. |
| 178 | + |
| 179 | +# Detailed design |
| 180 | + |
| 181 | +At its heart, the change this RFC is proposing is to move |
| 182 | +`std::thread::catch_panic` to a new `std::panic` module and rename the function |
| 183 | +to `recover`. Additionally, the `Send` bound from the closure parameter will be |
| 184 | +removed (`'static` will stay), modifying the signature to be: |
| 185 | + |
| 186 | +```rust |
| 187 | +fn recover<F: FnOnce() -> R + 'static, R>(f: F) -> thread::Result<R> |
| 188 | +``` |
| 189 | + |
| 190 | +More generally, however, this RFC also claims that this stable function does |
| 191 | +not radically alter Rust's exception safety story (explained above). |
| 192 | + |
| 193 | +## Will Rust have exceptions? |
| 194 | + |
| 195 | +In a technical sense this RFC is not "adding exceptions to Rust" as they already |
| 196 | +exist in the form of panics. What this RFC is adding, however, is a construct |
| 197 | +via which to catch these exceptions within a thread, bringing the standard |
| 198 | +library closer to the exception support in other languages. |
| 199 | + |
| 200 | +Catching a panic makes it easier to observe broken invariants of data structures |
| 201 | +shared across the `catch_panic` boundary, which can possibly increase the |
| 202 | +likelihood of exception safety issues arising. |
| 203 | + |
| 204 | +The risk of this step is that catching panics becomes an idiomatic way to deal |
| 205 | +with error-handling, thereby making exception safety much more of a headache |
| 206 | +than it is today (as it's more likely that a broken invariant is later |
| 207 | +witnessed). The `catch_panic` function is intended to only be used |
| 208 | +where it's absolutely necessary, e.g. for FFI boundaries, but how can it be |
| 209 | +ensured that `catch_panic` isn't overused? |
| 210 | + |
| 211 | +There are two key reasons `catch_panic` likely won't become idiomatic: |
| 212 | + |
| 213 | +1. There are already strong and established conventions around error handling, |
| 214 | + and in particular around the use of panic and `Result` with stabilized usage |
| 215 | + of them in the standard library. There is little chance these conventions |
| 216 | + would change overnight. |
| 217 | + |
| 218 | +2. There has long been a desire to treat every use of `panic!` as an abort |
| 219 | + which is motivated by portability, compile time, binary size, and a number of |
| 220 | + other factors. Assuming this step is taken, it would be extremely unwise for |
| 221 | + a library to signal expected errors via panics and rely on consumers using |
| 222 | + `catch_panic` to handle them. |
| 223 | + |
| 224 | +For reference, here's a summary of the conventions around `Result` and `panic`, |
| 225 | +which still hold good after this RFC: |
| 226 | + |
| 227 | +### Result vs Panic |
| 228 | + |
| 229 | +There are two primary strategies for signaling that a function can fail in Rust |
| 230 | +today: |
| 231 | + |
| 232 | +* `Result`s represent errors/edge-cases that the author of the library knew |
| 233 | + about, and expects the consumer of the library to handle. |
| 234 | + |
| 235 | +* `panic`s represent errors that the author of the library did not expect to |
| 236 | + occur, such as a contract violation, and therefore does not expect the |
| 237 | + consumer to handle in any particular way. |
| 238 | + |
| 239 | +Another way to put this division is that: |
| 240 | + |
| 241 | +* `Result`s represent errors that carry additional contextual information. This |
| 242 | + information allows them to be handled by the caller of the function producing |
| 243 | + the error, modified with additional contextual information, and eventually |
| 244 | + converted into an error message fit for a top-level program. |
| 245 | + |
| 246 | +* `panic`s represent errors that carry no contextual information (except, |
| 247 | + perhaps, debug information). Because they represent an unexpected error, |
| 248 | + they cannot be easily handled by the caller of the function or presented to |
| 249 | + the top-level program (except to say "something unexpected has gone wrong"). |
| 250 | + |
| 251 | +Some pros of `Result` are that it signals specific edge cases that you as a |
| 252 | +consumer should think about handling and it allows the caller to decide |
| 253 | +precisely how to handle the error. A con with `Result` is that defining errors |
| 254 | +and writing down `Result` + `try!` is not always the most ergonomic. |
| 255 | + |
| 256 | +The pros and cons of `panic` are essentially the opposite of `Result`, being |
| 257 | +easy to use (nothing to write down other than the panic) but difficult to |
| 258 | +determine when a panic can happen or handle it in a custom fashion, even with |
| 259 | +`catch_panic`. |
| 260 | + |
| 261 | +These divisions justify the use of `panic`s for things like out-of-bounds |
| 262 | +indexing: such an error represents a programming mistake that (1) the author of |
| 263 | +the library was not aware of, by definition, and (2) cannot be meaningfully |
| 264 | +handled by the caller. |
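
The division can be sketched with two small functions. Both names (`parse_port`, `nth_port`) are hypothetical and chosen only for illustration: the first reports an anticipated failure through `Result`, while the second treats an out-of-range index as a contract violation and lets indexing panic.

```rust
use std::num::ParseIntError;

// An expected failure: user input may not be a valid port number, so the
// caller receives a `Result` and decides how to handle the error.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

// A contract violation: the caller promises `i` is in bounds. If it is not,
// indexing panics, signaling a programming mistake rather than a routine error.
fn nth_port(ports: &[u16], i: usize) -> u16 {
    ports[i]
}

fn main() {
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("http").is_err());
    assert_eq!(nth_port(&[80, 443], 1), 443);
}
```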
| 265 | + |
| 266 | +In terms of heuristics for use, `panic`s should rarely if ever be used to report |
| 267 | +routine errors, such as those from communication with the system or from IO. |
| 268 | +If a Rust program shells out to `rustc`, and `rustc` is not found, it might be |
| 269 | +tempting to use a panic because the error is unexpected and hard to recover |
| 270 | +from. A user of the program, however, would benefit from intermediate code |
| 271 | +adding contextual information about the in-progress operation, and the program |
| 272 | +could report the error in terms they can understand. While the error is |
| 273 | +rare, **when it happens it is not a programmer error**. In short, panics are |
| 274 | +roughly analogous to an opaque "an unexpected error has occurred" message. |
| 275 | + |
| 276 | +Stabilizing `catch_panic` does little to change the tradeoffs around `Result` |
| 277 | +and `panic` that led to these conventions. |
| 278 | + |
| 279 | +## Why remove `Send`? |
| 280 | + |
| 281 | +One of the primary use cases of `recover` is in an FFI context, where lots |
| 282 | +of `*mut` and `*const` pointers are flying around. These two types aren't |
| 283 | +`Send` by default, so having their values cross the `recover` boundary |
| 284 | +would be highly un-ergonomic (albeit still possible). As a result, this RFC |
| 285 | +proposes removing the `Send` bound from the function. |
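
A rough sketch of the FFI use case follows. It assumes a current stable toolchain and uses `std::panic::catch_unwind` as a stand-in for the `recover` function proposed here; the names `risky_double` and `double_into` and the `0`/`-1` status codes are hypothetical. The point is that the closure captures a raw `*mut i32`, which is not `Send`, and that a panic is converted into an error code instead of unwinding into the C caller.

```rust
use std::panic;

// Hypothetical computation that panics on invalid input.
fn risky_double(x: i32) -> i32 {
    if x < 0 {
        panic!("negative input");
    }
    x * 2
}

// A C-callable entry point (a real library would also export the symbol
// unmangled). Returns 0 on success and -1 if the Rust code panicked.
pub extern "C" fn double_into(x: i32, out: *mut i32) -> i32 {
    // The closure captures the raw pointer `out`, which is not `Send`.
    let result = panic::catch_unwind(move || {
        let value = risky_double(x);
        if !out.is_null() {
            // The caller is trusted to pass a valid, writable pointer.
            unsafe { *out = value };
        }
    });
    match result {
        Ok(()) => 0,
        Err(_) => -1,
    }
}

fn main() {
    let mut slot = 0;
    assert_eq!(double_into(21, &mut slot), 0);
    assert_eq!(slot, 42);
    // The panic is caught at the boundary, and `slot` is left untouched.
    assert_eq!(double_into(-1, &mut slot), -1);
    assert_eq!(slot, 42);
}
```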
| 286 | + |
| 287 | +## Why keep `'static`? |
| 288 | + |
| 289 | +This RFC proposes leaving the `'static` bound on the closure parameter for now. |
| 290 | +There isn't a clearly strong case (such as for `Send`) to remove this bound |
| 291 | +just yet, and it helps mitigate exception safety issues related to shared |
| 292 | +references across the `recover` boundary. |
| 293 | + |
| 294 | +There is conversely also not a clearly strong case for *keeping* this bound, but |
| 295 | +as it's the more conservative route (and backwards compatible to remove) it will |
| 296 | +remain for now. |
| 297 | + |
| 298 | +# Drawbacks |
| 299 | + |
| 300 | +A drawback of this RFC is that it can water down Rust's error handling story. |
| 301 | +With the addition of a "catch" construct for exceptions, it may be unclear to |
| 302 | +library authors whether to use panics or `Result` for their error types. As we |
| 303 | +discussed above, however, Rust's design around error handling has always had to |
| 304 | +deal with these two strategies, and our conventions don't materially change by |
| 305 | +stabilizing `catch_panic`. |
| 306 | + |
| 307 | +# Alternatives |
| 308 | + |
| 309 | +One alternative, which is somewhat more of an addition, is to have the standard |
| 310 | +library entirely abandon all exception safety mitigation tactics. As explained |
| 311 | +in the background sections, exception safety issues will not lead to memory unsafety |
| 312 | +unless paired with unsafe code, so it is perhaps within the realm of possibility |
| 313 | +to remove the tactics of poisoning from mutexes and simply require that |
| 314 | +consumers deal with exception safety 100% of the time. |
| 315 | + |
| 316 | +This alternative is often motivated by saying that there are enough methods to |
| 317 | +subvert the default mitigation tactics that it's not worth trying to plug some |
| 318 | +holes and not others. Upon closer inspection, however, the areas where safe code |
| 319 | +needs to worry about exception safety are isolated to the single-threaded |
| 320 | +situations. For example `RefCell`, destructors, and `catch_panic` all only |
| 321 | +expose data possibly broken through a panic in a single thread. |
| 322 | + |
| 323 | +Once a thread boundary is crossed, the only current way to share data mutably is |
| 324 | +via `Mutex` or `RwLock`, both of which are poisoned by default. This sort of |
| 325 | +sharing is fundamental to threaded code, and poisoning by default allows safe |
| 326 | +code to freely use many threads without having to consider exception safety |
| 327 | +across threads (as poisoned data will tear down all connected threads). |
| 328 | + |
| 329 | +This property of multithreaded programming in Rust is seen as strong enough that |
| 330 | +poisoning should not be removed by default, and in fact a new hypothetical |
| 331 | +`thread::scoped` API (a rough counterpart of `catch_panic`) could also propagate |
| 332 | +panics by default (like poisoning) with an ability to opt out (like |
| 333 | +`PoisonError`). |
| 334 | + |
| 335 | +# Unresolved questions |
| 336 | + |
| 337 | +- Is it worth keeping the `'static` and `Send` bounds as a mitigation measure in |
| 338 | + practice, even if they aren't enforceable in theory? That would require thread |
| 339 | + pools to use unsafe code, but that could be acceptable. |
| 340 | + |
| 341 | +- Should `catch_panic` be stabilized within `std::thread` where it lives today, |
| 342 | + or somewhere else? |