- Start Date: 2014-09-15
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This is a *conventions RFC* for settling the location of `unsafe` APIs relative
to the types they work with, as well as the use of `raw` submodules.

The brief summary is:

* Unsafe APIs should be made into methods or static functions in the same cases
  that safe APIs would be.

* `raw` submodules should be used only to *define* explicit low-level
  representations.

# Motivation

Many data structures provide unsafe APIs either for avoiding checks or for
working directly with their (otherwise private) representation. For example,
the `string` module provides:

* An `as_mut_vec` method on `String` that provides a `Vec<u8>` view of the
  string. This method makes it easy to work with the byte-based representation
  of the string, but thereby also allows violation of the UTF-8 guarantee.

* A `raw` submodule with a number of free functions: `from_parts`, which
  constructs a `String` instance from a raw-pointer-based representation; a
  `from_utf8` variant that does not actually check for UTF-8 validity; and so
  on. The unifying theme is that all of these functions avoid checking some key
  invariant.
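
`as_mut_vec` still exists in today's standard library as `unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8>` on `String`, so its hazard is easy to demonstrate. In the minimal sketch below, the helper `push_ascii_digit` is hypothetical. The compiler would accept any byte pushed through the `Vec<u8>` view, so the caller is responsible for keeping the string valid UTF-8. Here that obligation is met by appending only ASCII digits.

```rust
/// Hypothetical helper: appends the ASCII digit for `d` (0..=9) by editing
/// the string's bytes through the `unsafe` `as_mut_vec` view.
fn push_ascii_digit(s: &mut String, d: u8) {
    assert!(d <= 9, "`d` must be a single decimal digit");
    // SAFETY: `b'0' + d` is one ASCII byte, and appending ASCII to valid
    // UTF-8 leaves the whole buffer valid UTF-8.
    unsafe {
        s.as_mut_vec().push(b'0' + d);
    }
}

fn main() {
    let mut s = String::from("rfc");
    push_ascii_digit(&mut s, 2);
    push_ascii_digit(&mut s, 4);
    push_ascii_digit(&mut s, 0);
    println!("{}", s); // prints "rfc240"
}
```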

The problem is that there is currently no clear, consistent guideline about
which of these APIs should live as methods/static functions associated with a
type, and which should live in a `raw` submodule. Both forms appear throughout
the standard library.

# Detailed design

The proposed convention is:

* When an unsafe function/method is clearly "about" a certain type (as a way of
  constructing, destructuring, or modifying values of that type), it should be a
  method or static function on that type. This is the same as the convention for
  placement of safe functions/methods. So functions like
  `string::raw::from_parts` would become static functions on `String`.

* `raw` submodules should only be used to *define* low-level
  types/representations (and methods/functions on them). Methods for converting
  to/from such low-level types should be available directly on the high-level
  types. Examples: `core::raw`, `sync::raw`.
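
Below is a minimal sketch of the proposed layout. It uses a hypothetical `Text` type (a UTF-8 byte buffer) and a hypothetical `raw::Parts` struct, not the real `String` internals. The `raw` submodule only defines the pointer-based representation. Both conversions live on `Text` itself, and the unsafe direction is an unsafe static function named `from_raw_parts` rather than a free function in `raw`.

```rust
use std::mem::ManuallyDrop;

/// `raw` only *defines* the low-level representation; it holds no
/// constructors for `Text`.
pub mod raw {
    /// Hypothetical pointer-based representation of a `Text`.
    pub struct Parts {
        pub ptr: *mut u8,
        pub len: usize,
        pub cap: usize,
    }
}

/// Hypothetical high-level type; invariant: `bytes` is valid UTF-8.
pub struct Text {
    bytes: Vec<u8>,
}

impl Text {
    /// Safe, checked constructor.
    pub fn from_utf8(bytes: Vec<u8>) -> Option<Text> {
        std::str::from_utf8(&bytes).ok()?;
        Some(Text { bytes })
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: every constructor upholds the UTF-8 invariant.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }

    /// Conversion *to* the low-level type is a method on the high-level type.
    pub fn into_raw_parts(self) -> raw::Parts {
        // Keep the allocation alive; ownership moves into `raw::Parts`.
        let mut bytes = ManuallyDrop::new(self.bytes);
        raw::Parts {
            ptr: bytes.as_mut_ptr(),
            len: bytes.len(),
            cap: bytes.capacity(),
        }
    }

    /// Conversion *from* the low-level type is an unsafe static function on
    /// `Text`, not a free function such as `raw::from_parts`.
    ///
    /// # Safety
    /// `parts` must come from `Text::into_raw_parts` and hold valid UTF-8.
    pub unsafe fn from_raw_parts(parts: raw::Parts) -> Text {
        let bytes = unsafe { Vec::from_raw_parts(parts.ptr, parts.len, parts.cap) };
        Text { bytes }
    }
}

fn main() {
    let t = Text::from_utf8(b"hello".to_vec()).expect("valid UTF-8");
    let parts = t.into_raw_parts();
    // SAFETY: `parts` came straight from `into_raw_parts` and is used once.
    let t = unsafe { Text::from_raw_parts(parts) };
    assert_eq!(t.as_str(), "hello");
    assert!(Text::from_utf8(vec![0xff]).is_none());
}
```

At the call site, a user needs nothing beyond the `Text` type itself. The `unsafe` block is the only opt-in step.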

The benefits are:

* *Ergonomics*. You can gain easy access to unsafe APIs merely by having a value
  of the type (or, for static functions, importing the type).

* *Consistency and simplicity*. The rules for placement of unsafe APIs are the
  same as those for safe APIs.

The perspective here is that marking APIs `unsafe` is enough to deter their use
in ordinary situations; they don't need to be further distinguished by placement
into a separate module.

There are also some naming conventions to go along with unsafe static functions
and methods:

* When an unsafe function/method is an unchecked variant of an otherwise safe
  API, it should be marked using an `_unchecked` suffix.

  For example, the `String` type should provide both `from_utf8` and
  `from_utf8_unchecked` constructors, where the latter does not actually check
  the UTF-8 encoding. The `string::raw::slice_bytes` and
  `string::raw::slice_unchecked` functions should be merged into a single
  `slice_unchecked` method on strings that checks neither bounds nor UTF-8
  character boundaries.

* When an unsafe function/method produces or consumes a low-level representation
  of a data structure, the API should use `raw` in its name. Specifically,
  `from_raw_parts` is the typical name used for constructing a value from, e.g.,
  a pointer-based representation.

* Otherwise, *consider* using a name that suggests *why* the API is unsafe. In
  some cases, like `String::as_mut_vec`, other stronger conventions apply, and the
  `unsafe` qualifier on the signature (together with API documentation) is
  enough.
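
For illustration, today's standard library exposes APIs that follow these naming patterns. `String::from_utf8` sits alongside `String::from_utf8_unchecked`. `str::get_unchecked` is a close analogue of the merged `slice_unchecked` described above, checking neither bounds nor char boundaries. `String::from_raw_parts` covers the pointer-based representation. The wrapper functions below are hypothetical and exist only to show each name at a call site, with the safety obligation stated next to each `unsafe` block.

```rust
use std::mem::ManuallyDrop;

// Hypothetical wrappers around real std APIs; each shows one naming rule.

/// Checked constructor: invalid UTF-8 is rejected.
fn parse_checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Uses the `_unchecked` twin of `from_utf8`, which skips validation, so the
/// caller must guarantee UTF-8. Here that is guaranteed by requiring ASCII.
fn parse_ascii(bytes: Vec<u8>) -> String {
    assert!(bytes.is_ascii());
    // SAFETY: ASCII bytes are always valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}

/// Sub-slicing with neither a bounds check nor a char-boundary check.
fn ascii_prefix(s: &str, n: usize) -> &str {
    assert!(s.is_ascii() && n <= s.len());
    // SAFETY: `n` is in bounds, and every index into ASCII text is a
    // char boundary.
    unsafe { s.get_unchecked(..n) }
}

/// `raw` in the name: take a `String` apart into its pointer-based
/// representation and rebuild it with `from_raw_parts`.
fn raw_roundtrip(s: String) -> String {
    // Prevent the original `String` from freeing the buffer.
    let mut s = ManuallyDrop::new(s);
    let (ptr, len, cap) = (s.as_mut_ptr(), s.len(), s.capacity());
    // SAFETY: the parts describe a live allocation that nothing else owns.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}

fn main() {
    assert_eq!(parse_checked(vec![0xff]), None);
    assert_eq!(parse_ascii(b"rfc".to_vec()), "rfc");
    assert_eq!(ascii_prefix("unchecked", 2), "un");
    assert_eq!(raw_roundtrip(String::from("raw")), "raw");
}
```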

The unsafe methods and static functions for a given type should be placed in
their own `impl` block, at the end of the module defining the type; this will
ensure that they are grouped together in rustdoc. (Thanks @kballard for the
suggestion.)
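
Here is a sketch of the `impl`-block convention, using a hypothetical `Buffer` type. Safe methods go in the usual inherent `impl`, and the unsafe ones go in a second `impl Buffer` block at the end of the module. Rust allows a type to have several inherent `impl` blocks within its defining crate, and rustdoc lists each block under its own heading. That behavior is what provides the grouping.

```rust
/// Hypothetical type used only to illustrate the layout convention.
pub struct Buffer {
    data: Vec<u8>,
}

// Safe API: the usual inherent `impl` block.
impl Buffer {
    pub fn new() -> Buffer {
        Buffer { data: Vec::new() }
    }

    pub fn push(&mut self, byte: u8) {
        self.data.push(byte);
    }

    /// Checked access.
    pub fn get(&self, i: usize) -> Option<u8> {
        self.data.get(i).copied()
    }
}

// ... the rest of the module's items would go here ...

// Unsafe API: a separate `impl` block placed at the end of the module, so
// rustdoc shows these methods together as their own group.
impl Buffer {
    /// Unchecked variant of `get`.
    ///
    /// # Safety
    /// `i` must be less than the number of bytes pushed so far.
    pub unsafe fn get_unchecked(&self, i: usize) -> u8 {
        unsafe { *self.data.get_unchecked(i) }
    }
}

fn main() {
    let mut buf = Buffer::new();
    buf.push(42);
    assert_eq!(buf.get(0), Some(42));
    assert_eq!(buf.get(1), None);
    // SAFETY: index 0 is in bounds because one byte was pushed.
    assert_eq!(unsafe { buf.get_unchecked(0) }, 42);
}
```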

# Drawbacks

One potential drawback of these conventions is that the documentation for a
module will be cluttered with rarely used `unsafe` APIs, whereas the `raw`
submodule approach neatly groups these APIs. But rustdoc could easily be
changed to either hide or separate out `unsafe` APIs by default, and in the
meantime the `impl` block grouping should help.

More specifically, the convention of placing unsafe constructors in `raw` makes
them very easy to find. But the usual `from_` convention, together with the
naming conventions suggested above, should make it fairly easy to discover such
constructors even when they're supplied directly as static functions.

More generally, these conventions give `unsafe` APIs more equal status with safe
APIs. Whether this is a *drawback* depends on your philosophy about the status
of unsafe programming. But on a technical level, the key point is that the APIs
are marked `unsafe`, so users still have to opt in to using them. *Ed note: from
my perspective, low-level/unsafe programming is important to support, and there
is no reason to penalize its ergonomics given that it's opt-in anyway.*

# Alternatives

There are a few alternatives:

* Rather than providing unsafe APIs directly as methods/static functions, they
  could be grouped into a single extension trait. For example, the `String` type
  could be accompanied by a `StringRaw` extension trait providing APIs for
  working with raw string representations. This would allow a clear grouping of
  unsafe APIs, while still providing them as methods/static functions and
  allowing them to easily be imported with e.g. `use std::string::StringRaw`.
  On the other hand, it still further penalizes the raw APIs (beyond marking
  them `unsafe`), and given that rustdoc could easily provide API grouping, it's
  unclear exactly what the benefit is.

* ([Suggested by @kballard](https://github.com/rust-lang/rfcs/pull/240#issuecomment-55635468)):

  > Use `raw` for functions that construct a value of the type without checking
  > for one or more invariants.

  The advantage is that it's easy to find such invariant-ignoring functions. The
  disadvantage is that their ergonomics are worse, since they must be
  separately imported or referenced through a lengthy path:

  ```rust
  // Compare the ergonomics:
  string::raw::slice_unchecked(some_string, start, end)
  some_string.slice_unchecked(start, end)
  ```

* Another suggestion by @kballard is to keep the basic structure of `raw`
  submodules, but use associated types to improve the ergonomics. Details (and
  discussions of pros/cons) are in
  [this comment](https://github.com/rust-lang/rfcs/pull/240/files#r17572875).

* Use `raw` submodules to group together *all* manipulation of low-level
  representations. No module in `std` currently does this; existing modules
  provide some free functions in `raw`, and some unsafe methods, without a clear
  driving principle. The ergonomics of moving *everything* into free functions
  in a `raw` submodule are quite poor.
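
The extension-trait alternative (the first bullet in this list) might look roughly like the sketch below. To avoid colliding with the inherent methods `String` already has, the sketch uses a hypothetical `Text` type and a hypothetical `TextRaw` trait in place of `String` and `StringRaw`. The unsafe APIs are still methods and static functions, but callers can reach them only after importing the trait.

```rust
mod text {
    /// Hypothetical high-level type: `bytes` is always valid UTF-8.
    pub struct Text {
        bytes: Vec<u8>,
    }

    impl Text {
        pub fn from_utf8(bytes: Vec<u8>) -> Option<Text> {
            std::str::from_utf8(&bytes).ok()?;
            Some(Text { bytes })
        }

        pub fn as_str(&self) -> &str {
            // SAFETY: every constructor keeps `bytes` valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(&self.bytes) }
        }
    }

    /// Hypothetical extension trait grouping every unsafe `Text` API.
    pub trait TextRaw {
        /// # Safety
        /// `bytes` must be valid UTF-8.
        unsafe fn from_utf8_unchecked(bytes: Vec<u8>) -> Self;

        /// # Safety
        /// The caller must leave the bytes valid UTF-8.
        unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8>;
    }

    impl TextRaw for Text {
        unsafe fn from_utf8_unchecked(bytes: Vec<u8>) -> Text {
            Text { bytes }
        }

        unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8> {
            &mut self.bytes
        }
    }
}

// The unsafe APIs are only callable once `TextRaw` is in scope.
use text::{Text, TextRaw};

fn main() {
    assert!(Text::from_utf8(vec![0xff]).is_none());
    // SAFETY: an ASCII literal is valid UTF-8.
    let mut t = unsafe { Text::from_utf8_unchecked(b"ab".to_vec()) };
    // SAFETY: pushing an ASCII byte keeps the contents valid UTF-8.
    unsafe { t.as_mut_vec().push(b'c') };
    assert_eq!(t.as_str(), "abc");
}
```

The extra `use text::TextRaw` line is the additional cost, beyond `unsafe` itself, that the bullet above counts against this alternative.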

# Unresolved questions

The `core::raw` module provides structs with public representations equivalent
to several built-in and library types (boxes, closures, slices, etc.). It's not
clear whether the name of this module, or the location of its contents, should
change as a result of this RFC. The module is a special case, because not all of
the types it deals with even have corresponding modules/type declarations, so
it probably suffices to leave decisions about it to the API stabilization
process.