Commit c24d232

Merge remote-tracking branch 'aturon/unsafe-api-location'
2 parents fe84740 + 2e00888 commit c24d232

1 file changed

+162
-0
lines changed

active/0000-unsafe-api-location.md

+162
@@ -0,0 +1,162 @@
1+
- Start Date: (fill me in with today's date, 2014-09-15)
2+
- RFC PR: (leave this empty)
3+
- Rust Issue: (leave this empty)
4+
5+
# Summary
6+
7+
This is a *conventions RFC* for settling the location of `unsafe` APIs relative
8+
to the types they work with, as well as the use of `raw` submodules.
9+
10+
The brief summary is:
11+
12+
* Unsafe APIs should be made into methods or static functions in the same cases
13+
that safe APIs would be.
14+
15+
* `raw` submodules should be used only to *define* explicit low-level
16+
representations.
17+
18+
# Motivation
19+
20+
Many data structures provide unsafe APIs either for avoiding checks or working
21+
directly with their (otherwise private) representation. For example, `string`
22+
provides:
23+
24+
* An `as_mut_vec` method on `String` that provides a `Vec<u8>` view of the
25+
string. This method makes it easy to work with the byte-based representation
26+
of the string, but thereby also allows violation of the utf8 guarantee.
27+
28+
* A `raw` submodule with a number of free functions, like `from_parts`, that
29+
constructs a `String` instance from a raw-pointer-based representation, a
30+
`from_utf8` variant that does not actually check for utf8 validity, and so
31+
on. The unifying theme is that all of these functions avoid checking some key
32+
invariant.
33+
34+
The problem is that currently, there is no clear/consistent guideline about
35+
which of these APIs should live as methods/static functions associated with a
36+
type, and which should live in a `raw` submodule. Both forms appear throughout
37+
the standard library.
38+
39+
# Detailed design
40+
41+
The proposed convention is:
42+
43+
* When an unsafe function/method is clearly "about" a certain type (as a way of
44+
constructing, destructuring, or modifying values of that type), it should be a
45+
method or static function on that type. This is the same as the convention for
46+
placement of safe functions/methods. So functions like
47+
`string::raw::from_parts` would become static functions on `String`.
48+
49+
* `raw` submodules should only be used to *define* low-level
50+
types/representations (and methods/functions on them). Methods for converting
51+
to/from such low-level types should be available directly on the high-level
52+
types. Examples: `core::raw`, `sync::raw`.
53+
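A minimal sketch of how the two rules above might fit together, written in present-day Rust syntax. The names `Bytes` and `raw::BytesRepr` are hypothetical and are not standard-library items: the `raw` submodule only *defines* the low-level representation, while the conversions to and from it live on the high-level type.

```rust
use std::mem::ManuallyDrop;

/// `raw` only *defines* the low-level representation (hypothetical module).
pub mod raw {
    /// Pointer-based representation of a `Bytes` buffer.
    pub struct BytesRepr {
        pub ptr: *mut u8,
        pub len: usize,
        pub cap: usize,
    }
}

/// High-level type that owns its buffer (hypothetical type).
pub struct Bytes {
    inner: Vec<u8>,
}

impl Bytes {
    pub fn new(data: Vec<u8>) -> Bytes {
        Bytes { inner: data }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.inner
    }

    /// Safe conversion *to* the raw representation; ownership of the
    /// allocation passes to the caller.
    pub fn into_raw_parts(self) -> raw::BytesRepr {
        let mut v = ManuallyDrop::new(self.inner);
        raw::BytesRepr { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
    }

    /// Unsafe static function on the type itself, not a free function in `raw`.
    ///
    /// # Safety
    /// `repr` must describe an allocation obtained from `into_raw_parts`
    /// that has not been freed or rebuilt already.
    pub unsafe fn from_raw_parts(repr: raw::BytesRepr) -> Bytes {
        Bytes { inner: unsafe { Vec::from_raw_parts(repr.ptr, repr.len, repr.cap) } }
    }
}

fn main() {
    let repr = Bytes::new(vec![1, 2, 3]).into_raw_parts();
    // SAFETY: `repr` came straight from `into_raw_parts` and is used once.
    let rebuilt = unsafe { Bytes::from_raw_parts(repr) };
    assert_eq!(rebuilt.as_slice(), &[1u8, 2, 3][..]);
}
```

Importing `Bytes` is enough to reach both the safe and the unsafe API; only code that needs to name the representation type has to mention `raw`.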
54+
The benefits are:
55+
56+
* *Ergonomics*. You can gain easy access to unsafe APIs merely by having a value
57+
of the type (or, for static functions, importing the type).
58+
59+
* *Consistency and simplicity*. The rules for placement of unsafe APIs are the
60+
same as those for safe APIs.
61+
62+
The perspective here is that marking APIs `unsafe` is enough to deter their use
63+
in ordinary situations; they don't need to be further distinguished by placement
64+
into a separate module.
65+
66+
There are also some naming conventions to go along with unsafe static functions
67+
and methods:
68+
69+
* When an unsafe function/method is an unchecked variant of an otherwise safe
70+
API, it should be marked using an `_unchecked` suffix.
71+
72+
For example, the `String` type should provide both `from_utf8` and
73+
`from_utf8_unchecked` constructors, where the latter does not actually check
74+
the utf8 encoding. The `string::raw::slice_bytes` and
75+
`string::raw::slice_unchecked` functions should be merged into a single
76+
`slice_unchecked` method on strings that checks neither bounds nor utf8
77+
boundaries.
78+
79+
* When an unsafe function/method produces or consumes a low-level representation
80+
of a data structure, the API should use `raw` in its name. Specifically,
81+
`from_raw_parts` is the typical name used for constructing a value from e.g. a
82+
pointer-based representation.
83+
84+
* Otherwise, *consider* using a name that suggests *why* the API is unsafe. In
85+
some cases, like `String::as_mut_vec`, other stronger conventions apply, and the
86+
`unsafe` qualifier on the signature (together with API documentation) is
87+
enough.
88+
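The three naming conventions above can be illustrated with `String` APIs as they exist in present-day Rust, which postdates this RFC. The helper functions `rebuild_from_raw_parts` and `push_exclamation` are hypothetical wrappers added only to keep the sketch self-contained.

```rust
use std::mem::ManuallyDrop;

/// `raw` in the name: `String::from_raw_parts` consumes a pointer-based
/// representation. (Hypothetical helper that round-trips a string.)
fn rebuild_from_raw_parts(s: String) -> String {
    let mut s = ManuallyDrop::new(s);
    let (ptr, len, cap) = (s.as_mut_ptr(), s.len(), s.capacity());
    // SAFETY: the parts come from a live `String` whose destructor is
    // suppressed by `ManuallyDrop`, so ownership transfers exactly once.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}

/// No special suffix: `as_mut_vec` follows the stronger `as_` conversion
/// convention, and the `unsafe` qualifier carries the warning.
/// (Hypothetical helper.)
fn push_exclamation(s: &mut String) {
    // SAFETY: only an ASCII byte is appended, so the buffer stays valid UTF-8.
    unsafe { s.as_mut_vec().push(b'!') };
}

fn main() {
    let bytes = vec![104, 101, 108, 108, 111]; // ASCII for "hello"

    // Checked constructor: validates UTF-8 and returns a `Result`.
    let checked = String::from_utf8(bytes.clone()).expect("valid UTF-8");

    // `_unchecked` suffix: same operation with the validation skipped.
    // SAFETY: `bytes` is pure ASCII, hence valid UTF-8.
    let unchecked = unsafe { String::from_utf8_unchecked(bytes) };
    assert_eq!(checked, unchecked);

    assert_eq!(rebuild_from_raw_parts(String::from("raw")), "raw");

    let mut greeting = String::from("hi");
    push_exclamation(&mut greeting);
    assert_eq!(greeting, "hi!");
}
```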
89+
The unsafe methods and static functions for a given type should be placed in
90+
their own `impl` block, at the end of the module defining the type; this will
91+
ensure that they are grouped together in rustdoc. (Thanks @kballard for the
92+
suggestion.)
93+
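As a rough illustration of the `impl` block grouping, the sketch below uses a hypothetical `Even` type (not part of any library). The safe API sits in the first `impl` block, and the unsafe constructor sits in its own block at the end of the module.

```rust
/// A `u32` that is guaranteed to be even (hypothetical example type).
pub struct Even(u32);

// Safe API.
impl Even {
    /// Checked constructor: returns `None` for odd input.
    pub fn new(n: u32) -> Option<Even> {
        if n % 2 == 0 { Some(Even(n)) } else { None }
    }

    pub fn get(&self) -> u32 {
        self.0
    }
}

// Unsafe API, kept in a separate `impl` block at the end of the module.
impl Even {
    /// Unchecked variant of `new`.
    ///
    /// # Safety
    /// `n` must be even; other code may rely on that invariant.
    pub unsafe fn new_unchecked(n: u32) -> Even {
        Even(n)
    }
}

fn main() {
    assert!(Even::new(3).is_none());
    assert_eq!(Even::new(4).map(|e| e.get()), Some(4));
    // SAFETY: 8 is even.
    let e = unsafe { Even::new_unchecked(8) };
    assert_eq!(e.get(), 8);
}
```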
94+
# Drawbacks
95+
96+
One potential drawback of these conventions is that the documentation for a
97+
module will be cluttered with rarely-used `unsafe` APIs, whereas the `raw`
98+
submodule approach neatly groups these APIs. But rustdoc could easily be
99+
changed to either hide or separate out `unsafe` APIs by default, and in the
100+
meantime the `impl` block grouping should help.
101+
102+
More specifically, the convention of placing unsafe constructors in `raw` makes
103+
them very easy to find. But the usual `from_` convention, together with the
104+
naming conventions suggested above, should make it fairly easy to discover such
105+
constructors even when they're supplied directly as static functions.
106+
107+
More generally, these conventions give `unsafe` APIs more equal status with safe
108+
APIs. Whether this is a *drawback* depends on your philosophy about the status
109+
of unsafe programming. But on a technical level, the key point is that the APIs
110+
are marked `unsafe`, so users still have to opt in to using them. *Ed note: from
111+
my perspective, low-level/unsafe programming is important to support, and there
112+
is no reason to penalize its ergonomics given that it's opt-in anyway.*
113+
114+
# Alternatives
115+
116+
There are a few alternatives:
117+
118+
* Rather than providing unsafe APIs directly as methods/static functions, they
119+
could be grouped into a single extension trait. For example, the `String` type
120+
could be accompanied by a `StringRaw` extension trait providing APIs for
121+
working with raw string representations. This would allow a clear grouping of
122+
unsafe APIs, while still providing them as methods/static functions and
123+
allowing them to easily be imported with e.g. `use std::string::StringRaw`.
124+
On the other hand, it still further penalizes the raw APIs (beyond marking
125+
them `unsafe`), and given that rustdoc could easily provide API grouping, it's
126+
unclear exactly what the benefit is.
127+
128+
* ([Suggested by @kballard](https://github.com/rust-lang/rfcs/pull/240#issuecomment-55635468)):
129+
130+
> Use `raw` for functions that construct a value of the type without checking
131+
> for one or more invariants.
132+
133+
The advantage is that it's easy to find such invariant-ignoring functions. The
134+
disadvantage is that their ergonomics is worsened, since they must be
135+
separately imported or referenced through a lengthy path:
136+
137+
```rust
138+
// Compare the ergonomics:
139+
string::raw::slice_unchecked(some_string, start, end)
140+
some_string.slice_unchecked(start, end)
141+
```
142+
143+
* Another suggestion by @kballard is to keep the basic structure of `raw`
144+
submodules, but use associated types to improve the ergonomics. Details (and
145+
discussions of pros/cons) are in
146+
[this comment](https://github.com/rust-lang/rfcs/pull/240/files#r17572875).
147+
148+
* Use `raw` submodules to group together *all* manipulation of low-level
149+
representations. No module in `std` currently does this; existing modules
150+
provide some free functions in `raw`, and some unsafe methods, without a clear
151+
driving principle. The ergonomics of moving *everything* into free functions
152+
in a `raw` submodule are quite poor.
153+
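For comparison, the extension-trait alternative from the first bullet might look roughly like the sketch below. Only the trait name `StringRaw` comes from the text; the module `ext` and the method `as_raw_bytes_mut` are hypothetical, and the body delegates to the present-day `String::as_mut_vec`.

```rust
mod ext {
    /// Hypothetical extension trait grouping the raw, unsafe `String` APIs.
    pub trait StringRaw {
        /// # Safety
        /// The caller must leave the buffer holding valid UTF-8.
        unsafe fn as_raw_bytes_mut(&mut self) -> &mut Vec<u8>;
    }

    impl StringRaw for String {
        unsafe fn as_raw_bytes_mut(&mut self) -> &mut Vec<u8> {
            // SAFETY: the UTF-8 obligation is forwarded to our caller.
            unsafe { self.as_mut_vec() }
        }
    }
}

// The extra import is the ergonomic cost: without it the method is not
// visible on `String`.
use ext::StringRaw;

fn main() {
    let mut s = String::from("hi");
    // SAFETY: only an ASCII byte is appended.
    unsafe { s.as_raw_bytes_mut().push(b'!') };
    assert_eq!(s, "hi!");
}
```

The unsafe APIs end up grouped in one place, but callers pay with an additional `use` on top of the `unsafe` block they already need.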
154+
# Unresolved questions
155+
156+
The `core::raw` module provides structs with public representations equivalent
157+
to several built-in and library types (boxes, closures, slices, etc.). It's not
158+
clear whether the name of this module, or the location of its contents, should
159+
change as a result of this RFC. The module is a special case, because not all of
160+
the types it deals with even have corresponding modules/type declarations -- so
161+
it probably suffices to leave decisions about it to the API stabilization
162+
process.
