# Data structure representation and validity requirements

## Introduction

This discussion focuses on the following questions:

9-
- Find and link to the various RFCs
10-
- Enumerate things that we *might* in fact guarantee, even for non-C types:
11-
- e.g., `&T` and `Option<&T>` are both pointer sized
12-
- size of `extern fn` etc (at least on some platforms)?
13-
- For which `T` is `None` represented as a "null pointer" etc?
14-
- (Which "niche" optimizations can we rely on)
7+
- What guarantees does Rust make regarding the layout of data structures?
- What invariants does the compiler require from the various Rust types?
  - the "validity invariant", as defined in [Ralf's blog post][bp]
- What invariants can safe code expect to hold for the various Rust types?
  - the "safety invariant", as defined in [Ralf's blog post][bp]

[bp]: https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html

### Layout of data structures

In general, Rust makes few guarantees about the memory layout of your
structures. For example, by default, the compiler is free to reorder
the fields of your structures for efficiency (as of this writing, we
try to minimize the overall size of your structure, but this is the
sort of detail that can easily change). For safe code, of course, any
such rearrangement "just works" transparently.

If, however, you need to write unsafe code, you may want a fixed data
structure layout. In that case, there are ways to specify and control
how an individual struct will be laid out, notably with `#[repr]`
annotations. One purpose of this section, then, is to lay out what
sorts of layout guarantees we offer, and also what effect the various
`#[repr]` annotations have.
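A minimal sketch of the difference, assuming a typical target where `u32` has 4-byte alignment; the type names `Reorderable` and `Fixed` and the helper `layouts` are hypothetical. With `#[repr(C)]`, the fields stay in declaration order with C-style padding, so the size is fixed by that algorithm. The default-representation struct has the same fields but no such guarantee, so the sketch only checks the one property the default representation does promise (alignment at least that of the most-aligned field).

```rust
use std::mem::{align_of, size_of};

// Default (Rust) representation: the compiler may reorder these fields,
// so nothing here asserts this struct's exact size.
#[allow(dead_code)]
struct Reorderable {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order, each placed at the
// next offset satisfying its alignment; the total size is rounded up to a
// multiple of the struct's alignment.
#[allow(dead_code)]
#[repr(C)]
struct Fixed {
    a: u8,  // offset 0, then 3 bytes of padding (assuming align_of::<u32>() == 4)
    b: u32, // offset 4
    c: u8,  // offset 8, then 3 bytes of tail padding
}

/// Returns `(size, align)` for the default-repr and `#[repr(C)]` structs.
fn layouts() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<Reorderable>(), align_of::<Reorderable>()),
        (size_of::<Fixed>(), align_of::<Fixed>()),
    )
}
```

On current rustc the default-repr struct usually comes out smaller than the `#[repr(C)]` one (the `u32` gets placed first), but that is exactly the kind of detail that is not guaranteed.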
### Validity invariant

The "validity invariant" for each type defines what must hold whenever
a value of this type is considered to be initialized. The compiler expects
the validity invariant to hold **at all times** and is thus allowed to rely
on it, e.g., to choose the layout of data structures or to perform other
optimizations.

Therefore, the validity invariant must **at minimum** justify all the
layout optimizations that the compiler performs. We may want a stronger
invariant, however, so as to leave room for future optimizations.

As an example, a value of type `&T` can never be null; therefore,
`Option<&T>` can use null to represent `None`.
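A minimal sketch of the `Option<&T>` example, relying on the guarantee documented in the standard library's `std::option` module (for `T: Sized`, `Option<&T>` has the same size as `&T`, and all-zero bytes are a valid `Option<&T>` equal to `None`). The helper names are hypothetical.

```rust
use std::mem::size_of;

/// `&T` is never null (validity invariant), so `Option<&T>` can encode
/// `None` as the null pointer and needs no separate tag.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<&u64>()
}

/// Builds an `Option<&u64>` from all-zero bytes.
fn none_from_zero_bits() -> Option<&'static u64> {
    // SAFETY: std documents that all-zero bytes are a valid
    // `Option<&T>` (for `T: Sized`) and represent `None`.
    unsafe { std::mem::zeroed() }
}
```

Note that the same zeroing would be undefined behavior for a plain `&u64`, since null violates its validity invariant.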
### Safety invariant

The "safety invariant" for each type defines what must hold whenever
safe code has access to a value of that type.

This invariant must **at minimum** justify everything that our type
system allows without requiring the `unsafe` keyword.

For example, a value of type `&T` must be dereferenceable, since safe code
could always choose to dereference it.
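A minimal sketch of unsafe code upholding the safety invariant of `&T` at the safe/unsafe boundary. The functions `ptr_to_ref` and `roundtrip` are hypothetical; the only library call is the standard `<*const T>::as_ref`, which returns `None` for a null pointer.

```rust
/// Hypothetical helper that turns a raw pointer into a shared reference.
///
/// # Safety
/// If `p` is non-null, it must point to a live, properly aligned,
/// initialized `T` that is not mutated for the whole lifetime `'a`.
/// That is the safety invariant of `&'a T`: once the reference reaches
/// safe code, safe code may dereference it at any time.
unsafe fn ptr_to_ref<'a, T>(p: *const T) -> Option<&'a T> {
    // `as_ref` maps null to `None`; for non-null pointers, the caller's
    // contract above is what makes the produced reference safe to expose.
    unsafe { p.as_ref() }
}

/// Safe wrapper: the input already satisfies the invariant, so a
/// round trip through a raw pointer preserves it.
fn roundtrip(x: &u32) -> u32 {
    let raw: *const u32 = x;
    // SAFETY: `raw` came from a valid `&u32` that outlives this call.
    let r = unsafe { ptr_to_ref(raw) }.expect("pointer derived from a reference");
    *r // safe code dereferences freely
}
```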
## Goals

- Define what we guarantee about the layout of various types
  and the effect of `#[repr]` annotations.
- Define the **safety requirements** of various types, i.e., what safe
  code relies on (and what unsafe code must therefore uphold at the
  safe/unsafe boundary).
- Define the **validity requirements** of various types that unsafe
  programmers must uphold at all times.
- Also examine when and how we could dynamically check these requirements.
- Uncover the sorts of constraints that we may wish to satisfy in the
  future.
## Some interesting examples and questions

- `&T` where `T: Sized`
  - This is **guaranteed** to be a non-null pointer
- `Option<&T>` where `T: Sized`
  - This is **guaranteed** to be a nullable pointer
- `Option<extern "C" fn()>`
- `usize`
  - Platform-dependent size, but guaranteed to be able to store a pointer?
  - Also an array length?
- Uninitialized bits -- for which types are uninitialized bits valid?
- If you have `struct A { .. }` and `struct B { .. }` with no
  `#[repr]` annotations, and they have the same field types, can we
  say that they will have the same layout?
  - or do we have the freedom to rearrange the fields of `A` but not
    those of `B`, e.g. based on PGO results?
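A minimal sketch probing three of the questions above, with hypothetical helper names. The `Option<extern "C" fn()>` check relies on the null-pointer optimization that the `std::option` docs list for function pointers. The `usize` check only reports what holds on mainstream targets today; whether it is a guarantee is the open question. For uninitialized bits, the sketch uses the standard `MaybeUninit<T>`, which is the sanctioned way to hold possibly-uninitialized data rather than an answer to which plain types tolerate it.

```rust
use std::mem::{size_of, MaybeUninit};

// Function pointers are never null, so `None` can be the null pointer.
extern "C" fn callback() {}

fn option_fn_ptr_is_pointer_sized() -> bool {
    size_of::<Option<extern "C" fn()>>() == size_of::<extern "C" fn()>()
}

// Reports whether `usize` is as wide as a thin pointer on this target.
fn usize_matches_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

// `MaybeUninit<u32>` may hold uninitialized bits; it is only read after
// it has been fully written.
fn init_later() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(0xDEAD_BEEF);
    // SAFETY: `slot` was fully initialized by `write` above.
    unsafe { slot.assume_init() }
}

// Calls the function pointer if present; `None` is the null encoding.
fn call_if_some(f: Option<extern "C" fn()>) -> bool {
    match f {
        Some(f) => {
            f();
            true
        }
        None => false,
    }
}
```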
## Active threads

To start, we will create threads for each major category of types
(with a few suggested focus points):

- Integers and floating-point numbers
  - What about uninitialized values?
- Booleans
  - Prior discussions ([#46156][], [#46176][]) documented bool as a single
    byte that is either 0 or 1.
- Enums
  - See the dedicated thread about "niches" and `Option`-style layout
    optimization below.
  - Define: C-like enum
  - Can a C-like enum ever have an invalid discriminant? (Presumably not)
  - Empty enums and the `!` type
  - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads.
  - [RFC 2363][] offers a proposal to permit specifying discriminants
    for enums with fields.
- Structs
  - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out?
    - e.g., what about different structs with the same definition
    - across executions of the same program?
- Tuples
  - Are these effectively anonymous structs?
- Unions
  - Can we ever say anything about the initialized contents of a union?
  - Is `#[repr(C)]` meaningful on a union?
- Fn pointers (`fn()`, `extern "C" fn()`)
- References `&T` and `&mut T`
  - Out of scope: aliasing rules
  - We currently tell LLVM they are aligned and dereferenceable; we have
    to justify that
  - Safe code may use them as well
- Raw pointers
  - Effectively the same as integers?
- Representation knobs:
  - Custom alignment ([RFC 1358][])
  - Packed ([RFC 1240][] talks about some safety issues)
  - ... what else?

We will also create categories for the following specific areas:

- Niches: Optimizing `Option`-like enums
- Uninitialized memory: when/where are uninitialized values permitted, if ever?
- ... what else?

[#46156]: https://github.com/rust-lang/rust/pull/46156
[#46176]: https://github.com/rust-lang/rust/pull/46176
[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363
[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html
[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html
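For the Enums focus points under Active threads, a minimal sketch with hypothetical type names: a C-like enum whose discriminant type and values are fixed by `#[repr(u8)]`, and a `#[repr(C)]` enum with payloads checked against the hand-written layout that RFC 2195 describes (a `#[repr(C)]` struct holding a `#[repr(C)]` tag enum followed by a `#[repr(C)]` union of one `#[repr(C)]` struct per variant).

```rust
use std::mem::{align_of, size_of};

// C-like enum: `#[repr(u8)]` fixes the discriminant type and values.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

fn discriminant_of(c: Color) -> u8 {
    c as u8
}

// A `#[repr(C)]` enum with payloads, as covered by RFC 2195.
#[allow(dead_code)]
#[repr(C)]
enum Shape {
    Circle(u32),
    Rect(f32, u64),
}

// The equivalent layout spelled out by hand, following RFC 2195.
#[allow(dead_code)]
#[repr(C)]
enum ShapeTag {
    Circle,
    Rect,
}

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct CircleFields(u32);

#[repr(C)]
#[derive(Clone, Copy)]
struct RectFields(f32, u64);

#[allow(dead_code)]
#[repr(C)]
union ShapeFields {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
struct ShapeRepr {
    tag: ShapeTag,
    fields: ShapeFields,
}

fn layouts_match() -> bool {
    size_of::<Shape>() == size_of::<ShapeRepr>()
        && align_of::<Shape>() == align_of::<ShapeRepr>()
}

/// Reads the `u64` payload of a `Rect` through the hand-written layout.
fn rect_height(s: &Shape) -> Option<u64> {
    // SAFETY: per RFC 2195, `Shape` and `ShapeRepr` have the same layout.
    let repr = unsafe { &*(s as *const Shape as *const ShapeRepr) };
    match repr.tag {
        // SAFETY: the tag says `rect` is the active union field.
        ShapeTag::Rect => Some(unsafe { repr.fields.rect.1 }),
        ShapeTag::Circle => None,
    }
}
```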

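For the Booleans and Representation knobs entries under Active threads, a minimal sketch with hypothetical type and function names. It shows `bool` as one byte holding 0 or 1, `#[repr(align(16))]` from RFC 1358 raising a struct's alignment, and `#[repr(packed)]` removing padding. Since packed fields may be misaligned (the safety issue RFC 1240 discusses, and recent compilers reject `&p.b` outright), the sketch reads the field by value and through `std::ptr::addr_of!` with an unaligned read, so no `&u32` to it is ever created.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

// `bool` is one byte: `false` is 0 and `true` is 1.
fn bool_bytes() -> (usize, u8, u8) {
    (size_of::<bool>(), false as u8, true as u8)
}

// `align(16)` raises the minimum alignment; size is rounded up to match.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Aligned16 {
    x: u8,
}

// `packed` removes padding, so `b` may sit at an unaligned offset.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn knob_layouts() -> [(usize, usize); 2] {
    [
        (size_of::<Aligned16>(), align_of::<Aligned16>()),
        (size_of::<Packed>(), align_of::<Packed>()),
    ]
}

fn read_packed_b(p: &Packed) -> u32 {
    let by_value = p.b; // copying the field out is allowed
    // SAFETY: the pointer comes from a live `Packed`, and
    // `read_unaligned` has no alignment requirement.
    let by_ptr = unsafe { addr_of!(p.b).read_unaligned() };
    assert_eq!(by_value, by_ptr);
    by_value
}
```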