# Data structure representation and validity requirements

## Introduction

This discussion is meant to focus on three things:

- What guarantees does Rust make regarding the layout of data structures?
- What invariants does the compiler require from the various Rust types?
  - the "validity invariant", as defined in [Ralf's blog post][bp]
- What invariants can safe code expect to hold for the various Rust types?
  - the "safety invariant", as defined in [Ralf's blog post][bp]

[bp]: https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html

### Layout of data structures

In general, Rust makes few guarantees about the memory layout of your
structures. For example, by default, the compiler is free to reorder
the fields of your structures for efficiency (as of this writing, we
try to minimize the overall size of your structure, but this is the
sort of detail that can easily change). For safe code, of course, any
such reordering "just works" transparently.

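The following sketch makes the field-reordering point concrete. It compares a default-representation struct with an otherwise identical `#[repr(C)]` struct, and the struct names are hypothetical. The `#[repr(C)]` size follows from the C layout rules, assuming a target where `u32` is 4-byte aligned. The default-layout size only reflects what current compilers happen to do and is not a guarantee.

```rust
use std::mem::size_of;

// Default (Rust) representation: the compiler may reorder these fields.
#[allow(dead_code)]
struct Reorderable {
    a: u8,
    b: u32,
    c: u8,
}

// `#[repr(C)]`: fields stay in declaration order, with C-style padding.
#[allow(dead_code)]
#[repr(C)]
struct InOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Assuming 4-byte alignment for u32:
    // 1 byte + 3 padding + 4 bytes + 1 byte + 3 trailing padding = 12.
    println!("repr(C):    {} bytes", size_of::<InOrder>());
    // Usually smaller today (the u32 is placed first), but this is an
    // implementation detail, not a promise.
    println!("repr(Rust): {} bytes", size_of::<Reorderable>());
}
```
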
If, however, you need to write unsafe code, you may wish to have a
fixed data structure layout. In that case, there are ways to specify
and control how an individual struct will be laid out, notably with
`#[repr]` annotations. One purpose of this section, then, is to lay out
what sorts of guarantees we offer when it comes to layout, and also
what effect the various `#[repr]` annotations have.

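The sketch below shows what `#[repr(C)]` pins down. The hypothetical `Header` struct and `header_offsets` helper measure field offsets with plain address arithmetic. Under `#[repr(C)]`, those offsets follow the C layout algorithm, and that is what lets unsafe or FFI code rely on them. The concrete numbers in the comments assume a target where `u16` is 2-byte aligned and `u32` is 4-byte aligned.

```rust
// A hypothetical header whose layout must match an external (C) definition.
#[repr(C)]
struct Header {
    tag: u8,   // offset 0
    kind: u16, // offset 2 (one padding byte after `tag`)
    len: u32,  // offset 4
}

/// Returns the byte offsets of `tag`, `kind`, and `len` within `Header`.
fn header_offsets() -> (usize, usize, usize) {
    let h = Header { tag: 0, kind: 0, len: 0 };
    let base = &h as *const Header as usize;
    (
        &h.tag as *const u8 as usize - base,
        &h.kind as *const u16 as usize - base,
        &h.len as *const u32 as usize - base,
    )
}

fn main() {
    let (tag, kind, len) = header_offsets();
    println!("tag at {tag}, kind at {kind}, len at {len}");
    println!("size = {}", std::mem::size_of::<Header>());
}
```
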
### Validity invariant

The "validity invariant" for each type defines what must hold whenever
a value of this type is considered to be initialized. The compiler expects
the validity invariant to hold **at all times** and is thus allowed to use
these invariants to, for example, affect the layout of data structures or
perform other optimizations.

Therefore, the validity invariant must **at minimum** justify all the
layout optimizations that the compiler does. We may want a stronger
invariant, however, so as to leave room for future optimization.

As an example, a value of type `&T` can never be null; therefore,
`Option<&T>` can use null to represent `None`.

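As an illustration, the assertions below check the null-pointer layout that the validity invariant of references makes possible. The standard `Option` documentation guarantees this for `&T`, `&mut T` and `Box<T>` (with `T: Sized`). By contrast, a raw pointer may legitimately be null, so it offers no such niche. The specific pointee types are arbitrary choices.

```rust
use std::mem::size_of;

fn main() {
    // `&T` can never be null, so `None` is encoded as the null pointer
    // and `Option<&T>` needs no separate tag.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<&mut String>>(), size_of::<&mut String>());
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());

    // A raw pointer *may* be null, so wrapping it in `Option` needs extra
    // space for the discriminant.
    assert!(size_of::<Option<*const u8>>() > size_of::<*const u8>());
    println!("Option<*const u8>: {} bytes", size_of::<Option<*const u8>>());
}
```
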
### Safety invariant

The "safety invariant" for each type defines what must hold whenever
safe code has access to a value of this type.

This invariant must **at minimum** justify all the things that our
type system allows without an `unsafe` keyword being required.

For example, a value of type `&T` must be dereferenceable, since safe code
could always choose to dereference it.

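The sketch below shows where the safety invariant of `&T` has to be established, at the point where unsafe code hands a reference to safe code. The function name `ref_from_raw` is hypothetical. It uses the standard `<*const T>::as_ref` method, which checks only for null. That the pointee is live and properly aligned cannot be checked here, so it stays part of the `unsafe` contract the caller must uphold.

```rust
/// Converts a raw pointer into a shared reference, or `None` if it is null.
///
/// # Safety
/// If `p` is non-null, it must point to a live, properly aligned `T`
/// that is not mutated for the whole lifetime `'a`.
unsafe fn ref_from_raw<'a, T>(p: *const T) -> Option<&'a T> {
    // `as_ref` handles the null case; everything else is the caller's promise.
    unsafe { p.as_ref() }
}

fn main() {
    let value = 42_i32;
    // SAFETY: `&value` is non-null, aligned, and outlives `r`.
    let r = unsafe { ref_from_raw(&value as *const i32) };
    // From here on, safe code may dereference `r` freely.
    println!("{:?}", r.copied());

    // SAFETY: a null pointer is explicitly allowed by the contract.
    let none: Option<&i32> = unsafe { ref_from_raw(std::ptr::null()) };
    assert!(none.is_none());
}
```
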
## Goals

- Define what we guarantee about the layout of various types
  and the effect of `#[repr]` annotations.
- Define the **safety requirements** of various types, which safe
  code relies on (and which unsafe code must uphold at the safe/unsafe boundary).
- Define the **validity requirements** of various types, which unsafe
  code must uphold at all times.
  - Also examine when/how we could dynamically check these requirements.
- Uncover the sorts of constraints that we may wish to satisfy in the
  future.

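On the goal of checking these requirements dynamically, note that only part of a validity or safety requirement can be tested at run time. The hypothetical helper `passes_cheap_ref_checks` below illustrates this. It checks the two properties of a would-be `&T` that are visible in the pointer value itself: non-null and aligned. Whether the memory is actually allocated and initialized cannot be decided this way.

```rust
use std::mem::align_of;

/// Hypothetical run-time check: could `p` plausibly be turned into a `&T`?
/// Only null-ness and alignment are inspected; dereferenceability is not.
fn passes_cheap_ref_checks<T>(p: *const T) -> bool {
    let addr = p as usize;
    addr != 0 && addr % align_of::<T>() == 0
}

fn main() {
    let x = 0u32;
    let good = &x as *const u32;
    // On targets where `u32` is more than 1-byte aligned, adding one byte
    // to the address makes it misaligned.
    let misaligned = (good as usize + 1) as *const u32;

    println!("{}", passes_cheap_ref_checks(good));
    println!("{}", passes_cheap_ref_checks(misaligned));
    println!("{}", passes_cheap_ref_checks(std::ptr::null::<u32>()));
}
```
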
## Some interesting examples and questions

- `&T` where `T: Sized`
  - This is **guaranteed** to be a non-null pointer
- `Option<&T>` where `T: Sized`
  - This is **guaranteed** to be a nullable pointer
- `Option<extern "C" fn()>`
- `usize`
  - Platform-dependent size, but guaranteed to be able to store a pointer?
  - Also an array length?
- Uninitialized bits: for which types are uninitialized bits valid?
- If you have `struct A { .. }` and `struct B { .. }` with no
  `#[repr]` annotations, and they have the same field types, can we
  say that they will have the same layout?
  - Or do we have the freedom to rearrange the fields of `A` but not
    `B`, e.g. based on profile-guided optimization (PGO) results?

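The fn-pointer and `usize` questions can be stated as assertions, as in the sketch below. The `Option` documentation guarantees that `Option<extern "C" fn(..)>` has the same size as the function pointer itself, so `None` can stand in for a NULL C callback. The `usize` documentation describes it as pointer-sized. The `Callback` alias and the `double`/`call_if_set` functions are illustrative only. Whether two identically defined structs without `#[repr]` share a layout is exactly the open question above, so the sketch deliberately does not assert it.

```rust
use std::mem::size_of;

/// A nullable C-style callback: `None` plays the role of a NULL function pointer.
type Callback = Option<extern "C" fn(i32) -> i32>;

extern "C" fn double(x: i32) -> i32 {
    x * 2
}

/// Invokes the callback if present, mimicking `if (cb) cb(x);` in C.
fn call_if_set(cb: Callback, x: i32) -> Option<i32> {
    cb.map(|f| f(x))
}

fn main() {
    // No extra tag is needed: `None` is represented as the null pointer.
    assert_eq!(size_of::<Callback>(), size_of::<extern "C" fn(i32) -> i32>());
    // `usize` has the same size as a (thin) pointer.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());

    let cb: Callback = Some(double as extern "C" fn(i32) -> i32);
    println!("{:?}", call_if_set(cb, 21));
    println!("{:?}", call_if_set(None, 21));
}
```
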
## Active threads

To start, we will create threads for each major category of types
(with a few suggested focus points):

- Integers and floating points
  - What about uninitialized values?
- Booleans
  - Prior discussions ([#46156][], [#46176][]) documented `bool` as a single
    byte that is either 0 or 1.
- Enums
  - See the dedicated thread about "niches" and `Option`-style layout
    optimization below.
  - Define: C-like enum
  - Can a C-like enum ever have an invalid discriminant? (Presumably not)
  - Empty enums and the `!` type
  - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads.
  - [RFC 2363][] offers a proposal to permit specifying discriminants.
- Structs
  - Do we ever say *anything* about how a default-representation
    (`#[repr(Rust)]`) struct is laid out?
    - e.g., what about different structs with the same definition?
      - across executions of the same program?
- Tuples
  - Are these effectively anonymous structs?
- Unions
  - Can we ever say anything about the initialized contents of a union?
  - Is `#[repr(C)]` meaningful on a union?
- Fn pointers (`fn()`, `extern "C" fn()`)
- References `&T` and `&mut T`
  - Out of scope: aliasing rules
  - We currently tell LLVM they are aligned and dereferenceable; we have
    to justify that.
  - Safe code may use them also.
- Raw pointers
  - Effectively the same as integers?
- Representation knobs:
  - Custom alignment ([RFC 1358])
  - Packed ([RFC 1240] talks about some safety issues)
- ... what else?

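Several of the threads above (booleans, C-like enums, representation knobs) share one pattern. A raw byte or integer is a valid value of the target type only if it falls in a restricted range. The sketch below converts with explicit checks instead of `std::mem::transmute`, which would be undefined behavior for a `bool` byte other than 0 or 1, or for a discriminant with no matching variant. The `Color` enum, the `Packed`/`Aligned` structs and the helper names are hypothetical, and the sizes reflect those particular field choices.

```rust
use std::mem::{align_of, size_of};

/// Checked byte-to-bool conversion: only 0 and 1 are valid `bool` bytes.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        _ => None, // transmuting this byte into a `bool` would be UB
    }
}

/// A hypothetical C-like enum with explicit discriminants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

/// Checked discriminant-to-enum conversion; values like 3 have no variant.
fn color_from_u8(d: u8) -> Option<Color> {
    match d {
        1 => Some(Color::Red),
        2 => Some(Color::Green),
        4 => Some(Color::Blue),
        _ => None,
    }
}

/// `packed` removes padding, so the `u32` field may end up unaligned.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

/// `align(16)` raises the struct's alignment (and rounds its size up).
#[allow(dead_code)]
#[repr(align(16))]
struct Aligned {
    x: u8,
}

fn main() {
    println!("{:?} {:?}", bool_from_byte(1), bool_from_byte(2));
    println!("{:?} {:?}", color_from_u8(4), color_from_u8(3));
    println!("Packed: size {}", size_of::<Packed>());
    println!("Aligned: size {}, align {}", size_of::<Aligned>(), align_of::<Aligned>());
}
```
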
We will also create categories for the following specific areas:

- Niches: Optimizing `Option`-like enums
- Uninitialized memory: when/where are uninitialized values permitted, if ever?
- ... what else?

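The sketch below touches both specific areas. For niches, `Option<NonZeroU32>` uses the forbidden value 0 to represent `None`, a size equality that the standard library documents. For uninitialized memory, `MaybeUninit` is the standard way to hold not-yet-initialized storage without producing an invalid value of any type. The `fill_buffer` function is a hypothetical example.

```rust
use std::mem::{size_of, MaybeUninit};
use std::num::NonZeroU32;

/// Hypothetical example: initialize a buffer element by element before reading it.
fn fill_buffer() -> [u32; 4] {
    // Start with uninitialized storage; no `[u32; 4]` value exists yet.
    let mut buf = MaybeUninit::<[u32; 4]>::uninit();
    let ptr = buf.as_mut_ptr() as *mut u32;
    for i in 0..4 {
        // SAFETY: `i < 4`, so every write stays inside the array.
        unsafe { ptr.add(i).write(i as u32 * 10) };
    }
    // SAFETY: all four elements were written above, so the array is initialized.
    unsafe { buf.assume_init() }
}

fn main() {
    // The value 0 is invalid for `NonZeroU32`, so `None` can use it.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    println!("{:?}", fill_buffer());
}
```
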
[#46156]: https://github.com/rust-lang/rust/pull/46156
[#46176]: https://github.com/rust-lang/rust/pull/46176
[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363
[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html
[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html