-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Document MaybeUninit bit validity #140463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
0792da3
21626ac
1494ec7
05e6a34
8d3a47e
75380ea
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -252,6 +252,47 @@ use crate::{fmt, intrinsics, ptr, slice}; | |
/// std::process::exit(*code); // UB! Accessing uninitialized memory. | ||
/// } | ||
/// ``` | ||
/// | ||
/// # Validity | ||
/// | ||
/// A `MaybeUninit<T>` has no validity requirement – any sequence of bytes of the appropriate length, | ||
/// initialized to any value or uninitialized, are a valid value of `MaybeUninit<T>`. Equivalently, | ||
/// it is always sound to perform `transmute::<[MaybeUninit<u8>; size_of::<T>()], MaybeUninit<T>>(...)`. | ||
/// | ||
/// Note that "round-tripping" via `MaybeUninit` does not always result in the original value. | ||
/// Concretely, given distinct `T` and `U` where `size_of::<T>() == size_of::<U>()`, the following | ||
/// code is not guaranteed to be sound: | ||
/// | ||
/// ```rust,no_run | ||
/// # use core::mem::{MaybeUninit, transmute}; | ||
/// # struct T; struct U; | ||
/// fn identity(t: T) -> T { | ||
/// unsafe { | ||
/// let u: MaybeUninit<U> = transmute(t); | ||
/// transmute(u) | ||
/// } | ||
/// } | ||
/// ``` | ||
/// | ||
/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these | ||
/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with | ||
/// uninitialized bytes in these positions. This is an active area of discussion, and this code | ||
/// may become sound in the future. | ||
Comment on lines
+277
to
+280
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @RalfJung I'd like some advice on this. I'm confident that this is correct as written, but could we perhaps make a stronger statement? In particular, what happens if we round-trip a value which is invalid for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per rust-lang/unsafe-code-guidelines#555 (comment), I've updated to the following text. Does that look good? /// Note that, so long as every byte position which is initialized in `T` is also initialized
/// in `U`, then the preceding `identity` example *is* sound.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As noted below I don't think the term "bytes initialized in
Comment on lines
+277
to
+280
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it makes sense to say that a type "contains initialized bytes" at some offset. That's a property of a representation. The typical term for representation bytes that are lost here is "padding". I don't think we have rigorously defined padding anywhere yet, but the term is sufficiently widely-used (and generally with a consistent meaning) that we may just be able to use it here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC, you're making two points:
Is that right? One thing that's probably worth distinguishing here is between values and layouts. In my mental model, an uninit byte is one of the possible values that a byte can have (e.g., it's the 257th value that can legally appear in a Based on this, maybe it's best to say:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No. I think both of the following concepts make sense:
But it makes less sense to talk about padding of a representation, or to talk about uninitialized bytes in a type. So for this PR, the two key points (and they are separate points) are:
The second point is just a logical consequence of the first, it does not add any new information. Not sure if it is worth mentioning. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Does this imply that a type contains padding bytes, not a type's representation? I'm thinking through the implications of what you said, and I think I understand something new that I didn't before, and I want to run it by you: In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid (enums complicate this model, but I don't think that complication is relevant for this discussion - we can just stick to thinking about structs). The problem with this mental model is that, interpreted naively, it implies that different byte values in a padding byte could correspond to different logical values of the type. So e.g. in the type IIUC, by contrast your model is that the representation relation simply doesn't include padding bytes at all. So it'd be more accurate to describe the representation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's how I think about it. We can't tell which byte is a padding byte by looking at one representation -- it's a property of the type.
That would make the only byte of
The representation has 4 bytes. But only 3 of them actually affect the represented value (which is a tuple of two [mathematical] integers). We seem to be using the term "representation" slightly differently. For me, that's list a |
||
|
||
/// If byte offsets exists at which `T`'s representation does not permit uninitialized bytes but | ||
/// `U`'s representation does (e.g. due to padding), then the bytes in `T` at these offsets may | ||
/// not be preserved in `u`, and so `transmute(u)` may produce a `T` with uninitialized bytes at | ||
/// these offsets. This is an active area of discussion, and this code may become sound in the future. | ||
/// | ||
/// Note that, so long as no such byte offsets exist, then the preceding `identity` example *is* sound. | ||
/// | ||
/// # Provenance | ||
/// | ||
/// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any | ||
/// value, `p: P`, which contains provenance, transmuting `p` to `MaybeUninit<[u8; size_of::<P>]>` | ||
/// and then back to `P` will produce a value identical to `p`, including provenance. | ||
/// | ||
/// [provenance]: ../ptr/index.html#provenance | ||
#[stable(feature = "maybe_uninit", since = "1.36.0")] | ||
// Lang item so we can wrap other types in it. This is useful for coroutines. | ||
#[lang = "maybe_uninit"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this discussion here:
@RalfJung would you like me to add language like this to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I've added the following as a more concrete and fleshed out draft. I can edit or remove as preferred.