Commit 6c3c48d

Merge pull request #1909 from arielb1/unsized-rvalues
Unsized Rvalues
2 parents 28bdbbc + b60e08a commit 6c3c48d

1 file changed

+203
-0
lines changed

text/1909-unsized-rvalues.md

@@ -0,0 +1,203 @@
- Feature Name: unsized_locals
- Start Date: 2017-02-11
- RFC PR: https://github.com/rust-lang/rfcs/pull/1909
- Rust Issue: https://github.com/rust-lang/rust/issues/48055

# Summary
[summary]: #summary

Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas.

Have repeat expressions with a length that captures local variables be such an expression, returning an `[T]` slice.

Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas.

# Motivation
[motivation]: #motivation

There are two motivations for this RFC:

1. Passing unsized values, such as trait objects, to functions by value is often desired. Currently, this must be done through a `Box<T>` with an unnecessary allocation.

    One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work:

    ```Rust
    fn takes_closure(f: FnOnce()) { f(); }
    ```

    But today you have to use a hack, such as taking a `Box<FnBox<()>>`.

2. Allocating a runtime-sized variable on the stack is important for good performance in some use-cases - see RFC #1808, which this is intended to supersede.
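
For comparison, the boxed workaround from the first motivation can be sketched in current stable Rust, where `Box<dyn FnOnce()>` is directly callable. The helper name `takes_boxed_closure` is an illustrative assumption and not part of the RFC; the point is only that the closure state has to be moved to the heap before it can be passed as a trait object by value.

```rust
/// Hypothetical helper (not from the RFC): accepts a type-erased closure
/// that consumes its environment. The `Box` forces a heap allocation for
/// the closure state, which is the cost this RFC wants to avoid.
fn takes_boxed_closure(f: Box<dyn FnOnce() -> String>) -> String {
    // Calling the box consumes it, together with the captured environment.
    f()
}
```

A caller would write something like `takes_boxed_closure(Box::new(move || owned_string))`, paying for one allocation per call.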

# Detailed design
[design]: #detailed-design

## Unsized Rvalues - language

Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following:

1. The following expressions must always return a Sized type:
    1. Function calls, method calls, operator expressions
        - implementing unsized return values for function calls would require the *called function* to do the alloca in our stack frame.
    2. ADT expressions
        - see alternatives
    3. cast expressions
        - this restriction keeps the implementation simple. These can only be trivial casts.
2. The RHS of assignment expressions must always have a Sized type.
    - Assigning an unsized type is impossible because we don't know how much memory is available at the destination. This applies to ExprAssign assignments and not to StmtLet let-statements.
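
The assignment restriction has a close analogue in today's slices: a destination of unknown capacity cannot simply be overwritten. The sketch below is written in stable Rust and uses a hypothetical helper `try_assign` (an assumption for illustration, not something the RFC defines) built on the standard `copy_from_slice`, which panics when the lengths differ.

```rust
/// Hypothetical helper (not from the RFC): overwrite `dst` with `src`
/// only when the destination has exactly the right amount of room.
fn try_assign(dst: &mut [u8], src: &[u8]) -> bool {
    if dst.len() != src.len() {
        // An unsized assignment `*dst = *src` would have no way to grow
        // or shrink the destination, so the copy is refused.
        return false;
    }
    // Lengths match, so this cannot panic.
    dst.copy_from_slice(src);
    true
}
```

A `let` statement does not have this problem, because the stack allocation is created with the right size at the point of initialization.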

This also allows passing unsized values to functions, with the ABI being as if a `&move` pointer was passed (a `(by-move-data, extra)` pair). This also means that methods taking `self` by value are object-safe, though vtable shims are sometimes needed to translate the ABI: `extra` is intentionally not passed to the fn in the vtable, so no vtable shim is needed if the vtable function already takes its argument indirectly.

For example:

```Rust
struct StringData {
    len: usize,
    data: [u8],
}

fn foo(s1: Box<StringData>, s2: Box<StringData>, cond: bool) {
    // this creates a VLA copy of either `s1.data` or `s2.data` on
    // the stack.
    let mut s = if cond {
        s1.data
    } else {
        s2.data
    };
    drop(s1);
    drop(s2);
    consume(s);
}

// Hypothetical helper (not in the original RFC text) that takes the
// unsized value by value.
fn consume(s: [u8]) { /* .. */ }

// `X` is assumed to be defined elsewhere.
fn example(f: for<'a> FnOnce(&'a X<'a>)) {
    let x = X::new();
    f(&x); // aka FnOnce::call_once(f, (&x,));
}
```
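
The `(by-move-data, extra)` pair has the same shape as the fat pointers Rust already uses for pointers to unsized types: a data pointer plus a length or vtable pointer. A minimal sketch in stable Rust, using a hypothetical helper `is_fat` that is not part of the RFC, checks this on current compilers. The two-word size is what rustc does in practice, not something this RFC specifies.

```rust
use std::mem::size_of;

/// Hypothetical helper (not from the RFC): returns true when a raw
/// pointer to `T` is two machine words wide, i.e. it carries one word
/// of metadata (`extra`) next to the data pointer.
fn is_fat<T: ?Sized>() -> bool {
    size_of::<*const T>() == 2 * size_of::<usize>()
}
```

With this helper, `is_fat::<[u8]>()` and `is_fat::<dyn Fn()>()` hold, while `is_fat::<u8>()` does not.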

## VLA expressions

Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type `[T]` with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example:
```Rust
extern "C" {
    fn random() -> usize;
}

fn foo(n: usize) {
    let x = [0u8; n]; // x: [u8]
    let x = [0u8; n + (random() % 100)]; // x: [u8]
    let x = [0u8; 42]; // x: [u8; 42], like today
    let x = [0u8; random() % 100]; //~ ERROR constant evaluation error
}
```
"captures a variable" - as in RFC #1558 - is used as the condition for giving the expression type `[T]` because it is simple, easy to understand, and introduces no type-checking complications.

The last error message could have a user-helpful note, for example "extract the length to a local variable if you want a variable-length array".
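
For contrast, the two cases look as follows in today's stable Rust, where a run-time length has to go through the heap. The function names `fixed` and `runtime` are illustrative assumptions; under this RFC the second case could be written `[0u8; n]` and would live on the stack.

```rust
/// Length known at compile time: the array is stored inline and its
/// length is part of the type.
fn fixed() -> [u8; 42] {
    [0u8; 42]
}

/// Length known only at run time: stable Rust allocates on the heap.
/// The resulting `Vec<u8>` dereferences to the `[u8]` slice type that
/// the proposed `[0u8; n]` expression would have directly.
fn runtime(n: usize) -> Vec<u8> {
    vec![0u8; n]
}
```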

## Unsized Rvalues - MIR

The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a `Use` or a `Repeat` and both can be translated easily.

Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size.

MIR construction remains unchanged.

## Guaranteed Temporary Elision

MIR construction tends to create many temporaries for order-of-evaluation (OOE) reasons. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?).

TODO: add description of problem & solution.

# How We Teach This
[teach]: #how-we-teach-this

Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book.

The "guaranteed temporary elimination" rules require more work to teach. It might be better to come up with new rules entirely.

# Drawbacks
[drawbacks]: #drawbacks

In unsafe code, it is very easy to create unintended temporaries, such as in:
```Rust
unsafe fn poke(ptr: *mut [u8]) { /* .. */ }
unsafe fn foo(mut a: [u8]) {
    let ptr: *mut [u8] = &mut a;
    // here, `a` must be copied to a temporary, because
    // `poke(ptr)` might access the original.
    bar(a, poke(ptr));
}
```

If we make `[u8]` be `Copy`, that would be even easier, because even uses of `poke(ptr);` after the function call could potentially access the supposedly-valid data behind `a`.

And even if it is not as easy, it is possible to accidentally create temporaries in safe code.

Unsized temporaries are dangerous - they can easily cause aborts through stack overflow.

# Alternatives
[alternatives]: #alternatives

## The bikeshed

There are several alternative options for the VLA syntax.

1. The RFC choice, `[t; φ]` has type `[T; φ]` if `φ` captures no variables and type `[T]` if `φ` captures a variable.
    - pro: can be understood using "HIR"/resolution only.
    - pro: requires no additional syntax.
    - con: might be confusing at first glance.
    - con: `[t; foo()]` requires the length to be extracted to a local.
2. The "permissive" choice: `[t; φ]` has type `[T; φ]` if `φ` is a constexpr, otherwise `[T]`
    - pro: allows the most code
    - pro: requires no additional syntax.
    - con: depends on what is exactly a const expression. This is a big issue because that is both non-local and might change between rustc versions.
3. Use the expected type - `[t; φ]` has type `[T]` if it is evaluated in a context that expects that type (for example `[t; foo()]: [T]`) and `[T; _]` otherwise.
    - pro: in most cases, very human-visible.
    - pro: requires no additional syntax.
    - con: relies on the notion of "expected type". While I think we *do* have to rely on that in the unsafe code semantics of `&foo` borrow expressions (as in, whether a borrow is treated as a "safe" or "unsafe" borrow - I'll write more details sometime), it might be better to not rely on expected types too much.
4. Use an explicit syntax, for example `[t; virtual φ]`.
    - bikeshed: exact syntax.
    - pro: very explicit and visible.
    - con: more syntax.
5. Use an intrinsic, `std::intrinsics::repeat(t, n)` or something.
    - pro: theoretically minimizes changes to the language.
    - con: requires returning unsized values from intrinsics.
    - con: unergonomic to use.

## Unsized ADT Expressions

Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in:
```Rust
let len_ = s.len();
let p = Box::new(PascalString {
    length: len_,
    data: *s
});
```

However, without some way to guarantee that this can be done without allocas, that might be a large footgun.
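
To show what is already possible without unsized ADT expressions, the following sketch builds a boxed unsized struct in stable Rust through an unsizing coercion. It assumes a generic definition of `PascalString` and a helper named `boxed`, neither of which comes from the RFC, and it only works when the length is a compile-time constant. A run-time-length source such as `*s` above is exactly the case that still needs unsafe code today.

```rust
/// Assumed definition (not from the RFC): generic over the tail so that
/// a sized instance can be built first and unsized afterwards.
struct PascalString<T: ?Sized> {
    length: usize,
    data: T,
}

/// Hypothetical helper: builds a boxed `PascalString<[u8]>` from a
/// fixed-size array whose length `N` is known at compile time.
fn boxed<const N: usize>(bytes: [u8; N]) -> Box<PascalString<[u8]>> {
    let sized: Box<PascalString<[u8; N]>> = Box::new(PascalString {
        length: N,
        data: bytes,
    });
    // `Box<PascalString<[u8; N]>>` coerces to `Box<PascalString<[u8]>>`
    // at the return site.
    sized
}
```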

## Copy Slices

One somewhat-orthogonal proposal that came up was to make `Clone` (and therefore `Copy`) not depend on `Sized`, and to make `[u8]` be `Copy`, by moving the `Self: Sized` bound from the trait to the methods, i.e. using the following declaration:
```Rust
pub trait Clone {
    fn clone(&self) -> Self where Self: Sized;
    fn clone_from(&mut self, source: &Self) where Self: Sized {
        // ...
    }
}
```

That would be a backwards-compatibility-breaking change, because today `T: Clone + ?Sized` (or of course `Self: Clone` in a trait context, with no implied `Self: Sized`) implies that `T: Sized`, but it might be that its impact is small enough to allow (and even if not, it might be worth it for Rust 2.0).
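
As a point of reference, this is how a byte slice is duplicated in stable Rust today. Because `Clone` has a `Sized` supertrait, `[u8]` cannot implement it, and the copy goes through `ToOwned` instead. The helper name `duplicate` is an illustrative assumption.

```rust
/// Hypothetical helper (not from the RFC): duplicates a byte slice.
/// `[u8]` is not `Clone`, so the copy uses `ToOwned`, whose owned form
/// for `[u8]` is the heap-allocated `Vec<u8>`.
fn duplicate(bytes: &[u8]) -> Vec<u8> {
    bytes.to_owned()
}
```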

# Unresolved questions
[unresolved]: #unresolved-questions

How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc?
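
One library-level mitigation that is available in stable Rust today, and which is not something this RFC proposes, is to cap stack usage and fall back to the heap above a threshold. The sketch below assumes a hypothetical limit `STACK_LIMIT` and helper `with_scratch`; both names and the value 256 are arbitrary choices for illustration.

```rust
/// Hypothetical threshold in bytes; a real lint or library would pick
/// its own value.
const STACK_LIMIT: usize = 256;

/// Runs `f` on a zeroed scratch buffer of `n` bytes, using a fixed-size
/// stack array for small requests and the heap for large ones.
/// Returns `f`'s result together with whether the stack buffer was used.
fn with_scratch<R>(n: usize, f: impl FnOnce(&mut [u8]) -> R) -> (R, bool) {
    if n <= STACK_LIMIT {
        // Bounded stack usage: never more than STACK_LIMIT bytes.
        let mut buf = [0u8; STACK_LIMIT];
        (f(&mut buf[..n]), true)
    } else {
        // Large requests go to the heap instead of risking an overflow.
        let mut buf = vec![0u8; n];
        (f(&mut buf[..]), false)
    }
}
```

The trade-off is that small requests always reserve the full `STACK_LIMIT` bytes, which a true variable-length alloca would avoid.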

How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries.

Accumulative allocas (aka `'fn` borrows) are beyond the scope of this RFC.

See alternatives.