Commit 9e2260e

committed
Unsized Rvalues
1 parent b23226f commit 9e2260e

text/0000-unsized-rvalues.md

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
1+
- Feature Name: unsized_locals
2+
- Start Date: 2017-02-11
3+
- RFC PR: (leave this empty)
4+
- Rust Issue: (leave this empty)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas.
10+
11+
Have repeat expressions with a length that captures local variables be such an expression, returning an `[T]` slice.
12+
13+
Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas.
14+
15+
# Motivation
16+
[motivation]: #motivation
17+
18+
There are two motivations for this RFC:
19+
20+
1) Passing unsized values, such as trait objects, to functions by value is often desired. Currently, this must be done through a `Box<T>` with an unnecessary allocation.
21+
22+
One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work:
23+
24+
```Rust
fn takes_closure(f: FnOnce()) { f(); }
```
27+
28+
But today you have to use a hack, such as taking a `Box<FnBox<()>>`.
29+
30+
2) Allocating a runtime-sized variable on the stack is important for good performance in some use-cases - see RFC #1808, which this is intended to supersede.
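
For the first motivation, the boxed workaround can be sketched on stable Rust as follows. This is only an illustration: it assumes a toolchain newer than this RFC, where `Box<dyn FnOnce()>` can be called directly, and the names `takes_boxed_closure` and `make_closure` are hypothetical. The point is that the closure's environment has to live in a heap allocation just to be passed by value.

```rust
// Hypothetical helper: accepts a type-erased closure that consumes its
// environment. The closure lives in a `Box`, i.e. a heap allocation,
// which is the cost the RFC would like to avoid.
fn takes_boxed_closure(f: Box<dyn FnOnce() -> usize>) -> usize {
    f()
}

// Hypothetical helper: builds a closure that owns `data` and consumes it
// when called, so it can only implement `FnOnce`.
fn make_closure(data: Vec<u8>) -> Box<dyn FnOnce() -> usize> {
    Box::new(move || data.into_iter().map(usize::from).sum::<usize>())
}
```

Under the RFC, the signature could instead take the closure as an unsized argument, with no `Box` involved.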
31+
32+
# Detailed design
33+
[design]: #detailed-design
34+
35+
## Unsized Rvalues - language
36+
37+
Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following:
38+
a) The following expressions must always return a Sized type:
39+
a1) Function calls, method calls, operator expressions
40+
- implementing unsized return values for function calls would require the *called function* to do the alloca in our stack frame.
41+
a2) ADT expressions
42+
- see alternatives
43+
a3) cast expressions
44+
- this restriction is for implementation simplicity; such casts can only be trivial casts.
45+
b) The RHS of assignment expressions must always have a Sized type.
46+
- Assigning an unsized type is impossible because we don't know how much memory is available at the destination. This applies to ExprAssign assignments and not to StmtLet let-statements.
47+
48+
This also allows passing unsized values to functions, with the ABI being as if a `&move` pointer was passed (a `(by-move-data, extra)` pair). It also means that methods taking `self` by value are object-safe, though vtable shims are sometimes needed to translate the ABI. Because the callee-side intentionally does not pass `extra` to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly.
49+
50+
For example:
51+
52+
```Rust
struct StringData {
    len: usize,
    data: [u8],
}

fn foo(s1: Box<StringData>, s2: Box<StringData>, cond: bool) {
    // this creates a VLA copy of either `s1.data` or `s2.data` on
    // the stack.
    let mut s = if cond {
        s1.data
    } else {
        s2.data
    };
    drop(s1);
    drop(s2);
    // `takes_bytes` stands in for any function taking `[u8]` by value.
    takes_bytes(s);
}

fn example(f: for<'a> FnOnce(&'a X<'a>)) {
    let x = X::new();
    f(&x); // aka FnOnce::call_once(f, (&x,));
}
```
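
The `(by-move-data, extra)` pair mentioned before the example has the same shape as the fat pointers Rust already uses for `&[T]`. The sketch below illustrates only that shape on stable Rust; `&move` does not exist, and the helper names are hypothetical. It assumes a current rustc target, where a slice reference is two machine words wide.

```rust
use std::mem::size_of;

// Splits a slice reference into its two components: the data pointer and
// the "extra" word, which for slices is the element count.
fn split_fat_pointer(bytes: &[u8]) -> (*const u8, usize) {
    (bytes.as_ptr(), bytes.len())
}

// Rebuilds the slice from the two components.
// Safety: `data` must point to `len` initialized bytes that stay valid
// and unmodified for the lifetime `'a`.
unsafe fn join_fat_pointer<'a>(data: *const u8, len: usize) -> &'a [u8] {
    unsafe { std::slice::from_raw_parts(data, len) }
}

// Checks that a slice reference occupies two words (pointer + length).
fn slice_ref_is_two_words() -> bool {
    size_of::<&[u8]>() == 2 * size_of::<usize>()
}
```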
76+
77+
## VLA expressions
78+
79+
Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type `[T]` with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example:
80+
```Rust
extern "C" {
    fn random() -> usize;
}

fn foo(n: usize) {
    let x = [0u8; n]; // x: [u8]
    let x = [0u8; n + (random() % 100)]; // x: [u8]
    let x = [0u8; 42]; // x: [u8; 42], like today
    let x = [0u8; random() % 100]; //~ ERROR constant evaluation error
}
```
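
For comparison, here is a rough sketch of what is expressible on stable Rust without this RFC. The function names are hypothetical. A runtime length forces a heap-allocated `Vec<u8>`, while a constant length gives a stack array; both can be viewed through the same unsized type `[u8]`.

```rust
// Runtime length: heap-allocated today. Under the RFC this could be
// written as `[0u8; n]` and live on the stack.
fn zeroed_runtime(n: usize) -> Vec<u8> {
    vec![0u8; n]
}

// Compile-time length: a sized array on the stack, as today.
fn zeroed_fixed() -> [u8; 42] {
    [0u8; 42]
}

// Both buffers coerce to `&[u8]`, whose length is only known at run time.
fn total_len(a: &[u8], b: &[u8]) -> usize {
    a.len() + b.len()
}
```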
92+
93+
"captures a variable" - as in RFC #1558 - is used as the condition for making the return be `[T]` because it is simple, easy to understand, and introduces no type-checking complications.
94+
95+
## Unsized Rvalues - MIR
96+
97+
The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a `Use` or a `Repeat` and both can be translated easily.
98+
99+
Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size.
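
The "correct size" for such a stack allocation is available at run time from the value's metadata. As a small sketch on stable Rust (the function names are hypothetical, and this shows only the size query, not the allocation itself), `std::mem::size_of_val` and `std::mem::align_of_val` report the layout of an unsized value behind a reference:

```rust
use std::mem::{align_of_val, size_of_val};

// Returns the (size, alignment) in bytes that a copy of `value` would need.
// For a slice, the size is derived from the length stored in the fat pointer.
fn slice_layout(value: &[u32]) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

// For a trait object, the size is read from the vtable.
fn dyn_size(value: &dyn std::fmt::Debug) -> usize {
    size_of_val(value)
}
```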
100+
101+
MIR construction remains unchanged.
102+
103+
## Guaranteed Temporary Elision
104+
105+
MIR likes to create lots of temporaries for order-of-evaluation (OOE) reasons. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?).
106+
107+
TODO: add description of problem & solution.
108+
109+
# How We Teach This
110+
[teach]: #how-we-teach-this
111+
112+
Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book.
113+
114+
The "guaranteed temporary elision" rules require more work to teach. It might be better to come up with new rules entirely.
115+
116+
# Drawbacks
117+
[drawbacks]: #drawbacks
118+
119+
In unsafe code, it is very easy to create unintended temporaries, such as in:
120+
```Rust
unsafe fn poke(ptr: *mut [u8]) { /* .. */ }
unsafe fn foo(mut a: [u8]) {
    let ptr: *mut [u8] = &mut a;
    // here, `a` must be copied to a temporary, because
    // `poke(ptr)` might access the original.
    bar(a, poke(ptr));
}
```
129+
130+
If we make `[u8]` be `Copy`, that would be even easier, because even uses of `poke(ptr);` after the function call could potentially access the supposedly-valid data behind `a`.
131+
132+
And even if it is not as easy, it is possible to accidentally create temporaries in safe code.
133+
134+
Unsized temporaries are dangerous - they can easily cause aborts through stack overflow.
135+
136+
# Alternatives
137+
[alternatives]: #alternatives
138+
139+
Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in:
140+
```Rust
let len_ = s.len();
let p = Box::new(PascalString {
    length: len_,
    data: *s
});
```
147+
148+
However, without some way to guarantee that this can be done without allocas, that might be a large footgun.
149+
150+
One somewhat-orthogonal proposal that came up was to make `Clone` (and therefore `Copy`) not depend on `Sized`, and to make `[u8]` be `Copy`, by moving the `Self: Sized` bound from the trait to the methods, i.e. using the following declaration:
151+
```Rust
pub trait Clone {
    fn clone(&self) -> Self where Self: Sized;
    fn clone_from(&mut self, source: &Self) where Self: Sized {
        // ...
    }
}
```
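
The effect of moving the bound from the trait to its methods can be seen with a small stand-in trait on stable Rust. `Duplicate` below is hypothetical and is not the real `Clone`; the sketch uses `dyn Duplicate` rather than `[u8]` as the unsized type. Because `duplicate` is gated behind `Self: Sized`, the trait itself no longer requires `Sized`, so it can be used with an unsized type.

```rust
// Hypothetical stand-in for the proposed `Clone` declaration.
trait Duplicate {
    // The `Self: Sized` bound sits on the method, not on the trait.
    fn duplicate(&self) -> Self
    where
        Self: Sized;

    // A method without the bound stays available on unsized types.
    fn describe(&self) -> String;
}

impl Duplicate for u32 {
    fn duplicate(&self) -> Self {
        *self
    }

    fn describe(&self) -> String {
        format!("u32 {}", self)
    }
}

// `dyn Duplicate` is an unsized type. It is usable here only because
// `duplicate`, which returns `Self`, requires `Self: Sized`.
fn describe_unsized(value: &dyn Duplicate) -> String {
    value.describe()
}
```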
159+
160+
# Unresolved questions
161+
[unresolved]: #unresolved-questions
162+
163+
How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc?
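
One library-level mitigation available on stable Rust today, independent of any lint, is to cap stack usage and fall back to the heap above a threshold. The sketch below is only an illustration of that pattern: `with_scratch` and `STACK_LIMIT` are hypothetical names, and the limit of 256 bytes is an arbitrary assumption.

```rust
// Assumed threshold; a real limit would depend on the platform's stack size.
const STACK_LIMIT: usize = 256;

// Runs `f` on a zeroed scratch buffer of `n` bytes. Returns the closure's
// result and whether the buffer was placed on the stack.
fn with_scratch<R>(n: usize, f: impl FnOnce(&mut [u8]) -> R) -> (R, bool) {
    if n <= STACK_LIMIT {
        // Small request: a fixed-size stack array, sliced to `n` bytes.
        let mut buf = [0u8; STACK_LIMIT];
        (f(&mut buf[..n]), true)
    } else {
        // Large request: heap allocation, so the stack cannot overflow.
        let mut buf = vec![0u8; n];
        (f(&mut buf[..]), false)
    }
}
```

A caller would write, for example, `with_scratch(16, |buf| buf.len())`, and the same closure works regardless of where the buffer ends up.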
164+
165+
How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries.
166+
167+
See alternatives.
