| 1 | +- Feature Name: unsized_locals |
| 2 | +- Start Date: 2017-02-11 |
| 3 | +- RFC PR: (leave this empty) |
| 4 | +- Rust Issue: (leave this empty) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas. |
| 10 | + |
| 11 | +Have repeat expressions with a length that captures local variables be such an expression, returning an `[T]` slice. |
| 12 | + |
| 13 | +Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas. |
| 14 | + |
| 15 | +# Motivation |
| 16 | +[motivation]: #motivation |
| 17 | + |
| 18 | +There are two motivations for this RFC: |
| 19 | + |
| 20 | +1) Passing unsized values, such as trait objects, to functions by value is often desired. Currently, this must be done through a `Box<T>` with an unnecessary allocation. |
| 21 | + |
| 22 | +One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work: |
| 23 | + |
| 24 | +```Rust |
| 25 | +fn takes_closure(f: FnOnce()) { f(); } |
| 26 | +``` |
| 27 | + |
| 28 | +But today you have to use a hack, such as taking a `Box<FnBox<()>>`. |
| 29 | + |
| 30 | +2) Allocating a runtime-sized variable on the stack is important for good performance in some use cases; see RFC #1808, which this RFC is intended to supersede. |
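
For comparison with the first motivation, here is a minimal sketch of the boxed workaround as it can be written in current stable Rust. It uses the `dyn` keyword and a directly callable `Box<dyn FnOnce() -> String>`, both of which postdate this text. The names `takes_boxed_closure` and `run_example` are illustrative assumptions, not part of the proposal.

```rust
// Sketch of the boxed workaround; all names here are hypothetical.
fn takes_boxed_closure(f: Box<dyn FnOnce() -> String>) -> String {
    // Calling the box consumes it together with the captured environment.
    f()
}

fn run_example() -> String {
    let greeting = String::from("hello");
    // The closure moves `greeting` out when called, so it is only `FnOnce`.
    // `Box::new` performs the heap allocation that this RFC wants to avoid.
    takes_boxed_closure(Box::new(move || greeting))
}

fn main() {
    println!("{}", run_example());
}
```

With unsized function arguments, the `Box` and its allocation would not be needed for this pattern.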
| 31 | + |
| 32 | +# Detailed design |
| 33 | +[design]: #detailed-design |
| 34 | + |
| 35 | +## Unsized Rvalues - language |
| 36 | + |
| 37 | +Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following: |
| 38 | +a) The following expressions must always return a Sized type: |
| 39 | +    a1) Function calls, method calls, operator expressions |
| 40 | +        - implementing unsized return values for function calls would require the *called function* to do the alloca in our stack frame. |
| 41 | +    a2) ADT expressions |
| 42 | +        - see alternatives |
| 43 | +    a3) Cast expressions |
| 44 | +        - this restriction is for implementation simplicity. These can only be trivial casts. |
| 45 | +b) The RHS of assignment expressions must always have a Sized type. |
| 46 | +    - Assigning an unsized value is impossible because we don't know how much memory is available at the destination. This applies to ExprAssign assignments and not to StmtLet let-statements. |
| 47 | + |
| 48 | +This also allows passing unsized values to functions, with the ABI being as if a `&move` pointer was passed (a `(by-move-data, extra)` pair). It also means that methods taking `self` by value are object-safe, though vtable shims are sometimes needed to translate the ABI. Because the callee-side intentionally does not pass `extra` to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly. |
| 49 | + |
| 50 | +For example: |
| 51 | + |
| 52 | +```Rust |
| 53 | +struct StringData { |
| 54 | +    len: usize, |
| 55 | +    data: [u8], |
| 56 | +} |
| 57 | + |
| 58 | +fn foo(s1: Box<StringData>, s2: Box<StringData>, cond: bool) { |
| 59 | +    // this creates a VLA copy of either `s1.data` or `s2.data` on |
| 60 | +    // the stack. |
| 61 | +    let mut s = if cond { |
| 62 | +        s1.data |
| 63 | +    } else { |
| 64 | +        s2.data |
| 65 | +    }; |
| 66 | +    drop(s1); |
| 67 | +    drop(s2); |
| 68 | +    consume(s); // `consume` is a hypothetical fn taking a `[u8]` by value |
| 69 | +} |
| 70 | + |
| 71 | +fn example(f: for<'a> FnOnce(&'a X<'a>)) { |
| 72 | +    let x = X::new(); |
| 73 | +    f(&x); // aka FnOnce::call_once(f, (&x,)); |
| 74 | +} |
| 75 | +``` |
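
The `(by-move-data, extra)` pair described above has the same shape as the fat pointers Rust already uses for references to unsized types. As a rough illustration only, the sketch below measures today's `&[u8]` and `&dyn Debug` references as observed on current compilers; it does not exercise the proposed by-value ABI, and the helper name `pointer_words` is hypothetical.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Returns the size, in machine words, of `&[u8]`, `&dyn Debug`, and `&u8`.
// `pointer_words` is a hypothetical helper for this illustration.
fn pointer_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&[u8]>() / word,     // data pointer + length
        size_of::<&dyn Debug>() / word, // data pointer + vtable pointer
        size_of::<&u8>() / word,       // thin pointer to a sized type
    )
}

fn main() {
    let (slice, object, thin) = pointer_words();
    println!("&[u8]: {slice}, &dyn Debug: {object}, &u8: {thin}");
}
```

The second word plays the role of `extra`: a length for slices and a vtable pointer for trait objects.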
| 76 | + |
| 77 | +## VLA expressions |
| 78 | + |
| 79 | +Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type `[T]` with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example: |
| 80 | +```Rust |
| 81 | +extern "C" { |
| 82 | +    fn random() -> usize; |
| 83 | +} |
| 84 | + |
| 85 | +fn foo(n: usize) { |
| 86 | +    let x = [0u8; n]; // x: [u8] |
| 87 | +    let x = [0u8; n + (random() % 100)]; // x: [u8] |
| 88 | +    let x = [0u8; 42]; // x: [u8; 42], like today |
| 89 | +    let x = [0u8; random() % 100]; //~ ERROR constant evaluation error |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +"captures a variable" - as in RFC #1558 - is used as the condition for making the return be `[T]` because it is simple, easy to understand, and introduces no type-checking complications. |
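
For contrast, the closest stable equivalent today puts a run-time-sized buffer on the heap. The sketch below is not the proposed feature; `fixed_buffer` and `zeroed_buffer` are hypothetical helpers that show a compile-time length next to a run-time one.

```rust
// Length known at compile time: a `[u8; 42]` array, stored inline.
fn fixed_buffer() -> [u8; 42] {
    [0u8; 42]
}

// Length known only at run time: on stable Rust this needs a heap allocation.
fn zeroed_buffer(n: usize) -> Box<[u8]> {
    vec![0u8; n].into_boxed_slice()
}

fn main() {
    let fixed = fixed_buffer();
    let dynamic = zeroed_buffer(7);
    // Both can be viewed as a `[u8]` slice through a reference.
    let a: &[u8] = &fixed;
    let b: &[u8] = &dynamic;
    println!("{} {}", a.len(), b.len());
}
```

Under this RFC, `[0u8; n]` would produce the second kind of value directly on the stack, with no `Box`.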
| 94 | + |
| 95 | +## Unsized Rvalues - MIR |
| 96 | + |
| 97 | +The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a `Use` or a `Repeat` and both can be translated easily. |
| 98 | + |
| 99 | +Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size. |
| 100 | + |
| 101 | +MIR construction remains unchanged. |
| 102 | + |
| 103 | +## Guaranteed Temporary Elision |
| 104 | + |
| 105 | +MIR likes to create lots of temporaries for order-of-evaluation (OOE) reasons. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?). |
| 106 | + |
| 107 | +TODO: add description of problem & solution. |
| 108 | + |
| 109 | +# How We Teach This |
| 110 | +[teach]: #how-we-teach-this |
| 111 | + |
| 112 | +Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book. |
| 113 | + |
| 114 | +The "guaranteed temporary elision" rules require more work to teach. It might be better to come up with new rules entirely. |
| 115 | + |
| 116 | +# Drawbacks |
| 117 | +[drawbacks]: #drawbacks |
| 118 | + |
| 119 | +In unsafe code, it is very easy to create unintended temporaries, such as in: |
| 120 | +```Rust |
| 121 | +unsafe fn poke(ptr: *mut [u8]) { /* .. */ } |
| 122 | +unsafe fn foo(mut a: [u8]) { |
| 123 | +    let ptr: *mut [u8] = &mut a; |
| 124 | +    // here, `a` must be copied to a temporary, because |
| 125 | +    // `poke(ptr)` might access the original. |
| 126 | +    bar(a, poke(ptr)); |
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +If we make `[u8]` be `Copy`, that would be even easier, because even uses of `poke(ptr);` after the function call could potentially access the supposedly-valid data behind `a`. |
| 131 | + |
| 132 | +And even if it is not as easy, it is possible to accidentally create temporaries in safe code. |
| 133 | + |
| 134 | +Unsized temporaries are dangerous - they can easily cause aborts through stack overflow. |
| 135 | + |
| 136 | +# Alternatives |
| 137 | +[alternatives]: #alternatives |
| 138 | + |
| 139 | +Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in: |
| 140 | +```Rust |
| 141 | +let len_ = s.len(); |
| 142 | +let p = Box::new(PascalString { |
| 143 | +    length: len_, |
| 144 | +    data: *s |
| 145 | +}); |
| 146 | +``` |
| 147 | + |
| 148 | +However, without some way to guarantee that this can be done without allocas, that might be a large footgun. |
| 149 | + |
| 150 | +One somewhat-orthogonal proposal that came up was to make `Clone` (and therefore `Copy`) not depend on `Sized`, and to make `[u8]` be `Copy`, by moving the `Self: Sized` bound from the trait to the methods, i.e. using the following declaration: |
| 151 | +```Rust |
| 152 | +pub trait Clone { |
| 153 | +    fn clone(&self) -> Self where Self: Sized; |
| 154 | +    fn clone_from(&mut self, source: &Self) where Self: Sized { |
| 155 | +        // ... |
| 156 | +    } |
| 157 | +} |
| 158 | +``` |
| 159 | + |
| 160 | +# Unresolved questions |
| 161 | +[unresolved]: #unresolved-questions |
| 162 | + |
| 163 | +How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc? |
| 164 | + |
| 165 | +How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries. |
| 166 | + |
| 167 | +See alternatives. |