Commit b954d4d
Rollup merge of rust-lang#52051 - scottmcm:swap-directly, r=alexcrichton
mem::swap the obvious way for types smaller than the SIMD optimization's block size
In some cases LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail, so swapping types smaller than the block size directly helps SRoA work in cases where it currently doesn't. Found in the `replace_with` RFC discussion.
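As a rough illustration of "the obvious way", here is a minimal sketch, not the actual patch. The name `swap_sketch` and the 32-byte `ASSUMED_BLOCK_SIZE` threshold are assumptions made for this example, and `mem::swap` stands in for the block-wise SIMD path on larger types.

```rust
use std::mem;
use std::ptr;

/// Assumed size (in bytes) of the SIMD path's block; illustrative only.
const ASSUMED_BLOCK_SIZE: usize = 32;

/// Hypothetical helper: swap small types directly, defer to `mem::swap` otherwise.
pub fn swap_sketch<T>(x: &mut T, y: &mut T) {
    if mem::size_of::<T>() < ASSUMED_BLOCK_SIZE {
        let px: *mut T = x;
        let py: *mut T = y;
        // SAFETY: `px` and `py` come from two distinct `&mut T`, so both are
        // valid, aligned, and non-overlapping. Every value is read once and
        // written once, so nothing is duplicated or dropped twice.
        unsafe {
            let tmp = ptr::read(px); // move `*x` out into a temporary
            ptr::copy_nonoverlapping(py as *const T, px, 1); // `*x` = old `*y`
            ptr::write(py, tmp); // `*y` = old `*x`
        }
    } else {
        // Stand-in for the block-wise path used for larger types.
        mem::swap(x, y);
    }
}
```

The intent of the direct path is that the temporary is a plain value of type `T` rather than an unaligned byte block, which is the kind of local SRoA can usually split into registers.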
Examples of the improvements:
<details>
<summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>
```rust
type Demo = [u16; 3];

pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
    std::mem::swap(x, y);
}
```
nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
    .seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
    sub rsp, 32
    .seh_stackalloc 32
    .seh_endprologue
    movzx eax, word ptr [rcx + 4]
    mov word ptr [rsp + 4], ax
    mov eax, dword ptr [rcx]
    mov dword ptr [rsp], eax
    movzx eax, word ptr [rdx + 4]
    mov word ptr [rcx + 4], ax
    mov eax, dword ptr [rdx]
    mov dword ptr [rcx], eax
    movzx eax, word ptr [rsp + 4]
    mov word ptr [rdx + 4], ax
    mov eax, dword ptr [rsp]
    mov dword ptr [rdx], eax
    add rsp, 32
    ret
    .seh_handlerdata
    .section .text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
    .seh_endproc
```
this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
    mov r8d, dword ptr [rcx]
    movzx r9d, word ptr [rcx + 4]
    movzx eax, word ptr [rdx + 4]
    mov word ptr [rcx + 4], ax
    mov eax, dword ptr [rdx]
    mov dword ptr [rcx], eax
    mov word ptr [rdx + 4], r9w
    mov dword ptr [rdx], r8d
    ret
```
</details>
<details>
<summary>`replace_with` optimizes down much better</summary>
Inspired by rust-lang/rfcs#2490,
```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
    where F: FnOnce(Option<T>) -> Option<T>
{
    *x = f(x.take());
}

pub fn inc_opt(mut x: &mut Option<i32>) {
    replace_with(&mut x, |i| i.map(|j| j + 1));
}
```
Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
    mov rax, qword ptr [rcx]
    movabs r8, 4294967296
    add r8, rax
    shl rax, 32
    movabs rdx, -4294967296
    and rdx, r8
    xor r8d, r8d
    test rax, rax
    cmove rdx, rax
    setne r8b
    or rdx, r8
    mov qword ptr [rcx], rdx
    ret
```
Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
    mov r8, qword ptr [rcx]
    mov rdx, r8
    shr rdx, 32
    xor eax, eax
    test r8d, r8d
    setne al
    add edx, 1
    mov dword ptr [rcx], eax
    mov dword ptr [rcx + 4], edx
    ret
```
This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
    xor eax, eax
    cmp dword ptr [rcx], 0
    setne al
    mov dword ptr [rcx], eax
    add dword ptr [rcx + 4], 1
    ret
```
That `add` is beautiful: it uses a memory-operand addressing mode, so the value never needs to go through a register explicitly. The remaining imperfection is well-known (rust-lang#49420 (comment)).
</details>

3 files changed: +41, -1 lines changed