Commit ec42fd0

RFC: mem::black_box and mem::clobber
1 parent fd70ea3 commit ec42fd0

1 file changed

+157
-0
lines changed

text/0000-bench-utils.md

@@ -0,0 +1,157 @@
1+
- Feature Name: black_box-and-clobber
2+
- Start Date: 2018-03-12
3+
- RFC PR: (leave this empty)
4+
- Rust Issue: (leave this empty)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
This RFC adds two functions to `core::mem`: `black_box` and `clobber`, which are
10+
mainly useful for writing benchmarks.
11+
12+
# Motivation
13+
[motivation]: #motivation
14+
15+
The `black_box` and `clobber` functions are useful for writing synthetic
16+
benchmarks where, due to the constrained nature of the benchmark, the compiler
17+
is able to perform optimizations that wouldn't otherwise trigger in practice.
18+
19+
The implementation of these functions is backend-specific and requires inline
20+
assembly. If the standard library does not provide them, users are
21+
required to use brittle workarounds on nightly.
22+
23+
# Guide-level explanation
24+
[guide-level-explanation]: #guide-level-explanation
25+
26+
27+
## `mem::black_box`
28+
29+
The function:
30+
31+
```rust
32+
pub fn black_box<T>(x: T) -> T;
33+
```
34+
35+
prevents the value `x` from being optimized away and flushes pending reads/writes
36+
to memory. It does not prevent optimizations on the expression generating the
37+
value `x` nor on the return value of the function. For
38+
example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)):
39+
40+
```rust
41+
fn foo(x: i32) -> i32 {
42+
    mem::black_box(2 + x);
43+
    3
44+
}
45+
let a = foo(2);
46+
```
47+
48+
Here, the compiler can simplify the expression `2 + x` into `2 + 2` and then
49+
`4`, but it is not allowed to discard `4`. Instead, it must store `4` into a
50+
register even though it is not used by anything afterwards.
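
As a rough illustration of how such a function is typically used in a timing loop, here is a minimal sketch. It assumes that `std::hint::black_box`, which exists in today's standard library and is documented as best-effort, behaves like the `mem::black_box` proposed here; the helper `sum_to` is hypothetical and only serves as a workload.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical workload: sums the integers in 0..n.
fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let start = Instant::now();
    // Wrapping the input hides the constant from the optimizer;
    // wrapping the result keeps the computation from being discarded.
    let result = black_box(sum_to(black_box(1_000)));
    let elapsed = start.elapsed();
    println!("sum = {result}, took {elapsed:?}");
}
```

Wrapping only the result would still allow the compiler to fold `sum_to(1_000)` into a constant, which matches the behavior described above: the expression producing the value may be optimized, but the value itself must be materialized.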
51+
52+
## `mem::clobber`
53+
54+
The function:
55+
56+
```rust
57+
pub fn clobber() -> ();
58+
```
59+
60+
flushes all pending writes to memory. Memory managed by block scope objects must
61+
be "escaped" with `black_box`.
62+
63+
Using `mem::{black_box, clobber}` we can benchmark `Vec::push` as follows:
64+
65+
```rust
66+
fn bench_vec_push_back(bench: Bencher) -> BenchResult {
67+
    let n = /* large enough number */;
68+
    let mut v = Vec::with_capacity(n);
69+
    bench.iter(|| {
70+
        // Escape the vector pointer:
71+
        mem::black_box(v.as_ptr());
72+
        v.push(42_u8);
73+
        // Flush the write of 42 back to memory:
74+
        mem::clobber();
75+
    })
76+
}
77+
```
78+
79+
To measure the cost of `Vec::push`, we pre-allocate the `Vec` to avoid
80+
re-allocating memory during the iteration. Since we are allocating a vector,
81+
writing values to it, and dropping it, LLVM is actually able to optimize code
82+
like this away ([`rust.godbolt.org`](https://godbolt.org/g/QMs77J)).
83+
84+
To make this a suitable benchmark, we use `mem::clobber()` to force LLVM to
85+
write `42` back to memory. Note, however, that if we try this, LLVM still manages
86+
to optimize our benchmark away ([`rust.godbolt.org`](https://godbolt.org/g/r9K2Bk))!
87+
88+
The problem is that the memory of our vector is managed by an object in block
89+
scope. That is, since we haven't shared this memory with anything, no other code
90+
in our program can have a pointer to it, so LLVM does not need to schedule any
91+
writes to this memory, and there are no pending memory writes to flush!
92+
93+
What we must do is tell LLVM that something might also have a pointer to this
94+
memory, and this is what we use `mem::black_box` for in this case
95+
([`rust.godbolt.org`](https://godbolt.org/g/3wBxay)).
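
The benchmark above can be approximated as a standalone program. The following is only a sketch under stated assumptions: `std::hint::black_box` stands in for the proposed `mem::black_box`, the `clobber` function is a hypothetical stand-in built from an empty inline-assembly block (which the compiler must assume reads and writes memory, because the `nomem` option is not given), and the target is assumed to support stable inline assembly (for example x86-64 or AArch64). `push_n` is a hypothetical helper replacing the `Bencher` loop.

```rust
use std::arch::asm;
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for the proposed `mem::clobber`.
#[inline(always)]
fn clobber() {
    // An empty asm block without `nomem` is treated as possibly
    // reading and writing any memory it can reach.
    unsafe { asm!("", options(nostack, preserves_flags)) };
}

/// Pushes `n` bytes into a pre-allocated vector, escaping the buffer
/// pointer before each push and clobbering memory after it.
fn push_n(n: usize) -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    for _ in 0..n {
        // Escape the vector pointer:
        black_box(v.as_ptr());
        v.push(42_u8);
        // Force the write of 42 to be treated as observable:
        clobber();
    }
    v
}

fn main() {
    let n = 1_000_000;
    let start = Instant::now();
    let v = push_n(n);
    let elapsed = start.elapsed();
    println!("pushed {} bytes in {:?}", v.len(), elapsed);
}
```

Without the `black_box(v.as_ptr())` call, the buffer never escapes, so the clobber alone gives the optimizer no reason to keep the stores.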
96+
97+
# Reference-level explanation
98+
[reference-level-explanation]: #reference-level-explanation
99+
100+
* `mem::black_box(x)`: flushes all pending reads/writes to memory and prevents
101+
`x` from being optimized away while still allowing optimizations on the
102+
expression that generates `x`.
103+
* `mem::clobber`: flushes all pending writes to memory.
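
One way these semantics could be obtained with inline assembly is sketched below. This is not the implementation proposed by this RFC; it is a hypothetical illustration, named `black_box_sketch`, that assumes a target with stable `asm!` support (for example x86-64 or AArch64). It passes the address of `x` to an assembly block that contains only a comment, so no instructions are emitted, yet the compiler must assume the value is read and possibly modified.

```rust
use std::arch::asm;

/// Hypothetical sketch of `black_box`; not the RFC's actual implementation.
#[inline(always)]
pub fn black_box_sketch<T>(x: T) -> T {
    unsafe {
        // The operand appears only inside an assembler comment. Since
        // `nomem` is not specified, the block is assumed to access memory,
        // including whatever the pointer refers to.
        asm!(
            "/* {0} */",
            in(reg) &x as *const T as *const u8,
            options(nostack, preserves_flags),
        );
    }
    x
}

fn main() {
    let v = black_box_sketch(vec![1_u8, 2, 3]);
    println!("{}", v.len());
}
```

A `clobber` counterpart under the same assumptions would be the same empty block without any operand.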
104+
105+
# Drawbacks
106+
[drawbacks]: #drawbacks
107+
108+
TBD.
109+
110+
# Rationale and alternatives
111+
[alternatives]: #alternatives
112+
113+
An alternative design was proposed during the discussion on
114+
[rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in
115+
which the following two functions are provided instead:
116+
117+
```rust
118+
#[inline(always)]
119+
pub fn value_fence<T>(x: T) -> T {
120+
    let y = unsafe { (&x as *const T).read_volatile() };
121+
    std::mem::forget(x);
122+
    y
123+
}
124+
125+
#[inline(always)]
126+
pub fn evaluate_and_drop<T>(x: T) {
127+
    unsafe {
128+
        let mut y = std::mem::uninitialized();
129+
        std::ptr::write_volatile(&mut y as *mut T, x);
130+
        drop(y); // not necessary but for clarity
131+
    }
132+
}
133+
```
134+
135+
This approach is not pursued in this RFC because these two functions:
136+
137+
* add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile`
138+
reads and stores are not no-ops, but the proposed `black_box` and `clobber`
139+
functions are.
140+
* are implementable on stable Rust: while we could add them to `std` they do not
141+
necessarily need to be there.
142+
143+
# Prior art
144+
[prior-art]: #prior-art
145+
146+
These two exact functions are provided in the [`Google
147+
Benchmark`](https://github.com/google/benchmark) C++ library, where they are called
148+
[`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306)
149+
(`black_box`) and
150+
[`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317) (`clobber`).
151+
The `black_box` function with slightly different semantics is provided by the `test` crate:
152+
[`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551).
153+
154+
# Unresolved questions
155+
[unresolved]: #unresolved-questions
156+
157+
TBD.
