Commit 8a9ae3f

RFC: mem::black_box and mem::clobber
1 parent fd70ea3 commit 8a9ae3f

1 file changed

+138
-0
lines changed

text/0000-bench-utils.md

+138
@@ -0,0 +1,138 @@
1+
- Feature Name: black_box-and-clobber
2+
- Start Date: 2018-03-12
3+
- RFC PR: (leave this empty)
4+
- Rust Issue: (leave this empty)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
This RFC adds two functions to `core::mem`: `black_box` and `clobber`, which are
10+
mainly useful for writing benchmarks.
11+
12+
# Motivation
13+
[motivation]: #motivation
14+
15+
The `black_box` and `clobber` functions are useful for writing synthetic
16+
benchmarks where, due to the constrained nature of the benchmark, the compiler
17+
is able to perform optimizations that wouldn't otherwise trigger in practice.
18+
19+
The implementation of these functions is backend-specific and requires inline
20+
assembly, so if the standard library does not provide them, users are
21+
forced to rely on brittle workarounds on nightly.
22+
23+
# Guide-level explanation
24+
[guide-level-explanation]: #guide-level-explanation
25+
26+
27+
## `mem::black_box`
28+
29+
The function:
30+
31+
```rust
32+
pub fn black_box<T>(x: T) -> T;
33+
```
34+
35+
prevents the value `x` from being optimized away and flushes pending reads/writes
36+
to memory. It does not prevent optimizations on the expression generating the
37+
value `x` nor on the return value of the function. For
38+
example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)):
39+
40+
```rust
41+
fn foo(x: i32) -> i32 {
42+
    mem::black_box(2 + x);
43+
    3
44+
}
45+
let a = foo(2);
46+
```
47+
48+
Here, once the call `foo(2)` is inlined, the compiler can simplify the
49+
expression `2 + x` into `2 + 2` and then `4`, but it is not allowed to discard
50+
`4`. Instead, it must store `4` into a register even though it is not used by
anything afterwards.
51+
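To try the example on a current toolchain, the following minimal sketch assumes `std::hint::black_box`, the function that later stabilized in the standard library, as a stand-in for the `mem::black_box` proposed here. Its guarantees are documented as best-effort, so they may be weaker than the semantics this RFC describes.

```rust
use std::hint::black_box;

/// Computes `2 + x`, hands the result to `black_box`, and returns a constant.
pub fn foo(x: i32) -> i32 {
    // The optimizer may still fold `2 + x` into a constant after inlining,
    // but the result is treated as used, so it is not simply discarded.
    let _ = black_box(2 + x);
    3
}
```

Calling `foo(2)` still returns `3`; the only intended difference is that the value `4` should remain visible in the generated code.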
52+
## `mem::clobber`
53+
54+
The function
55+
56+
```rust
57+
pub fn clobber() -> ();
58+
```
59+
60+
flushes all pending writes to memory. Memory managed by block-scope objects must
61+
be "escaped" with `black_box`.
62+
63+
Using `mem::{black_box, clobber}` we can benchmark `Vec::push` as follows:
64+
65+
```rust
66+
fn bench_vec_push_back(bench: Bencher) -> BenchResult {
67+
    let n = /* large enough number */;
68+
    let mut v = Vec::with_capacity(n);
69+
    bench.iter(|| {
70+
        // Escape the vector pointer:
71+
        mem::black_box(v.as_ptr());
72+
        v.push(42_u8);
73+
        // Flush the write of 42 to memory:
74+
        mem::clobber();
75+
    })
76+
}
77+
```
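
The benchmark above relies on a `Bencher` harness that is not defined in this RFC. As a rough, self-contained approximation, the sketch below times the same loop by hand with `std::time::Instant`. It assumes `std::hint::black_box` in place of `mem::black_box`, and it substitutes a second `black_box` call on the vector for `mem::clobber`; the name `time_vec_push` is hypothetical.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Pushes `n` bytes into a pre-allocated vector and returns the filled
/// vector together with the elapsed wall-clock time.
pub fn time_vec_push(n: usize) -> (Vec<u8>, Duration) {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let start = Instant::now();
    for _ in 0..n {
        // Escape the vector pointer so the buffer counts as observable.
        let _ = black_box(v.as_ptr());
        v.push(42_u8);
        // Stand-in for `clobber`: treat the vector as used after each push.
        let _ = black_box(&mut v);
    }
    (v, start.elapsed())
}
```

A real harness would repeat the measurement many times and report statistics; a single run like this is only meant to show where the two calls sit relative to the operation being measured.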
78+
# Reference-level explanation
79+
[reference-level-explanation]: #reference-level-explanation
80+
81+
* `mem::black_box(x)`: flushes all pending reads/writes to memory and prevents
82+
  `x` from being optimized away while still allowing optimizations on the
83+
  expression that generates `x`.
84+
* `mem::clobber`: flushes all pending writes to memory.
85+
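As an illustration of the inline-assembly approach mentioned in the motivation, here is one possible shape for `clobber` on targets where `core::arch::asm!` is stable. This is a sketch under that assumption, not the implementation this RFC specifies; the helper `push_and_clobber` is hypothetical, and `std::hint::black_box` again stands in for `mem::black_box`.

```rust
use core::arch::asm;

/// Compiler-level memory clobber: the assembly template is empty, but the
/// optimizer must assume the block may read or write any reachable memory.
#[inline(always)]
pub fn clobber() {
    // Leaving out the `nomem` and `readonly` options is what makes this
    // assembly block act as a memory clobber.
    unsafe { asm!("", options(nostack, preserves_flags)) };
}

/// Hypothetical helper: escapes the buffer pointer, pushes a value, then
/// clobbers memory so the write is treated as observable.
pub fn push_and_clobber(v: &mut Vec<u8>, value: u8) {
    let _ = std::hint::black_box(v.as_ptr());
    v.push(value);
    clobber();
}
```

Because the template is empty, the block constrains only the optimizer and adds no work of its own, which is the property the rationale below contrasts with volatile reads and writes.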
86+
# Drawbacks
87+
[drawbacks]: #drawbacks
88+
89+
TBD.
90+
91+
# Rationale and alternatives
92+
[alternatives]: #alternatives
93+
94+
An alternative design was proposed during the discussion on
95+
[rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in
96+
which the following two functions are provided instead:
97+
98+
```rust
99+
#[inline(always)]
100+
pub fn value_fence<T>(x: T) -> T {
101+
    let y = unsafe { (&x as *const T).read_volatile() };
102+
    std::mem::forget(x);
103+
    y
104+
}
105+
106+
#[inline(always)]
107+
pub fn evaluate_and_drop<T>(x: T) {
108+
    unsafe {
109+
        let mut y = std::mem::uninitialized();
110+
        std::ptr::write_volatile(&mut y as *mut T, x);
111+
        drop(y); // not necessary but for clarity
112+
    }
113+
}
114+
```
115+
116+
This approach is not pursued in this RFC because these two functions:
117+
118+
* add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile`
119+
  reads and stores are not no-ops, but the proposed `black_box` and `clobber`
120+
  functions are.
121+
* are implementable on stable Rust: while we could add them to `std`, they do not
122+
  necessarily need to be there.
123+
124+
# Prior art
125+
[prior-art]: #prior-art
126+
127+
These two exact functions are provided by the [`Google
128+
Benchmark`](https://github.com/google/benchmark) C++ library, where they are called
129+
[`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306)
130+
(`black_box`) and
131+
[`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317) (`clobber`).
132+
A `black_box` function with slightly different semantics is provided by the `test` crate:
133+
[`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551).
134+
135+
# Unresolved questions
136+
[unresolved]: #unresolved-questions
137+
138+
TBD.
