Commit b3f4441

RFC: mem::black_box and mem::clobber
1 parent fd70ea3 commit b3f4441

1 file changed

+141
-0
lines changed

text/0000-bench-utils.md

@@ -0,0 +1,141 @@
- Feature Name: black_box-and-clobber
- Start Date: 2018-03-12
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This RFC adds two functions to `core::mem`: `black_box` and `clobber`, which are
mainly useful for writing benchmarks.

# Motivation
[motivation]: #motivation

The `black_box` and `clobber` functions are useful for writing synthetic
benchmarks where, due to the constrained nature of the benchmark, the compiler
is able to perform optimizations that it would not be able to perform in
practice.

The implementation of these functions is backend-specific and requires inline
assembly. If the standard library does not provide them, users have to resort
to brittle workarounds on nightly.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation


## `mem::black_box`

The function:

```rust
pub fn black_box<T>(x: T) -> T;
```

prevents the value `x` from being optimized away and flushes pending reads/writes
to memory. It does not prevent optimizations on the expression generating the
value `x` nor on the return value of the function. For
example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)):

```rust
fn foo(x: i32) -> i32 {
    mem::black_box(2 + x);
    3
}
let a = foo(2);
```

Here, once `foo(2)` is inlined, the compiler can simplify the expression `2 + x`
into `2 + 2` and then `4`, but it is not allowed to discard `4`. Instead, it must
store `4` into a register even though it is not used by anything afterwards.

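The behaviour described above can be tried out on current stable Rust. The sketch below is not the API proposed in this RFC: it assumes `std::hint::black_box` (stable since Rust 1.66, and documented as best-effort) as a stand-in for `mem::black_box`. The helper `sum_to` is a hypothetical name introduced only for illustration.

```rust
use std::hint::black_box;

/// Mirrors `foo` from the text, with `std::hint::black_box` standing in
/// for the proposed `mem::black_box` (an assumption, not the RFC's API).
pub fn foo(x: i32) -> i32 {
    // `2 + x` may still be constant-folded, but the result should not be
    // discarded even though nothing reads it afterwards.
    black_box(2 + x);
    3
}

/// Hypothetical helper: sums `0..n` while asking the optimizer to treat
/// the loop bound and every partial sum as opaque values.
pub fn sum_to(n: u64) -> u64 {
    let mut acc = 0;
    for i in 0..black_box(n) {
        acc = black_box(acc + i);
    }
    acc
}
```

Both functions return exactly what they would return without `black_box`; only the optimizer's freedom to delete the computation is meant to change.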
## `mem::clobber`

The function

```rust
pub fn clobber() -> ();
```

flushes all pending writes to memory. Memory managed by block-scope objects must
first be "escaped" with `black_box` so that `clobber` applies to it.

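One possible way to approximate `clobber` with today's stable inline assembly is sketched below. It assumes that an empty `asm!` block without the `nomem` option is treated by the compiler as potentially reading and writing arbitrary memory, and it only builds on architectures where `asm!` is stable. The name `write_and_clobber` is hypothetical, and `std::hint::black_box` again stands in for the proposed `mem::black_box`.

```rust
use std::arch::asm;
use std::hint::black_box;

/// Hypothetical stand-in for the proposed `mem::clobber`.
#[inline(always)]
pub fn clobber() {
    // The block emits no instructions. Because `nomem` is NOT specified,
    // the compiler has to assume the block may touch any memory, so
    // pending writes to escaped memory cannot be elided across it.
    unsafe { asm!("", options(nostack, preserves_flags)) };
}

/// Escapes `buf`, writes `value` to its first element, then clobbers.
/// Panics if `buf` is empty.
pub fn write_and_clobber(buf: &mut [u8], value: u8) -> u8 {
    // Escape the buffer pointer so the write below counts as observable.
    black_box(buf.as_mut_ptr());
    buf[0] = value;
    // Force the write of `value` to be treated as reaching memory.
    clobber();
    buf[0]
}
```

The escape step matters: without it, the compiler could conclude that nothing outside the function can observe `buf`, and the clobber would have no memory to act on.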
## Benchmarking `Vec::push`

With `mem::{black_box, clobber}` we can benchmark `Vec::push` as follows:

```rust
fn bench_vec_push_back(bench: Bencher) -> BenchResult {
    let n = /* large enough number */;
    let mut v = Vec::with_capacity(n);
    bench.iter(|| {
        // Escape the vector pointer:
        mem::black_box(v.as_ptr());
        v.push(42_u8);
        // Flush the write of 42 to memory:
        mem::clobber();
    })
}
```

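The benchmark above relies on a `Bencher` type that this RFC does not define. As a rough, self-contained approximation, the sketch below times the same loop with `std::time::Instant`. Everything in it is an assumption made for illustration: `bench_vec_push` is a hypothetical name, `std::hint::black_box` stands in for `mem::black_box`, and `clobber` is approximated with an empty `asm!` block.

```rust
use std::arch::asm;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the proposed `mem::clobber`: an empty asm
/// block without `nomem`, which the compiler must assume touches memory.
#[inline(always)]
fn clobber() {
    unsafe { asm!("", options(nostack, preserves_flags)) };
}

/// Pushes `n` bytes into a pre-allocated vector and returns the filled
/// vector together with the elapsed wall-clock time.
pub fn bench_vec_push(n: usize) -> (Vec<u8>, Duration) {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let start = Instant::now();
    for _ in 0..n {
        // Escape the vector pointer:
        black_box(v.as_ptr());
        v.push(42_u8);
        // Flush the write of 42 to memory:
        clobber();
    }
    (v, start.elapsed())
}
```

A single timed run like this is far noisier than a real benchmarking harness; it is only meant to show where the escape and the clobber sit relative to the measured operation.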
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

* `mem::black_box(x)`: flushes all pending writes/reads to memory and prevents
  `x` from being optimized away while still allowing optimizations on the
  expression that generates `x`.
* `mem::clobber`: flushes all pending writes to memory.

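To make the `black_box` semantics listed above more concrete, here is a minimal sketch of how such a function might be written with stable inline assembly. It is not the implementation proposed by this RFC. It assumes that passing a pointer to `x` into an `asm!` block without `nomem` forces `x` to be materialized and prevents the compiler from reasoning about it across the block. The name `black_box_sketch` is hypothetical.

```rust
use std::arch::asm;

/// Hypothetical sketch of a `black_box`-like function.
#[inline(always)]
pub fn black_box_sketch<T>(mut x: T) -> T {
    // Cast to a thin byte pointer so it is a valid register operand.
    let ptr = &mut x as *mut T as *mut u8;
    unsafe {
        // The operand only appears inside an assembler comment, so no
        // instruction is emitted. Without `nomem`, the compiler must still
        // assume the block may read or write the memory behind `ptr`.
        asm!("/* {0} */", in(reg) ptr, options(nostack, preserves_flags));
    }
    x
}
```

The expression that produces `x` is evaluated before the call, so it remains fully optimizable; only the resulting value is hidden from the optimizer.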
# Drawbacks
[drawbacks]: #drawbacks

TBD.

# Rationale and alternatives
[alternatives]: #alternatives

An alternative design was proposed during the discussion on
[rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in
which the following two functions are provided instead:

```rust
#[inline(always)]
pub fn value_fence<T>(x: T) -> T {
    let y = unsafe { (&x as *const T).read_volatile() };
    std::mem::forget(x);
    y
}

#[inline(always)]
pub fn evaluate_and_drop<T>(x: T) {
    unsafe {
        let mut y = std::mem::uninitialized();
        std::ptr::write_volatile(&mut y as *mut T, x);
        drop(y); // not necessary but for clarity
    }
}
```

This approach is not pursued in this RFC because these two functions:

* add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile`
  reads and stores are not no-ops, whereas the proposed `black_box` and `clobber`
  functions are.
* are implementable on stable Rust: while we could add them to `std`, they do not
  necessarily need to be there.

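To illustrate the second point, the two alternative functions can be written against current stable Rust roughly as follows. This is a sketch rather than the code from the linked discussion: it assumes `MaybeUninit` (with `assume_init_drop`, stable since Rust 1.60) as a replacement for `std::mem::uninitialized`, which has since been deprecated.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Returns `x` after routing it through a volatile read.
#[inline(always)]
pub fn value_fence<T>(x: T) -> T {
    // The volatile read yields a bitwise copy of `x` ...
    let y = unsafe { ptr::read_volatile(&x) };
    // ... so the original must be forgotten to avoid a double drop.
    mem::forget(x);
    y
}

/// Forces `x` to be written out with a volatile store, then drops it.
#[inline(always)]
pub fn evaluate_and_drop<T>(x: T) {
    let mut slot = MaybeUninit::<T>::uninit();
    unsafe {
        // Move `x` into the slot with a volatile write.
        ptr::write_volatile(slot.as_mut_ptr(), x);
        // The slot is now initialized, so dropping it in place is sound.
        slot.assume_init_drop();
    }
}
```

Unlike the proposed `black_box` and `clobber`, both functions are expected to emit real loads and stores, which is the overhead mentioned in the first point.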
# Prior art
[prior-art]: #prior-art

These two exact functions are provided by the [`Google
Benchmark`](https://github.com/google/benchmark) C++ library, where they are called
[`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306)
(`black_box`) and
[`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317)
(`clobber`).
A `black_box` function with slightly different semantics is provided by the `test` crate:
[`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551).

# Unresolved questions
[unresolved]: #unresolved-questions

TBD.
