From ec42fd0b7fbc0448d1794bfecaa43621eab76814 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Mon, 12 Mar 2018 16:18:19 +0100 Subject: [PATCH 01/22] RFC: mem::black_box and mem::clobber --- text/0000-bench-utils.md | 157 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 text/0000-bench-utils.md diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md new file mode 100644 index 00000000000..cca5f1c6656 --- /dev/null +++ b/text/0000-bench-utils.md @@ -0,0 +1,157 @@ +- Feature Name: black_box-and-clobber +- Start Date: 2018-03-12 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +This RFC adds two functions to `core::mem`: `black_box` and `clobber`, which are +mainly useful for writing benchmarks. + +# Motivation +[motivation]: #motivation + +The `black_box` and `clobber` functions are useful for writing synthetic +benchmarks where, due to the constrained nature of the benchmark, the compiler +is able to perform optimizations that wouldn't otherwise trigger in practice. + +The implementation of these functions is backend-specific and requires inline +assembly. Such that if the standard library does not provide them, the users are +required to use brittle workarounds on nightly. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + + +## `mem::black_box` + +The function: + +```rust +pub fn black_box(x: T) -> T; +``` + +prevents the value `x` from being optimized away and flushes pending reads/writes +to memory. It does not prevent optimizations on the expression generating the +value `x` nor on the return value of the function. For +example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): + +```rust +fn foo(x: i32) -> i32{ + mem::black_box(2 + x); + 3 +} +let a = foo(2); +``` + +Here, the compiler can simplify the expression `2 + x` into `2 + 2` and then +`4`, but it is not allowed to discard `4`. 
Instead, it must store `4` into a +register even though it is not used by anything afterwards. + +## `mem::clobber` + +The function + +```rust +pub fn clobber() -> (); +``` + +flushes all pending writes to memory. Memory managed by block scope objects must +be "escaped" with `black_box` . + +Using `mem::{black_box, clobber}` we can benchmark `Vec::push` as follows: + +```rust +fn bench_vec_push_back(bench: Bencher) -> BenchResult { + let n = /* large enough number */; + let mut v = Vec::with_capacity(n); + bench.iter(|| { + // Escape the vector pointer: + mem::black_box(v.as_ptr()); + v.push(42_u8); + // Flush the write of 42 back to memory: + mem::clobber(); + }) +} +``` + +To measure the cost of `Vec::push`, we pre-allocate the `Vec` to avoid +re-allocating memory during the iteration. Since we are allocating a vector, +writing values to it, and dropping it, LLVM is actually able of optimize code +like this away ([`rust.godbolt.org`](https://godbolt.org/g/QMs77J)). + +To make this a suitable benchmark, we use `mem::clobber()` to force LLVM to +write `42` back to memory. Note, however, that if we try this LLVM still manages +to optimize our benchmark away ([`rust.godbolt.org`](https://godbolt.org/g/r9K2Bk))! + +The problem is that the memory of our vector is managed by an object in block +scope. That is, since we haven't shared this memory with anything, no other code +in our program can have a pointer to it, so LLVM does not need to schedule any +writes to this memory, and there are no pending memory writes to flush! + +What we must do is tell LLVM that something might also have a pointer to this +memory, and this is what we use `mem::black_box` for in this case +([`rust.godbolt.or`](https://godbolt.org/g/3wBxay)). 
+ +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +* `mem::black_box(x)`: flushes all pending writes/read to memory and prevents + `x` from being optimized away while still allowing optimizations on the + expression that generates `x`. +* `mem::clobber`: flushes all pending writes to memory. + +# Drawbacks +[drawbacks]: #drawbacks + +TBD. + +# Rationale and alternatives +[alternatives]: #alternatives + +An alternative design was proposed during the discussion on +[rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in +which the following two functions are provided instead: + +```rust +#[inline(always)] +pub fn value_fence(x: T) -> T { + let y = unsafe { (&x as *const T).read_volatile() }; + std::mem::forget(x); + y +} + +#[inline(always)] +pub fn evaluate_and_drop(x: T) { + unsafe { + let mut y = std::mem::uninitialized(); + std::ptr::write_volatile(&mut y as *mut T, x); + drop(y); // not necessary but for clarity + } +} +``` + +This approach is not pursued in this RFC because these two functions: + +* add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile` + reads and stores aren't no ops, but the proposed `black_box` and `clobber` + functions are. +* are implementable on stable Rust: while we could add them to `std` they do not + necessarily need to be there. + +# Prior art +[prior-art]: #prior-art + +These two exact functions are provided in the [`Google +Benchmark`](https://github.com/google/benchmark) C++ library: are called +[`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306) +(`black_box`) and +[`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317). 
+The `black_box` function with slightly different semantics is provided by the `test` crate: +[`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551). + +# Unresolved questions +[unresolved]: #unresolved-questions + +TBD. From 68b4dde8c709a07a8dcdbae9703e26ec6492b10c Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 13 Mar 2018 10:28:00 +0100 Subject: [PATCH 02/22] black_box stores back to memory, not registers --- text/0000-bench-utils.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index cca5f1c6656..b31bf9a7898 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -45,9 +45,9 @@ fn foo(x: i32) -> i32{ let a = foo(2); ``` -Here, the compiler can simplify the expression `2 + x` into `2 + 2` and then -`4`, but it is not allowed to discard `4`. Instead, it must store `4` into a -register even though it is not used by anything afterwards. +In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x` +down to `4`, but `4` must be stored into memory even though it is not read by +anything aftewards. ## `mem::clobber` From 8eae73749b5bd5a7b7e73e7037c29286e1a6823f Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Wed, 29 Aug 2018 12:01:56 +0200 Subject: [PATCH 03/22] update with the memory model discussion --- text/0000-bench-utils.md | 166 +++++++++++++++++++++++++-------------- 1 file changed, 107 insertions(+), 59 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index b31bf9a7898..ae64276d30f 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -1,4 +1,4 @@ -- Feature Name: black_box-and-clobber +- Feature Name: black_box - Start Date: 2018-03-12 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -6,40 +6,47 @@ # Summary [summary]: #summary -This RFC adds two functions to `core::mem`: `black_box` and `clobber`, which are -mainly useful for writing benchmarks. 
+This RFC adds one function, `core::hint::black_box`, which is a hint to the +optimizer to disable certain compiler optimizations. # Motivation [motivation]: #motivation -The `black_box` and `clobber` functions are useful for writing synthetic -benchmarks where, due to the constrained nature of the benchmark, the compiler -is able to perform optimizations that wouldn't otherwise trigger in practice. +A tool for preventing compiler optimizations is widely useful. One application +is writing synthetic benchmarks, where, due to the constrained nature of the +benchmark, the compiler is able to perform optimizations that wouldn't otherwise +trigger in practice. Another application is writing constant time code, where it +is undesirable for the compiler to optimize certain operations depending on the +context in which they are executed. -The implementation of these functions is backend-specific and requires inline -assembly. Such that if the standard library does not provide them, the users are -required to use brittle workarounds on nightly. +The implementation of this function is backend-specific and currently requires +inline assembly. No viable alternative is available in stable Rust. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -## `mem::black_box` +## `hint::black_box` The function: ```rust -pub fn black_box(x: T) -> T; +/// An _unknown_ function that returns `x`. +pub unsafe fn black_box(x: T) -> T; ``` -prevents the value `x` from being optimized away and flushes pending reads/writes -to memory. It does not prevent optimizations on the expression generating the -value `x` nor on the return value of the function. For -example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): +is an _unknown_ function, that is, a function that the compiler cannot make any +assumptions about. 
It can potentially use `x` in any possible valid way that +`unsafe` Rust code is allowed to, and requires the compiler to be maximally +pessimistic in terms of optimizations. The compiler is still allowed to optimize +the expression generating `x`. This function returns `x` and is a no-op in the +virtual machine. + +For example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust fn foo(x: i32) -> i32{ - mem::black_box(2 + x); + hint::black_box(2 + x); 3 } let a = foo(2); @@ -47,60 +54,83 @@ let a = foo(2); In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x` down to `4`, but `4` must be stored into memory even though it is not read by -anything aftewards. +anything afterwards because `black_box` could try to read it. -## `mem::clobber` +### Benchmarking `Vec::push` -The function +The `hint::black_box` is useful for producing synthetic benchmarks that more +accurately represent the behavior of a real application. In the following +snippet, the function `bench` executes `Vec::push` 4 times in a loop: ```rust -pub fn clobber() -> (); -``` - -flushes all pending writes to memory. Memory managed by block scope objects must -be "escaped" with `black_box` . - -Using `mem::{black_box, clobber}` we can benchmark `Vec::push` as follows: +fn push_cap(v: &mut Vec) { + for i in 0..4 { + v.push(i); + } +} -```rust -fn bench_vec_push_back(bench: Bencher) -> BenchResult { - let n = /* large enough number */; - let mut v = Vec::with_capacity(n); - bench.iter(|| { - // Escape the vector pointer: - mem::black_box(v.as_ptr()); - v.push(42_u8); - // Flush the write of 42 back to memory: - mem::clobber(); - }) +pub fn bench_push() -> Duration { + let mut v = Vec::with_capacity(4); + let now = Instant::now(); + push_cap(&mut v); + now.elapsed() } ``` -To measure the cost of `Vec::push`, we pre-allocate the `Vec` to avoid -re-allocating memory during the iteration. 
Since we are allocating a vector, -writing values to it, and dropping it, LLVM is actually able of optimize code -like this away ([`rust.godbolt.org`](https://godbolt.org/g/QMs77J)). +Here, we allocate the `Vec`, push into it without growing its capacity, and drop +it, without ever using it for anything. If we look at the assembly +(https://rust.godbolt.org/z/wDckJF): + + +```asm +example::bench_push: + sub rsp, 24 + call std::time::Instant::now@PLT + mov qword ptr [rsp + 8], rax + mov qword ptr [rsp + 16], rdx + lea rdi, [rsp + 8] + call std::time::Instant::elapsed@PLT + add rsp, 24 + ret +``` -To make this a suitable benchmark, we use `mem::clobber()` to force LLVM to -write `42` back to memory. Note, however, that if we try this LLVM still manages -to optimize our benchmark away ([`rust.godbolt.org`](https://godbolt.org/g/r9K2Bk))! +it is pretty amazing: LLVM has actually managed to completely optimize the `Vec` +allocation and call to `push_cap` away! In our real application, we would +probably use the vector for something, preventing all of these optimizations +from triggering, but in this synthetic benchmark LLVM optimizations are +producing a benchmark that won't tell us anything about the cost of `Vec::push`. -The problem is that the memory of our vector is managed by an object in block -scope. That is, since we haven't shared this memory with anything, no other code -in our program can have a pointer to it, so LLVM does not need to schedule any -writes to this memory, and there are no pending memory writes to flush! +We can use `hint::black_box` to create a more realistic synthetic benchmark +(https://rust.godbolt.org/z/CeXmxN): -What we must do is tell LLVM that something might also have a pointer to this -memory, and this is what we use `mem::black_box` for in this case -([`rust.godbolt.or`](https://godbolt.org/g/3wBxay)). 
+```rust +fn push_cap(v: &mut Vec) { + for i in 0..4 { + black_box(v.as_ptr()); + v.push(black_box(i)); + black_box(v.as_ptr()); + } +} +``` + +that prevents LLVM from assuming anything about the vector across the calls to +`Vec::push`. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -* `mem::black_box(x)`: flushes all pending writes/read to memory and prevents - `x` from being optimized away while still allowing optimizations on the - expression that generates `x`. -* `mem::clobber`: flushes all pending writes to memory. +The + +``` +mod core::hint { + /// An _unknown_ unsafe function that returns `x`. + pub unsafe fn black_box(x: T) -> T; +} +``` + +is an _unknown_ `unsafe` function that can perform any valid operation on `x` +that `unsafe` Rust is allowed to perform. This function returns `x` and is a +no-op in the virtual machine. # Drawbacks [drawbacks]: #drawbacks @@ -110,6 +140,23 @@ TBD. # Rationale and alternatives [alternatives]: #alternatives +Further rationale influencing this design is available in +https://github.com/nikomatsakis/rust-memory-model/issues/45 + +## `clobber` + +A previous version of this RFC also provided a `clobber` function: + +```rust +/// Flushes all pending writes to memory. +pub fn clobber() -> (); +``` + +In https://github.com/nikomatsakis/rust-memory-model/issues/45 it was realized +that such a function cannot work properly within Rust's memory model. 
+ +## `value_fence` / `evaluate_and_drop` + An alternative design was proposed during the discussion on [rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in which the following two functions are provided instead: @@ -118,14 +165,14 @@ which the following two functions are provided instead: #[inline(always)] pub fn value_fence(x: T) -> T { let y = unsafe { (&x as *const T).read_volatile() }; - std::mem::forget(x); + std::hint::forget(x); y } #[inline(always)] pub fn evaluate_and_drop(x: T) { unsafe { - let mut y = std::mem::uninitialized(); + let mut y = std::hint::uninitialized(); std::ptr::write_volatile(&mut y as *mut T, x); drop(y); // not necessary but for clarity } @@ -143,12 +190,13 @@ This approach is not pursued in this RFC because these two functions: # Prior art [prior-art]: #prior-art -These two exact functions are provided in the [`Google +Similar functionality is provided in the [`Google Benchmark`](https://github.com/google/benchmark) C++ library: are called [`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306) (`black_box`) and [`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317). -The `black_box` function with slightly different semantics is provided by the `test` crate: +The `black_box` function with slightly different semantics is provided by the +`test` crate: [`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551). 
# Unresolved questions From 6ccddcc324c7669fd73ab63bf3421357e4d4be84 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Wed, 29 Aug 2018 12:07:41 +0200 Subject: [PATCH 04/22] update examples with unsafe code --- text/0000-bench-utils.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index ae64276d30f..28143111354 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -46,7 +46,7 @@ For example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust fn foo(x: i32) -> i32{ - hint::black_box(2 + x); + unsafe { hint::black_box(2 + x) }; 3 } let a = foo(2); @@ -106,9 +106,11 @@ We can use `hint::black_box` to create a more realistic synthetic benchmark ```rust fn push_cap(v: &mut Vec) { for i in 0..4 { - black_box(v.as_ptr()); - v.push(black_box(i)); - black_box(v.as_ptr()); + unsafe { + black_box(v.as_ptr()); + v.push(black_box(i)); + black_box(v.as_ptr()); + } } } ``` From ec04a5755de73298ea637882afcf8403d293c9b6 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Sun, 9 Sep 2018 18:41:02 +0200 Subject: [PATCH 05/22] remove unsafe and clarify what it is allowed to do --- text/0000-bench-utils.md | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 28143111354..1a7976b8341 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -32,29 +32,30 @@ The function: ```rust /// An _unknown_ function that returns `x`. -pub unsafe fn black_box(x: T) -> T; +pub fn black_box(x: T) -> T; ``` is an _unknown_ function, that is, a function that the compiler cannot make any -assumptions about. It can potentially use `x` in any possible valid way that -`unsafe` Rust code is allowed to, and requires the compiler to be maximally -pessimistic in terms of optimizations. The compiler is still allowed to optimize -the expression generating `x`. 
This function returns `x` and is a no-op in the -virtual machine. +assumptions about. It can use `x` in any possible valid way that Rust code is +allowed to without introducing undefined behavior in the calling code. This +requires the compiler to be maximally pessimistic in terms of optimizations, but +the compiler is still allowed to optimize the expression generating `x`. This +function returns `x` and is a no-op in the virtual machine. For example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust fn foo(x: i32) -> i32{ - unsafe { hint::black_box(2 + x) }; + hint::black_box(2 + x); 3 } let a = foo(2); ``` -In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x` -down to `4`, but `4` must be stored into memory even though it is not read by -anything afterwards because `black_box` could try to read it. +In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + +x` down to `4`, but `4` must be materialized, for example, by storing it into +memory, a register, etc., even though `4` is not read by anything afterwards +because `black_box` could try to read it. ### Benchmarking `Vec::push` @@ -106,11 +107,9 @@ We can use `hint::black_box` to create a more realistic synthetic benchmark ```rust fn push_cap(v: &mut Vec) { for i in 0..4 { - unsafe { - black_box(v.as_ptr()); - v.push(black_box(i)); - black_box(v.as_ptr()); - } + black_box(v.as_ptr()); + v.push(black_box(i)); + black_box(v.as_ptr()); } } ``` @@ -126,13 +125,15 @@ The ``` mod core::hint { /// An _unknown_ unsafe function that returns `x`. - pub unsafe fn black_box(x: T) -> T; + pub fn black_box(x: T) -> T; } ``` -is an _unknown_ `unsafe` function that can perform any valid operation on `x` -that `unsafe` Rust is allowed to perform. This function returns `x` and is a -no-op in the virtual machine. 
+is an _unknown_ function that can perform any valid operation on `x` that Rust +is allowed to perform without introducing undefined behavior in the calling +code. You can rely on `black_box` being a `NOP` just returning `x`, but the +compiler will optimize under the pessimistic assumption that `black_box` might +do anything with the data it got. # Drawbacks [drawbacks]: #drawbacks From f9c8094133db8e05df81bb3f9d59e61bb6d39b9b Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Mon, 17 Sep 2018 15:55:05 +0200 Subject: [PATCH 06/22] incorporate ralf comments --- text/0000-bench-utils.md | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 1a7976b8341..862a917aef4 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -35,12 +35,14 @@ The function: pub fn black_box(x: T) -> T; ``` -is an _unknown_ function, that is, a function that the compiler cannot make any -assumptions about. It can use `x` in any possible valid way that Rust code is -allowed to without introducing undefined behavior in the calling code. This -requires the compiler to be maximally pessimistic in terms of optimizations, but -the compiler is still allowed to optimize the expression generating `x`. This -function returns `x` and is a no-op in the virtual machine. +returns `x` and is an _unknown_ function, that is, a function that the compiler +cannot make any assumptions about. It can use `x` in any possible valid way that +Rust code is allowed to without introducing undefined behavior in the calling +code. This requires the compiler to be maximally pessimistic in terms of +optimizations, but the compiler is still allowed to optimize the expression +generating `x`. While the compiler must assume that `black_box` performs any +legal mutation of `x`, the programmer can rely on `black_box` not actually +having any effect (other than inhibiting optimizations). 
For example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): @@ -54,8 +56,8 @@ let a = foo(2); In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + x` down to `4`, but `4` must be materialized, for example, by storing it into -memory, a register, etc., even though `4` is not read by anything afterwards -because `black_box` could try to read it. +memory, a register, etc. because `black_box` could try to read it even though +`4` is not read by anything afterwards. ### Benchmarking `Vec::push` @@ -102,7 +104,8 @@ from triggering, but in this synthetic benchmark LLVM optimizations are producing a benchmark that won't tell us anything about the cost of `Vec::push`. We can use `hint::black_box` to create a more realistic synthetic benchmark -(https://rust.godbolt.org/z/CeXmxN): +since the compiler has to assume that `black_box` observes and mutates its +argument it cannot optimize the whole benchmark away (https://rust.godbolt.org/z/CeXmxN): ```rust fn push_cap(v: &mut Vec) { From c21c6b70e20f79454877625e1c14a84a03eb3e51 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Mon, 17 Sep 2018 16:15:11 +0200 Subject: [PATCH 07/22] incorporate more comments from ralf --- text/0000-bench-utils.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 862a917aef4..47ac2aa669f 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -55,9 +55,9 @@ let a = foo(2); ``` In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + -x` down to `4`, but `4` must be materialized, for example, by storing it into -memory, a register, etc. because `black_box` could try to read it even though -`4` is not read by anything afterwards. +x` down to `4`. However, `4` must be materialized, for example, by storing it +into memory, a register, etc. because `black_box` could try to read it, even +though `4` is not read by anything afterwards. 
### Benchmarking `Vec::push` @@ -103,9 +103,9 @@ probably use the vector for something, preventing all of these optimizations from triggering, but in this synthetic benchmark LLVM optimizations are producing a benchmark that won't tell us anything about the cost of `Vec::push`. -We can use `hint::black_box` to create a more realistic synthetic benchmark -since the compiler has to assume that `black_box` observes and mutates its -argument it cannot optimize the whole benchmark away (https://rust.godbolt.org/z/CeXmxN): +We can use hint::black_box to create a more realistic synthetic benchmark. The +compiler has to assume that `black_box` observes and mutates its argument, hence +it cannot optimize away the whole benchmark (https://rust.godbolt.org/z/CeXmxN): ```rust fn push_cap(v: &mut Vec) { From 80da3868581cbeb9b7b8965f70139ca7dfdfd340 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Wed, 19 Sep 2018 14:24:10 +0200 Subject: [PATCH 08/22] incorporate more feedback --- text/0000-bench-utils.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 47ac2aa669f..6c960617a49 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -39,8 +39,7 @@ returns `x` and is an _unknown_ function, that is, a function that the compiler cannot make any assumptions about. It can use `x` in any possible valid way that Rust code is allowed to without introducing undefined behavior in the calling code. This requires the compiler to be maximally pessimistic in terms of -optimizations, but the compiler is still allowed to optimize the expression -generating `x`. While the compiler must assume that `black_box` performs any +optimizations. While the compiler must assume that `black_box` performs any legal mutation of `x`, the programmer can rely on `black_box` not actually having any effect (other than inhibiting optimizations). 
@@ -55,9 +54,9 @@ let a = foo(2); ``` In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + -x` down to `4`. However, `4` must be materialized, for example, by storing it -into memory, a register, etc. because `black_box` could try to read it, even -though `4` is not read by anything afterwards. +x` down to `4`. However, even though `4` is not read by anything afterwards, it +must be computed and materialized, for example, by storing it into memory, a +register, etc. because `black_box` could try to read it. ### Benchmarking `Vec::push` From b0e04a6005e774fc936143db6b44511e3ad22efc Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Mon, 8 Oct 2018 10:04:25 +0200 Subject: [PATCH 09/22] amend specification of black_box --- text/0000-bench-utils.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 6c960617a49..8c709aed392 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -126,21 +126,22 @@ The ``` mod core::hint { - /// An _unknown_ unsafe function that returns `x`. + /// An _unknown_ function that returns `x`. pub fn black_box(x: T) -> T; } ``` -is an _unknown_ function that can perform any valid operation on `x` that Rust -is allowed to perform without introducing undefined behavior in the calling -code. You can rely on `black_box` being a `NOP` just returning `x`, but the -compiler will optimize under the pessimistic assumption that `black_box` might -do anything with the data it got. +hint is a `NOP` that returns `x`. It behaves like an _unknown_ function that can +perform any valid operation on `x` that Rust is allowed to perform without +introducing undefined behavior in the calling code. + +Note: in practice this means that the compiler will optimize under the +pessimistic assumption that `black_box` might do anything with the data it got. # Drawbacks [drawbacks]: #drawbacks -TBD. 
+Slightly increases the surface complexity of `libcore`. # Rationale and alternatives [alternatives]: #alternatives From ea2dfeb44327ab6b04631648e0d13a7167d95582 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Wed, 10 Oct 2018 10:20:27 +0200 Subject: [PATCH 10/22] incorporate centril feedback --- text/0000-bench-utils.md | 99 ++++++++++++++++++++++------------------ 1 file changed, 54 insertions(+), 45 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 8c709aed392..bf93c296f00 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -6,44 +6,47 @@ # Summary [summary]: #summary -This RFC adds one function, `core::hint::black_box`, which is a hint to the -optimizer to disable certain compiler optimizations. +This RFC adds `core::hint::black_box`, a hint to disable certain compiler +optimizations. # Motivation [motivation]: #motivation -A tool for preventing compiler optimizations is widely useful. One application -is writing synthetic benchmarks, where, due to the constrained nature of the +A hint to disable compiler optimizations is widely useful. One such application +is writing synthetic benchmarks where, due to the constrained nature of the benchmark, the compiler is able to perform optimizations that wouldn't otherwise -trigger in practice. Another application is writing constant time code, where it -is undesirable for the compiler to optimize certain operations depending on the -context in which they are executed. +trigger in practice. -The implementation of this function is backend-specific and currently requires -inline assembly. No viable alternative is available in stable Rust. +There are currently no viable stable Rust alternatives for `black_box`. The +current nightly Rust implementations all rely on inline assembly. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation - ## `hint::black_box` -The function: +The hint: ```rust -/// An _unknown_ function that returns `x`. 
pub fn black_box(x: T) -> T; ``` -returns `x` and is an _unknown_ function, that is, a function that the compiler -cannot make any assumptions about. It can use `x` in any possible valid way that -Rust code is allowed to without introducing undefined behavior in the calling -code. This requires the compiler to be maximally pessimistic in terms of -optimizations. While the compiler must assume that `black_box` performs any -legal mutation of `x`, the programmer can rely on `black_box` not actually -having any effect (other than inhibiting optimizations). +behaves like the [identity function][identity_fn]: it just returns `x` and has +no effects. However, Rust implementations are _encouraged_ to assume that +`black_box` can use `x` in any possible valid way that Rust code is allowed to +without introducing undefined behavior in the calling code. That is, +implementations are encouraged to be maximally pessimistic in terms of +optimizations. + +This property makes `black_box` useful for writing code in which certain +optimizations are not desired. However, disabling optimizations is not +guaranteed, which means that `black_box` is not a solution for programs that +rely on certain optimizations being disabled for correctness, like, for example, +constant time code. + +### Example 1 - basics -For example ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): +Example 1 ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust fn foo(x: i32) -> i32{ @@ -53,16 +56,17 @@ fn foo(x: i32) -> i32{ let a = foo(2); ``` -In the call to `foo(2)` the compiler is allowed to simplify the expression `2 + -x` down to `4`. However, even though `4` is not read by anything afterwards, it -must be computed and materialized, for example, by storing it into memory, a -register, etc. because `black_box` could try to read it. +In this example, the compiler simplifies the expression `2 + x` down to `4`. 
+However, even though `4` is not read by anything afterwards, it must be computed +and materialized, for example, by storing it into memory, a register, etc. +because the current Rust implementation assumes that `black_box` could try to +read it. -### Benchmarking `Vec::push` +### Example 2 - benchmarking `Vec::push` The `hint::black_box` is useful for producing synthetic benchmarks that more accurately represent the behavior of a real application. In the following -snippet, the function `bench` executes `Vec::push` 4 times in a loop: +example, the function `bench` executes `Vec::push` 4 times in a loop: ```rust fn push_cap(v: &mut Vec) { @@ -79,9 +83,9 @@ pub fn bench_push() -> Duration { } ``` -Here, we allocate the `Vec`, push into it without growing its capacity, and drop -it, without ever using it for anything. If we look at the assembly -(https://rust.godbolt.org/z/wDckJF): +This example allocates a `Vec`, pushes into it without growing its capacity, and +drops it, without ever using it for anything. The current Rust implementation +emits the following `x86_64` machine code (https://rust.godbolt.org/z/wDckJF): ```asm @@ -96,15 +100,14 @@ example::bench_push: ret ``` -it is pretty amazing: LLVM has actually managed to completely optimize the `Vec` -allocation and call to `push_cap` away! In our real application, we would -probably use the vector for something, preventing all of these optimizations -from triggering, but in this synthetic benchmark LLVM optimizations are -producing a benchmark that won't tell us anything about the cost of `Vec::push`. +LLVM is pretty amazing: it has optimized the `Vec` allocation and the calls to +`push_cap` away. In doing so, it has made our benchmark useless. It won't +measure the time it takes to perform the calls to `Vec::push` as we intended. -We can use hint::black_box to create a more realistic synthetic benchmark. 
The -compiler has to assume that `black_box` observes and mutates its argument, hence -it cannot optimize away the whole benchmark (https://rust.godbolt.org/z/CeXmxN): +In real applications, the program will use the vector for something, preventing +these optimizations. To produce a benchmark that takes that into account, we can +hint the compiler that the `Vec` is used for something +(https://rust.godbolt.org/z/CeXmxN): ```rust fn push_cap(v: &mut Vec) { @@ -116,8 +119,9 @@ fn push_cap(v: &mut Vec) { } ``` -that prevents LLVM from assuming anything about the vector across the calls to -`Vec::push`. +Inspecting the machine code reveals that, for this particular Rust +implementation, `black_box` successfully prevents LLVM from performing the +optimization that removes the `Vec::push` calls that we wanted to measure. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -126,17 +130,22 @@ The ``` mod core::hint { - /// An _unknown_ function that returns `x`. + /// Identity function that disables optimizations. pub fn black_box(x: T) -> T; } ``` -hint is a `NOP` that returns `x`. It behaves like an _unknown_ function that can -perform any valid operation on `x` that Rust is allowed to perform without -introducing undefined behavior in the calling code. +is a `NOP` that returns `x`, that is, its operational semantics are equivalent +to the [identity function][identity_fn]. + + +Implementations are encouraged, _but not required_, to treat `black_box` as an +_unknown_ function that can perform any valid operation on `x` that Rust is +allowed to perform without introducing undefined behavior in the calling code. +That is, to optimize `black_box` under the pessimistic assumption that it might +do anything with the data it got. -Note: in practice this means that the compiler will optimize under the -pessimistic assumption that `black_box` might do anything with the data it got. 
+[identity_fn]: https://doc.rust-lang.org/nightly/std/convert/fn.identity.html # Drawbacks [drawbacks]: #drawbacks From b1cce1aefaa7c926c96c5001703c8feaed0659ab Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Wed, 10 Oct 2018 11:00:09 +0200 Subject: [PATCH 11/22] add const fn unresolved question --- text/0000-bench-utils.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index bf93c296f00..384d9f96161 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -128,7 +128,7 @@ optimization that removes the `Vec::push` calls that we wanted to measure. The -``` +```rust mod core::hint { /// Identity function that disables optimizations. pub fn black_box(x: T) -> T; @@ -217,4 +217,8 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions -TBD. +@Centril asked whether `black_box` should be a `const fn`. The current +implementation uses inline assembly. It is unclear at this point whether +`black_box` should be a `const fn`, and if it should, how exactly would we go +about it. We do not have to resolve this issue before stabilization since we can +always make it a `const fn` later, but we should not forget about it either. From 197beb3226d26e12db3800ab4ad34f856fc05d91 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Thu, 11 Oct 2018 23:41:45 +0200 Subject: [PATCH 12/22] typo --- text/0000-bench-utils.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-bench-utils.md b/text/0000-bench-utils.md index 384d9f96161..f1c63a67aec 100644 --- a/text/0000-bench-utils.md +++ b/text/0000-bench-utils.md @@ -56,7 +56,7 @@ fn foo(x: i32) -> i32{ let a = foo(2); ``` -In this example, the compiler simplifies the expression `2 + x` down to `4`. +In this example, the compiler may simplify the expression `2 + x` down to `4`. 
However, even though `4` is not read by anything afterwards, it must be computed and materialized, for example, by storing it into memory, a register, etc. because the current Rust implementation assumes that `black_box` could try to From d688bdea113af359e264968b9b7b47bbc129306c Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 11:31:17 +0100 Subject: [PATCH 13/22] Rename file to match feature name --- text/{0000-bench-utils.md => 0000-black-box.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-bench-utils.md => 0000-black-box.md} (100%) diff --git a/text/0000-bench-utils.md b/text/0000-black-box.md similarity index 100% rename from text/0000-bench-utils.md rename to text/0000-black-box.md From 51b12b91c5cdd029aaec327d9a5bb84bdb6f6e18 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 11:49:07 +0100 Subject: [PATCH 14/22] Clarify motivation --- text/0000-black-box.md | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index f1c63a67aec..7d817f098a2 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -6,19 +6,24 @@ # Summary [summary]: #summary -This RFC adds `core::hint::black_box`, a hint to disable certain compiler -optimizations. +This RFC adds `core::hint::black_box` (see [black box]), an identity function +that hints the compiler to be maximally pessimistic in terms of the assumptions +about what `black_box` could do. + +[black box]: https://en.wikipedia.org/wiki/Black_box # Motivation [motivation]: #motivation -A hint to disable compiler optimizations is widely useful. One such application -is writing synthetic benchmarks where, due to the constrained nature of the -benchmark, the compiler is able to perform optimizations that wouldn't otherwise -trigger in practice. 
+Due to the constrained nature of synthetic benchmarks, the compiler is often +able to perform optimizations that wouldn't otherwise trigger in practice, like +completely removing a benchmark if it has no side-effects. -There are currently no viable stable Rust alternatives for `black_box`. The -current nightly Rust implementations all rely on inline assembly. +Currently, stable Rust users need to introduce expensive operations into their +programs to prevent these optimizations. Examples thereof are volatile loads and +stores, or calling unknown functions via C FFI. These operations incur overheads +that often would not be present in the application the synthetic benchmark is +trying to model. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -49,7 +54,7 @@ constant time code. Example 1 ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust -fn foo(x: i32) -> i32{ +fn foo(x: i32) -> i32 { hint::black_box(2 + x); 3 } @@ -143,7 +148,7 @@ Implementations are encouraged, _but not required_, to treat `black_box` as an _unknown_ function that can perform any valid operation on `x` that Rust is allowed to perform without introducing undefined behavior in the calling code. That is, to optimize `black_box` under the pessimistic assumption that it might -do anything with the data it got. +do anything with the data it got, even though it actually does nothing. [identity_fn]: https://doc.rust-lang.org/nightly/std/convert/fn.identity.html @@ -221,4 +226,4 @@ The `black_box` function with slightly different semantics is provided by the implementation uses inline assembly. It is unclear at this point whether `black_box` should be a `const fn`, and if it should, how exactly would we go about it. We do not have to resolve this issue before stabilization since we can -always make it a `const fn` later, but we should not forget about it either. 
+always make it a `const fn` later, but we should not forget about it either From aae4d0a6ee10eaf01daf9df0deeff67884ca650c Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 11:50:44 +0100 Subject: [PATCH 15/22] Make black_box a const fn --- text/0000-black-box.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index 7d817f098a2..94bbc2efb47 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -33,7 +33,7 @@ trying to model. The hint: ```rust -pub fn black_box(x: T) -> T; +pub const fn black_box(x: T) -> T; ``` behaves like the [identity function][identity_fn]: it just returns `x` and has @@ -136,7 +136,7 @@ The ```rust mod core::hint { /// Identity function that disables optimizations. - pub fn black_box(x: T) -> T; + pub const fn black_box(x: T) -> T; } ``` @@ -221,9 +221,3 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions - -@Centril asked whether `black_box` should be a `const fn`. The current -implementation uses inline assembly. It is unclear at this point whether -`black_box` should be a `const fn`, and if it should, how exactly would we go -about it. 
We do not have to resolve this issue before stabilization since we can -always make it a `const fn` later, but we should not forget about it either From 1448bccafd2e687bda375645d1fe1e1cdccb1c3b Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 11:51:41 +0100 Subject: [PATCH 16/22] Add unresolved question about naming --- text/0000-black-box.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index 94bbc2efb47..7d626778b0c 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -221,3 +221,5 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions + +* Naming: it is unclear whether `black_box` is the right name for this primitive at this point. From bb7876c2e6f302baa1c525b1de2243f5335a2939 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 12:05:55 +0100 Subject: [PATCH 17/22] Add pros/cons about naming and mention alternative names --- text/0000-black-box.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index 7d626778b0c..d03c08b2a05 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -222,4 +222,11 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions -* Naming: it is unclear whether `black_box` is the right name for this primitive at this point. +* Naming: it is unclear whether `black_box` is the right name for this primitive + at this point. Some argumens in favor or against are that: + * pro: [black box] is a common term in computer programming, that conveys that + nothing can be assumed about it except for its inputs and outputs. 
+ * con: `_box` has nothing to do with `Box` or `box`-syntax, which might be confusing + + Alternative names suggested: `pessimize`, `unoptimize`, `unprocessed`, `unknown`, + `do_not_optimize` (Google Benchmark). From d94514c85beb0ba45776fd1af8aff640d0ca2cfd Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 12:19:01 +0100 Subject: [PATCH 18/22] Do not make this API a const fn --- text/0000-black-box.md | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index d03c08b2a05..145b862b2a8 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -33,7 +33,7 @@ trying to model. The hint: ```rust -pub const fn black_box(x: T) -> T; +pub fn black_box(x: T) -> T; ``` behaves like the [identity function][identity_fn]: it just returns `x` and has @@ -136,7 +136,7 @@ The ```rust mod core::hint { /// Identity function that disables optimizations. - pub const fn black_box(x: T) -> T; + pub fn black_box(x: T) -> T; } ``` @@ -184,7 +184,7 @@ which the following two functions are provided instead: ```rust #[inline(always)] pub fn value_fence(x: T) -> T { - let y = unsafe { (&x as *const T).read_volatile() }; + let y = unsafe { (&x as *T).read_volatile() }; std::hint::forget(x); y } @@ -222,10 +222,16 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions +* `const fn`: it is unclear whether `black_box` should be a `const fn`. If it + were, that would hint that it cannot have any side-effects, or that it cannot + do anything that `fn`s cannot do. + * Naming: it is unclear whether `black_box` is the right name for this primitive at this point. Some argumens in favor or against are that: - * pro: [black box] is a common term in computer programming, that conveys that - nothing can be assumed about it except for its inputs and outputs. 
+ * pro: [black box] is a common term in computer programming, that conveys + that nothing can be assumed about it except for its inputs and outputs. + con: [black box] often hints that the function has no side-effects, but + this is not something that can be assumed about this API. * con: `_box` has nothing to do with `Box` or `box`-syntax, which might be confusing Alternative names suggested: `pessimize`, `unoptimize`, `unprocessed`, `unknown`, From 474aff0a32dbc9a10accd925758c7a8cfed7d770 Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 13:20:44 +0100 Subject: [PATCH 19/22] rework guide level explanation --- text/0000-black-box.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index 145b862b2a8..4bd98c161c3 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -44,10 +44,8 @@ implementations are encouraged to be maximally pessimistic in terms of optimizations. This property makes `black_box` useful for writing code in which certain -optimizations are not desired. However, disabling optimizations is not -guaranteed, which means that `black_box` is not a solution for programs that -rely on certain optimizations being disabled for correctness, like, for example, -constant time code. +optimizations are not desired, but too unreliable when disabling these +optimizations is required for correctness. 
### Example 1 - basics From ce164eeed0093f5447878247ff219358da8a3b5f Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Tue, 19 Feb 2019 14:15:30 +0100 Subject: [PATCH 20/22] Fix typo --- text/0000-black-box.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-black-box.md b/text/0000-black-box.md index 4bd98c161c3..117ac64f39e 100644 --- a/text/0000-black-box.md +++ b/text/0000-black-box.md @@ -222,7 +222,7 @@ The `black_box` function with slightly different semantics is provided by the * `const fn`: it is unclear whether `black_box` should be a `const fn`. If it were, that would hint that it cannot have any side-effects, or that it cannot - do anything that `fn`s cannot do. + do anything that `const fn`s cannot do. * Naming: it is unclear whether `black_box` is the right name for this primitive at this point. Some argumens in favor or against are that: From 1b0f6472541347557c3804f3695a62f1d4ccf81a Mon Sep 17 00:00:00 2001 From: gnzlbg Date: Thu, 25 Jul 2019 10:58:23 +0200 Subject: [PATCH 21/22] Rename to bench_black_box and add bench_input/output alternative --- ...0-black-box.md => 0000-bench-black-box.md} | 92 ++++++++++++++----- 1 file changed, 69 insertions(+), 23 deletions(-) rename text/{0000-black-box.md => 0000-bench-black-box.md} (63%) diff --git a/text/0000-black-box.md b/text/0000-bench-black-box.md similarity index 63% rename from text/0000-black-box.md rename to text/0000-bench-black-box.md index 117ac64f39e..d7c3932abbe 100644 --- a/text/0000-black-box.md +++ b/text/0000-bench-black-box.md @@ -1,4 +1,4 @@ -- Feature Name: black_box +- Feature Name: bench_black_box - Start Date: 2018-03-12 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -6,11 +6,11 @@ # Summary [summary]: #summary -This RFC adds `core::hint::black_box` (see [black box]), an identity function +This RFC adds `core::hint::bench_black_box` (see [black box]), an identity function that hints the compiler to be maximally pessimistic in terms of the 
assumptions -about what `black_box` could do. +about what `bench_black_box` could do. -[black box]: https://en.wikipedia.org/wiki/Black_box +[black box]: https://en.wikipedia.org/wiki/black_box # Motivation [motivation]: #motivation @@ -28,22 +28,22 @@ trying to model. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -## `hint::black_box` +## `hint::bench_black_box` The hint: ```rust -pub fn black_box<T>(x: T) -> T; +pub fn bench_black_box<T>(x: T) -> T; ``` behaves like the [identity function][identity_fn]: it just returns `x` and has no effects. However, Rust implementations are _encouraged_ to assume that -`black_box` can use `x` in any possible valid way that Rust code is allowed to +`bench_black_box` can use `x` in any possible valid way that Rust code is allowed to without introducing undefined behavior in the calling code. That is, implementations are encouraged to be maximally pessimistic in terms of optimizations. -This property makes `black_box` useful for writing code in which certain +This property makes `bench_black_box` useful for writing code in which certain optimizations are not desired, but too unreliable when disabling these optimizations is required for correctness. @@ -53,7 +53,7 @@ Example 1 ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)): ```rust fn foo(x: i32) -> i32 { - hint::black_box(2 + x); + hint::bench_black_box(2 + x); 3 } let a = foo(2); @@ -62,12 +62,12 @@ let a = foo(2); In this example, the compiler may simplify the expression `2 + x` down to `4`. However, even though `4` is not read by anything afterwards, it must be computed and materialized, for example, by storing it into memory, a register, etc. -because the current Rust implementation assumes that `black_box` could try to +because the current Rust implementation assumes that `bench_black_box` could try to read it.
### Example 2 - benchmarking `Vec::push` -The `hint::black_box` is useful for producing synthetic benchmarks that more +The `hint::bench_black_box` is useful for producing synthetic benchmarks that more accurately represent the behavior of a real application. In the following example, the function `bench` executes `Vec::push` 4 times in a loop: @@ -115,15 +115,15 @@ hint the compiler that the `Vec` is used for something ```rust fn push_cap(v: &mut Vec) { for i in 0..4 { - black_box(v.as_ptr()); - v.push(black_box(i)); - black_box(v.as_ptr()); + bench_black_box(v.as_ptr()); + v.push(bench_black_box(i)); + bench_black_box(v.as_ptr()); } } ``` Inspecting the machine code reveals that, for this particular Rust -implementation, `black_box` successfully prevents LLVM from performing the +implementation, `bench_black_box` successfully prevents LLVM from performing the optimization that removes the `Vec::push` calls that we wanted to measure. # Reference-level explanation @@ -134,7 +134,7 @@ The ```rust mod core::hint { /// Identity function that disables optimizations. - pub fn black_box(x: T) -> T; + pub fn bench_black_box(x: T) -> T; } ``` @@ -142,10 +142,10 @@ is a `NOP` that returns `x`, that is, its operational semantics are equivalent to the [identity function][identity_fn]. -Implementations are encouraged, _but not required_, to treat `black_box` as an +Implementations are encouraged, _but not required_, to treat `bench_black_box` as an _unknown_ function that can perform any valid operation on `x` that Rust is allowed to perform without introducing undefined behavior in the calling code. -That is, to optimize `black_box` under the pessimistic assumption that it might +That is, to optimize `bench_black_box` under the pessimistic assumption that it might do anything with the data it got, even though it actually does nothing. 
[identity_fn]: https://doc.rust-lang.org/nightly/std/convert/fn.identity.html @@ -200,18 +200,55 @@ pub fn evaluate_and_drop<T>(x: T) { This approach is not pursued in this RFC because these two functions: * add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile` - reads and stores aren't no ops, but the proposed `black_box` and `clobber` + reads and stores aren't no ops, but the proposed `bench_black_box` and `clobber` functions are. * are implementable on stable Rust: while we could add them to `std` they do not necessarily need to be there. +## `bench_input` / `bench_output` + +@eddyb proposed +[here](https://github.com/rust-lang/rfcs/pull/2360#issuecomment-463594450) (and +the discussion that followed) to add two other hints instead: + +* `bench_input`: `fn(T) -> T` (identity-like); may prevent some optimizations + from seeing through the valid `T` value, more specifically, things like + const/load-folding and range-analysis; miri would still check the argument, and + so it couldn't be e.g. uninitialized; the argument computation can be + optimized-out (unlike `bench_output`); mostly implementable today with the same + strategy as `black_box`. + +* `bench_output`: `fn(T) -> ()` (drop-like); may prevent some optimizations from + optimizing out the computation of its argument; the argument is not treated as + "escaping into unknown code", i.e., you can't implement `bench_output(x)` as + `{ bench_input(&mut x); x }`. What that would likely prevent is placing `x` + into a register instead of memory, but optimizations might still see the old + value of `x`, as if it couldn't have been mutated; potentially implementable + like `black_box` but `readonly`/`readnone` in LLVM. + +From the RFC discussion there was consensus that we might want to add these +benchmarking hints in the future as well because they are easier to specify and +provide stronger guarantees than `bench_black_box`.
+ +Right now, however, it is unclear whether these two hints can be implemented +strictly in LLVM. The comment thread shows that the best we can actually do +ends up implementing both of these as `bench_black_box` with the same effects. + +Without a strict implementation, it is unclear what value these two intrinsics +would add, and more importantly, since their difference in semantics cannot be +shown, it is also unclear how we could teach users to use them correctly. + +If we are ever able to implement these correctly, we might want to consider +deprecating `bench_black_box` at that point, but whether it will be worth +deprecating is not clear either. + # Prior art [prior-art]: #prior-art Similar functionality is provided in the [`Google Benchmark`](https://github.com/google/benchmark) C++ library: are called [`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306) -(`black_box`) and +(`bench_black_box`) and [`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317). The `black_box` function with slightly different semantics is provided by the `test` crate: @@ -220,12 +257,21 @@ The `black_box` function with slightly different semantics is provided by the # Unresolved questions [unresolved]: #unresolved-questions -* `const fn`: it is unclear whether `black_box` should be a `const fn`. If it +* `const fn`: it is unclear whether `bench_black_box` should be a `const fn`. If it were, that would hint that it cannot have any side-effects, or that it cannot do anything that `const fn`s cannot do. -* Naming: it is unclear whether `black_box` is the right name for this primitive - at this point. Some argumens in favor or against are that: +* Naming: during the RFC discussion it was unclear whether `black_box` is the + right name for this primitive but we settled on `bench_black_box` for the time + being.
We should resolve the naming before stabilization. + + Also, we might want to add other benchmarking hints in the future, like + `bench_input` and `bench_output`, so we might want to put all of this + into a `bench` sub-module within the `core::hint` module. That might + be a good place to explain how the benchmarking hints should be used + holistically. + + Some arguments in favor or against using "black box" are that: * pro: [black box] is a common term in computer programming, that conveys that nothing can be assumed about it except for its inputs and outputs. con: [black box] often hints that the function has no side-effects, but From bda0ba64092c2feb242726557665009124f162e6 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Mon, 2 Sep 2019 22:45:00 +0200 Subject: [PATCH 22/22] RFC 2360 --- text/{0000-bench-black-box.md => 2360-bench-black-box.md} | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) rename text/{0000-bench-black-box.md => 2360-bench-black-box.md} (98%) diff --git a/text/0000-bench-black-box.md b/text/2360-bench-black-box.md similarity index 98% rename from text/0000-bench-black-box.md rename to text/2360-bench-black-box.md index d7c3932abbe..70133284bf0 100644 --- a/text/0000-bench-black-box.md +++ b/text/2360-bench-black-box.md @@ -1,7 +1,7 @@ -- Feature Name: bench_black_box +- Feature Name: `bench_black_box` - Start Date: 2018-03-12 -- RFC PR: (leave this empty) -- Rust Issue: (leave this empty) +- RFC PR: [rust-lang/rfcs#2360](https://github.com/rust-lang/rfcs/pull/2360) +- Rust Issue: [rust-lang/rust#64102](https://github.com/rust-lang/rust/issues/64102) # Summary [summary]: #summary
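For readers who want to try the pattern from the RFC's `Vec::push` example, here is a minimal sketch. It rests on a few assumptions that are not part of the patches: it uses `std::hint::black_box`, the name under which the hint is exposed on current stable toolchains, rather than the `bench_black_box` name proposed in the RFC text; the element type `i32` is an illustrative choice; and `bench_push` returns the vector alongside the elapsed time only so the result can be inspected. As the RFC stresses, the hint is best-effort, so this says nothing about which optimizations a given compiler actually suppresses.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Pushes four values into `v`. The hints before and after each push suggest
/// to the compiler that the buffer and the pushed value may be observed.
/// (Assumption: `i32` elements, chosen only for illustration.)
fn push_cap(v: &mut Vec<i32>) {
    for i in 0..4 {
        black_box(v.as_ptr());
        v.push(black_box(i));
        black_box(v.as_ptr());
    }
}

/// Times a single run of `push_cap` on a vector that already has enough
/// capacity, so no reallocation happens inside the timed region.
/// (Assumption: returning the vector is a convenience of this sketch.)
fn bench_push() -> (Duration, Vec<i32>) {
    let mut v = Vec::with_capacity(4);
    let start = Instant::now();
    push_cap(&mut v);
    let elapsed = start.elapsed();
    (elapsed, v)
}
```

Calling `bench_push()` yields the elapsed time and the filled vector. Because the hint behaves like the identity function, the vector contents are the same as they would be without it; only the generated machine code may differ.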