Inaccurate results reported for small methods

Unless doing instruction-level profiling, the highest-precision timer on a modern computer has a resolution of about `25-32 cycles` (which is, under ideal circumstances, about `5-6ns` on a 5GHz processor at best and `25-32ns` on a 1GHz processor).
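For reference, the cycle-to-time conversion behind those figures is plain arithmetic. A minimal sketch in Python (the helper name `cycles_to_ns` is made up for illustration, and it assumes a fixed clock with no frequency scaling):

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a given clock speed.

    A clock running at N GHz completes N cycles per nanosecond,
    so the time taken is simply cycles / N.
    """
    return cycles / clock_ghz


# Timer resolution of 25-32 cycles, expressed in ns at two clock speeds.
for clock_ghz in (1.0, 5.0):
    low = cycles_to_ns(25, clock_ghz)
    high = cycles_to_ns(32, clock_ghz)
    print(f"{clock_ghz:.0f} GHz: {low:.1f}-{high:.1f} ns")
# prints:
# 1 GHz: 25.0-32.0 ns
# 5 GHz: 5.0-6.4 ns
```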

Due to platform-specific differences, the coarsest resolution reported by the high-precision timer APIs exposed by the OS is about `100ns`. Additionally, it is well documented that, due to the latency between calls and other OS or hardware factors, the cost of such a call can be much worse, closer to `300ns` when a CPU-level timer such as `RDTSC` is not available: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps#resolution-precision-accuracy-and-stability.
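One way to get a feel for these numbers on a given machine is to read the timer back to back and look at the smallest step it ever reports, plus the average cost of a read. The sketch below does that in Python with `time.perf_counter_ns`; the function names are made up for illustration, and the results include interpreter overhead, so treat them as a rough upper bound rather than what a .NET `Stopwatch` would see. It is not how Benchmark.NET measures anything.

```python
import time


def observed_timer_step_ns(samples: int = 100_000) -> int:
    """Smallest non-zero difference seen between back-to-back timer reads.

    Returns 0 if every pair of reads returned the same value.
    """
    smallest = None
    for _ in range(samples):
        start = time.perf_counter_ns()
        end = time.perf_counter_ns()
        delta = end - start
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
    return smallest if smallest is not None else 0


def mean_read_cost_ns(calls: int = 1_000_000) -> float:
    """Average wall-clock cost of one timer read, loop overhead included."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.perf_counter_ns()
    end = time.perf_counter_ns()
    return (end - start) / calls


if __name__ == "__main__":
    print(f"smallest observed step: {observed_timer_step_ns()} ns")
    print(f"mean cost per read:     {mean_read_cost_ns():.1f} ns")
```

Anything a benchmark reports that is well below the smallest observed step can only come from averaging over many invocations, not from timing a single call.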

While Benchmark.NET does try to account for small methods, and for noise due to call overhead and the like, there are many cases where the numbers it reports are of questionable accuracy.

One such example is the following:
![image](https://user-images.githubusercontent.com/10487869/132735069-fe6e4d5d-f3d4-4f2e-9618-4e3e6147b8eb.png)

In particular, the first entry, `GetShortName_opt`, reports a time of `0.2082 ns`. Even in an "ideal" scenario where the JIT is able to fully optimize the comparison against a constant value and reduce it to simply `xor rax, rax`, this is still reporting that the method takes approximately 1 cycle on a 5GHz CPU.
* It also shouldn't be able to optimize it like this. AFAIR, Benchmark.NET should be passing the value in and preventing the actual benchmark body from being inlined to avoid such issues.

It would be beneficial, IMO, if Benchmark.NET was more proactive about labeling potentially problematic results and had guidance on how to optimally write a test in a way that will provide accurate results.
* I would view a problematic result, at the very least, as anything taking less than 10ns. Most of these methods should be testing more than a single instruction and are running on 2-4GHz computers. So in an "ideal" environment, 10ns is only 20-40 cycles, which represents no more than about 20 instructions and likely no memory accesses. Very few instructions take 0 cycles. Several take 1 cycle, and up to 4 of them can be dispatched simultaneously, but it's rare to actually achieve this. Many take 2-3 cycles, and any kind of memory access will take about 3-11 cycles in the fastest scenario (potentially longer for uncached results, among other things).
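To make the suggested labeling concrete, here is a rough sketch in Python of what such a check could look like. Everything in it is hypothetical: the function names, the idea of a fixed threshold, and the assumed clock speeds are illustrations of the proposal above, not anything Benchmark.NET currently provides.

```python
SUSPICIOUS_THRESHOLD_NS = 10.0  # the cutoff proposed above; an assumption, not a Benchmark.NET setting


def ns_to_cycles(time_ns: float, clock_ghz: float) -> float:
    """Number of clock cycles that fit into time_ns at the given clock speed."""
    return time_ns * clock_ghz


def is_suspicious(mean_ns: float, threshold_ns: float = SUSPICIOUS_THRESHOLD_NS) -> bool:
    """True when a reported mean is below the threshold and deserves a warning label."""
    return mean_ns < threshold_ns


# The one result quoted in this issue.
name, mean_ns = "GetShortName_opt", 0.2082

for clock_ghz in (2.0, 4.0, 5.0):
    cycles = ns_to_cycles(mean_ns, clock_ghz)
    print(f"{name}: {mean_ns} ns is {cycles:.2f} cycles at {clock_ghz:.0f} GHz")

if is_suspicious(mean_ns):
    print(f"{name}: below {SUSPICIOUS_THRESHOLD_NS} ns, result is potentially inaccurate")
```

At 2-4GHz the reported time works out to less than a single cycle, which is the kind of result a warning label would catch.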
