bench: replace wall-clock timer with per-process CPU timer #1732

Raimo33 · 2025-09-02T14:58:38Z

Goal

This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

real-or-random · 2025-09-02T15:05:16Z

Just some quick comments:

[x] remove the number of runs (count) in favor of simpler cleaner approach with just number of iterations (iter).

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

[x] remove min and max statistics in favor of simpler approach with just avg.

I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.

[x] remove needless fixed point conversion in favor of simpler floating point divisions.

Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D

real-or-random · 2025-09-02T15:06:45Z

It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches.

Raimo33 · 2025-09-02T15:09:22Z

I think min and max just complicate things. let me explain:
first of all, as it is right now, they don't even measure the min and max, they just measure the min/max of the averages of all runs. aka not the absolute. Furthermore, in order to have them, one would need to run all the iterations 10 times more. benchmarks are already slow, adding this min/max slows them by 10 fold. imho it's completely unnecessary. @real-or-random

sipa · 2025-09-02T15:15:13Z

If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:

StQ1: the average of all samples between 1st and 3rd octile
StQ2: the average of all samples between 3rd and 5th octile
StQ3: the average of all samples between 5th and 7th octile

Raimo33 · 2025-09-02T15:23:30Z

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count).

therefore count is only useful to extrapolate min and max

Raimo33 · 2025-09-02T15:33:42Z

Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D

I disagree with #689. It overcomplicate things for the sake of not having floating point math. those divisions aren't even in the hot path, they're outside the benchmarks.

sipa · 2025-09-02T15:42:33Z

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

Raimo33 · 2025-09-02T15:57:38Z

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

Raimo33 · 2025-09-02T17:32:13Z

by the way, gettimeofday() is officially discouraged since 2008 in favor of clock_gettime(). The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.

Raimo33 · 2025-09-02T19:45:56Z

even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile CLOCK_PROCESS_CPUTIME_ID even though we don't have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is a unreliable benchmark, which the current repo has anyways.

I added a line in the README.md for best practices to run the benchmarks.

I also tried adding a function to pin the process to a core directly in C, but there's no standard POSIX compliant way to do so. There is pthread_setaffinity_np() on linux, where 'np' stands for 'not portable'

real-or-random · 2025-09-02T20:47:36Z

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

I like the stabilized quartiles idea.

Raimo33 · 2025-09-02T20:52:37Z

I like the stabilized quartiles idea.

tbh it scares me a bit, will see what I can do. Maybe in a future PR.

CMakeLists.txt

README.md

src/bench.h

Raimo33 · 2025-10-21T07:25:21Z

there appears to be an issue in the CI. @real-or-random can you trigger it again?

also there's one last thing I don't like about this PR: having to define POSIX_C_SOURCE in every source file. if anyone finds a way to define it once, perhaps in the CMakeLists.txt please submit it.

hebasto

#1732 (comment):

I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.

I strongly support this approach and I suggest postponing the following changes for follow-up PRs::

4f3403e "build: rename executables with prefix", because it touches not only benchmarks but tests as well;
e691474 "build: addbenchmarks to ctest", because it would be reasonable to consider this after #1760.

I also suggest dropping a93078f "refactor: reorder cmake commands for better readability" as it does the opposite given the style used for all other targets.

src/tests_exhaustive.c

hebasto · 2025-10-25T21:06:25Z

src/tests.c

+#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
+


That's annoying that test sources have to be modified. See #1734 (comment).

hebasto · 2025-10-26T00:15:27Z

#1732 (comment):

This seems overcomplicated and convoluted.

#1732 (comment):

This looks compilicated, almost overcomplicated, without much benefits...

The PR author appears to favor simplicity over correctness (see the comment above). I disagree with that approach.

but you're welcome to provide a commit and I'll see if it should be cherrypicked.

Sure. Here is a branch that addresses some of @purpleKarrot's comments and most of mine: https://github.com/hebasto/secp256k1/commits/251026-pr1732.alt/. However, it doesn't yet include the necessary changes to the Autotools build system.

UPD. Asked a question about handling _POSIX_C_SOURCE on IRC.

This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.

Raimo33 · 2025-10-26T14:34:49Z

I agree, droppped the 3 unrelated commits from this PR, removed the unnecessary #define, cherry-picked your solution for the library detection

furszy · 2025-10-27T14:17:02Z

Just started reviewing it; CMake compilation fails locally.
It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
@@ -20,7 +20,7 @@
 
 cmake_push_check_state(RESET)
 
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})

hebasto · 2025-10-27T14:28:46Z

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists.

I can't reproduce it. What CMake version are you using?

From CMake docs:

CMAKE_REQUIRED_DEFINITIONS

A semicolon-separated list of compiler definitions, each of the form -DFOO or -DFOO=bar.

Raimo33 · 2025-10-27T14:31:05Z

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists. See

Can you provide the error output? my CMake doesn't complain

furszy · 2025-10-27T14:42:04Z

Can you provide the error output? my CMake doesn't complain

Sure.

In file included from In file included from <built-in><built-in>:418:
<command line>:In file included from 1<built-in>:418:
:9<command line>:: 1In file included from :<built-in>::418:
4189error: :
<command line>:macro name must be an identifier<command line>:1: 1:9: 
:error: 9error: : #define -D_POSIX_C_SOURCE 199309L
        ^macro name must be an identifier
macro name must be an identifier
error: 
macro name must be an identifier
#define -D_POSIX_C_SOURCE 199309L

I can't reproduce it. What CMake version are you using?

cmake version 3.22.3

hebasto · 2025-10-27T14:47:15Z

cmake version 3.22.3

Confirming. I'll suggest a fix shortly.

UPD. There was a change in CMake 3.26:

For all COMPILE_DEFINITIONS properties, any leading -D on an item is removed...

hebasto · 2025-10-27T15:15:38Z

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
@@ -20,7 +20,7 @@
 
 cmake_push_check_state(RESET)
 
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})

Here is the minimal diff to fix the error for CMake older thank 3.26:

--- a/cmake/FindClockGettime.cmake
+++ b/cmake/FindClockGettime.cmake
@@ -34,7 +34,7 @@ if(${CMAKE_FIND_PACKAGE_NAME}_FOUND)
   if(NOT TARGET POSIX::clock_gettime)
     add_library(POSIX::clock_gettime INTERFACE IMPORTED)
     set_target_properties(POSIX::clock_gettime PROPERTIES
-      INTERFACE_COMPILE_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}"
+      INTERFACE_COMPILE_DEFINITIONS _POSIX_C_SOURCE=199309L
       INTERFACE_LINK_LIBRARIES "${CMAKE_REQUIRED_LIBRARIES}"
     )
   endif()

furszy · 2025-10-27T15:21:22Z

Isn't that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.

hebasto · 2025-10-27T15:22:39Z

Isn't that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.

It is still used by check_symbol_exists().

furszy

Have you considered GetProcessTimes or GetThreadTimes to ignore sleep and waiting times on Windows?
It seems to me that the current approach is semantically different across platforms: on POSIX systems, you get CPU time (if enabled), whereas on Windows you’re using QueryPerformanceCounter, which measures elapsed wall-clock time.
So results will not be comparable across platforms?

furszy · 2025-10-27T15:32:54Z

src/tests_common.h

+#if defined(_WIN32)
+
+    LARGE_INTEGER freq, counter;
+    QueryPerformanceFrequency(&freq);


Per https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency, freq should be static and initialized only once. The function's description says:

The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached.

I know, it's not mandatory but recommended. It is redundant to initialize it every time but I found no other solution since we are not in an OOP environment. and using an if statement to initialize it only the first time doesn't seem optimal.

I see. Using an init_time(void) function could work in that case, but it would require every binary to call this method early in every program’s execution. All good anyway.

Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.

Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.

my reasoning Is the same here. It would be one extra branch. and a failure in the clock is not a huge deal.

Raimo33 · 2025-10-27T15:52:44Z

Have you considered GetProcessTimes or GetThreadTimes

I have, but they lack precision.

furszy · 2025-10-27T17:38:20Z

Have you considered GetProcessTimes or GetThreadTimes

I have, but they lack precision.

That's interesting. What do you think about the semantical difference between the Windows and POSIX approaches?
It seems to me that we should at least mention in the docs that results aren't comparable across platforms.

Raimo33 · 2025-10-28T12:39:28Z

What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren't comparable across platforms.

We added print statements for that exact reason. On windows there simply isn't a reliable way to get per-process time.

Raimo33 marked this pull request as draft September 2, 2025 14:58

Raimo33 mentioned this pull request Sep 2, 2025

[WIP] Add intel simd #1703

Draft

2 tasks

Raimo33 force-pushed the benchmark-precise branch from 6aff035 to d456fad Compare September 2, 2025 15:22

Raimo33 force-pushed the benchmark-precise branch 3 times, most recently from 97e5264 to 254a014 Compare September 2, 2025 16:08

Raimo33 changed the title ~~[WIP] Refactor benchmark~~ [WIP] refactor: remove system interference from benchmarks Sep 2, 2025

Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 1d9d6d0 to 4c9a074 Compare September 2, 2025 16:26

Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from ddeaede to 71dff3f Compare September 2, 2025 17:45

Raimo33 force-pushed the benchmark-precise branch 4 times, most recently from 3e43c75 to ef9e40e Compare September 2, 2025 20:31

Raimo33 force-pushed the benchmark-precise branch from ef9e40e to 66745f7 Compare September 2, 2025 23:56

Raimo33 marked this pull request as ready for review September 2, 2025 23:56

real-or-random reviewed Sep 3, 2025

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

src/bench.h Outdated Show resolved Hide resolved

src/bench.h Outdated Show resolved Hide resolved

src/bench.h Outdated Show resolved Hide resolved

src/bench.h Outdated Show resolved Hide resolved

Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 1485450 to e691474 Compare October 18, 2025 08:11

Raimo33 requested review from hebasto, purpleKarrot and real-or-random October 24, 2025 09:13

hebasto reviewed Oct 25, 2025

View reviewed changes

src/tests_exhaustive.c Outdated Show resolved Hide resolved

hebasto mentioned this pull request Oct 25, 2025

Introduce (mini) unit test framework #1734

Merged

hebasto reviewed Oct 25, 2025

View reviewed changes

hebasto and others added 4 commits October 26, 2025 15:26

cmake: Add FindClockGettime module

28f273d

bench: print clock info

76a4fc6

docs: add benchmarking best practices

e14981e

Raimo33 force-pushed the benchmark-precise branch from e691474 to e14981e Compare October 26, 2025 14:34

Raimo33 marked this pull request as draft October 26, 2025 14:37

furszy reviewed Oct 27, 2025

View reviewed changes

Uh oh!

bench: replace wall-clock timer with per-process CPU timer #1732

Are you sure you want to change the base?

bench: replace wall-clock timer with per-process CPU timer #1732

Uh oh!

Conversation

Raimo33 commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Goal

Uh oh!

real-or-random commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

real-or-random commented Sep 2, 2025

Uh oh!

Raimo33 commented Sep 2, 2025

Uh oh!

sipa commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Raimo33 commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Raimo33 commented Sep 2, 2025

Uh oh!

sipa commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Raimo33 commented Sep 2, 2025

Uh oh!

Raimo33 commented Sep 2, 2025

Uh oh!

Raimo33 commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

real-or-random commented Sep 2, 2025

Uh oh!

Raimo33 commented Sep 2, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Raimo33 commented Oct 21, 2025

Uh oh!

hebasto left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hebasto Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

hebasto commented Oct 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Raimo33 commented Oct 26, 2025

Uh oh!

furszy commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hebasto commented Oct 27, 2025

Uh oh!

Raimo33 commented Oct 27, 2025

Uh oh!

furszy commented Oct 27, 2025

Uh oh!

hebasto commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hebasto commented Oct 27, 2025

Uh oh!

furszy commented Oct 27, 2025

Uh oh!

hebasto commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

furszy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Raimo33 commented Sep 2, 2025 •

edited

Loading

real-or-random commented Sep 2, 2025 •

edited

Loading

sipa commented Sep 2, 2025 •

edited

Loading

Raimo33 commented Sep 2, 2025 •

edited

Loading

sipa commented Sep 2, 2025 •

edited

Loading

Raimo33 commented Sep 2, 2025 •

edited

Loading

hebasto commented Oct 26, 2025 •

edited

Loading

furszy commented Oct 27, 2025 •

edited

Loading

hebasto commented Oct 27, 2025 •

edited

Loading

hebasto commented Oct 27, 2025 •

edited

Loading

furszy left a comment •

edited

Loading

furszy Oct 27, 2025 •

edited

Loading