Skip to content

Conversation

@Raimo33
Copy link

@Raimo33 Raimo33 commented Sep 2, 2025

Goal

This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

@Raimo33 Raimo33 marked this pull request as draft September 2, 2025 14:58
@Raimo33 Raimo33 mentioned this pull request Sep 2, 2025
2 tasks
@real-or-random
Copy link
Contributor

real-or-random commented Sep 2, 2025

Just some quick comments:

[x] remove the number of runs (count) in favor of simpler cleaner approach with just number of iterations (iter).

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

[x] remove min and max statistics in favor of simpler approach with just avg.

I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.

[x] remove needless fixed point conversion in favor of simpler floating point divisions.

Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D

@real-or-random
Copy link
Contributor

It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches.

@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

I think min and max just complicate things. let me explain:
first of all, as it is right now, they don't even measure the min and max, they just measure the min/max of the averages of all runs. aka not the absolute. Furthermore, in order to have them, one would need to run all the iterations 10 times more. benchmarks are already slow, adding this min/max slows them by 10 fold. imho it's completely unnecessary. @real-or-random

@sipa
Copy link
Contributor

sipa commented Sep 2, 2025

If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:

  • StQ1: the average of all samples between 1st and 3rd octile
  • StQ2: the average of all samples between 3rd and 5th octile
  • StQ3: the average of all samples between 5th and 7th octile

@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count).

therefore count is only useful to extrapolate min and max

@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D

I disagree with #689. It overcomplicate things for the sake of not having floating point math. those divisions aren't even in the hot path, they're outside the benchmarks.

@sipa
Copy link
Contributor

sipa commented Sep 2, 2025

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 3 times, most recently from 97e5264 to 254a014 Compare September 2, 2025 16:08
@Raimo33 Raimo33 changed the title [WIP] Refactor benchmark [WIP] refactor: remove system interference from benchmarks Sep 2, 2025
@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 1d9d6d0 to 4c9a074 Compare September 2, 2025 16:26
@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

by the way, gettimeofday() is officially discouraged since 2008 in favor of clock_gettime(). The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from ddeaede to 71dff3f Compare September 2, 2025 17:45
@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile CLOCK_PROCESS_CPUTIME_ID even though we don't have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is a unreliable benchmark, which the current repo has anyways.

I added a line in the README.md for best practices to run the benchmarks.

I also tried adding a function to pin the process to a core directly in C, but there's no standard POSIX compliant way to do so. There is pthread_setaffinity_np() on linux, where 'np' stands for 'not portable'

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 4 times, most recently from 3e43c75 to ef9e40e Compare September 2, 2025 20:31
@real-or-random
Copy link
Contributor

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

I like the stabilized quartiles idea.

@Raimo33
Copy link
Author

Raimo33 commented Sep 2, 2025

I like the stabilized quartiles idea.

tbh it scares me a bit, will see what I can do. Maybe in a future PR.

@Raimo33 Raimo33 marked this pull request as ready for review September 2, 2025 23:56
@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 1485450 to e691474 Compare October 18, 2025 08:11
@Raimo33
Copy link
Author

Raimo33 commented Oct 21, 2025

there appears to be an issue in the CI. @real-or-random can you trigger it again?

also there's one last thing I don't like about this PR: having to define POSIX_C_SOURCE in every source file. if anyone finds a way to define it once, perhaps in the CMakeLists.txt please submit it.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1732 (comment):

I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.

I strongly support this approach and I suggest postponing the following changes for follow-up PRs::

  • 4f3403e "build: rename executables with prefix", because it touches not only benchmarks but tests as well;
  • e691474 "build: addbenchmarks to ctest", because it would be reasonable to consider this after #1760.

I also suggest dropping a93078f "refactor: reorder cmake commands for better readability" as it does the opposite given the style used for all other targets.

src/tests.c Outdated
Comment on lines 7 to 8
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's annoying that test sources have to be modified. See #1734 (comment).

@hebasto
Copy link
Member

hebasto commented Oct 26, 2025

#1732 (comment):

This seems overcomplicated and convoluted.

#1732 (comment):

This looks compilicated, almost overcomplicated, without much benefits...

The PR author appears to favor simplicity over correctness (see the comment above). I disagree with that approach.

but you're welcome to provide a commit and I'll see if it should be cherrypicked.

Sure. Here is a branch that addresses some of @purpleKarrot's comments and most of mine: https://github.com/hebasto/secp256k1/commits/251026-pr1732.alt/. However, it doesn't yet include the necessary changes to the Autotools build system.

UPD. Asked a question about handling _POSIX_C_SOURCE on IRC.

hebasto and others added 4 commits October 26, 2025 15:26
This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
@Raimo33
Copy link
Author

Raimo33 commented Oct 26, 2025

I agree, droppped the 3 unrelated commits from this PR, removed the unnecessary #define, cherry-picked your solution for the library detection

@Raimo33 Raimo33 marked this pull request as draft October 26, 2025 14:37
@furszy
Copy link
Member

furszy commented Oct 27, 2025

Just started reviewing it; CMake compilation fails locally.
It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
@@ -20,7 +20,7 @@
 
 cmake_push_check_state(RESET)
 
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})

@hebasto
Copy link
Member

hebasto commented Oct 27, 2025

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists.

I can't reproduce it. What CMake version are you using?

From CMake docs:

CMAKE_REQUIRED_DEFINITIONS

A semicolon-separated list of compiler definitions, each of the form -DFOO or -DFOO=bar.

@Raimo33
Copy link
Author

Raimo33 commented Oct 27, 2025

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists. See

Can you provide the error output? my CMake doesn't complain

@furszy
Copy link
Member

furszy commented Oct 27, 2025

Can you provide the error output? my CMake doesn't complain

Sure.

In file included from In file included from <built-in><built-in>:418:
<command line>:In file included from 1<built-in>:418:
:9<command line>:: 1In file included from :<built-in>::418:
4189error: :
<command line>:macro name must be an identifier<command line>:1: 1:9: 
:error: 9error: : #define -D_POSIX_C_SOURCE 199309L
        ^macro name must be an identifier
macro name must be an identifier
error: 
macro name must be an identifier
#define -D_POSIX_C_SOURCE 199309L

I can't reproduce it. What CMake version are you using?

cmake version 3.22.3

@hebasto
Copy link
Member

hebasto commented Oct 27, 2025

cmake version 3.22.3

Confirming. I'll suggest a fix shortly.

UPD. There was a change in CMake 3.26:

For all COMPILE_DEFINITIONS properties, any leading -D on an item is removed...

@hebasto
Copy link
Member

hebasto commented Oct 27, 2025

Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
@@ -20,7 +20,7 @@
 
 cmake_push_check_state(RESET)
 
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})

Here is the minimal diff to fix the error for CMake older thank 3.26:

--- a/cmake/FindClockGettime.cmake
+++ b/cmake/FindClockGettime.cmake
@@ -34,7 +34,7 @@ if(${CMAKE_FIND_PACKAGE_NAME}_FOUND)
   if(NOT TARGET POSIX::clock_gettime)
     add_library(POSIX::clock_gettime INTERFACE IMPORTED)
     set_target_properties(POSIX::clock_gettime PROPERTIES
-      INTERFACE_COMPILE_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}"
+      INTERFACE_COMPILE_DEFINITIONS _POSIX_C_SOURCE=199309L
       INTERFACE_LINK_LIBRARIES "${CMAKE_REQUIRED_LIBRARIES}"
     )
   endif()

@furszy
Copy link
Member

furszy commented Oct 27, 2025

Isn't that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.

@hebasto
Copy link
Member

hebasto commented Oct 27, 2025

Isn't that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.

It is still used by check_symbol_exists().

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered GetProcessTimes or GetThreadTimes to ignore sleep and waiting times on Windows?
It seems to me that the current approach is semantically different across platforms: on POSIX systems, you get CPU time (if enabled), whereas on Windows you’re using QueryPerformanceCounter, which measures elapsed wall-clock time.
So results will not be comparable across platforms?

#if defined(_WIN32)

LARGE_INTEGER freq, counter;
QueryPerformanceFrequency(&freq);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency, freq should be static and initialized only once. The function's description says:

The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, it's not mandatory but recommended. It is redundant to initialize it every time but I found no other solution since we are not in an OOP environment. and using an if statement to initialize it only the first time doesn't seem optimal.

Copy link
Member

@furszy furszy Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Using an init_time(void) function could work in that case, but it would require every binary to call this method early in every program’s execution. All good anyway.

Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.

my reasoning Is the same here. It would be one extra branch. and a failure in the clock is not a huge deal.

@Raimo33
Copy link
Author

Raimo33 commented Oct 27, 2025

Have you considered GetProcessTimes or GetThreadTimes

I have, but they lack precision.

@furszy
Copy link
Member

furszy commented Oct 27, 2025

Have you considered GetProcessTimes or GetThreadTimes

I have, but they lack precision.

That's interesting. What do you think about the semantical difference between the Windows and POSIX approaches?
It seems to me that we should at least mention in the docs that results aren't comparable across platforms.

@Raimo33
Copy link
Author

Raimo33 commented Oct 28, 2025

What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren't comparable across platforms.

We added print statements for that exact reason. On windows there simply isn't a reliable way to get per-process time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants