bench: replace wall-clock timer with per-process CPU timer #1732
|
Just some quick comments:
I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
I think
Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D |
|
It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches. |
|
I think |
|
If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:
|
6aff035 to
d456fad
Compare
right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count). therefore |
I disagree with #689. It overcomplicates things for the sake of not having floating point math. Those divisions aren't even in the hot path; they're outside the benchmarks. |
|
Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards. |
what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is. |
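For readers following the min/avg/max debate, here is a minimal sketch of how such a summary could be computed over the per-round timings. The `bench_stats` type and `summarize` function are hypothetical names for illustration, not the project's actual benchmark code, and the samples are assumed to be microsecond timings, one per round.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one benchmark: all values in microseconds. */
typedef struct {
    int64_t min;
    int64_t avg;
    int64_t max;
} bench_stats;

/* Hypothetical helper: reduce `count` per-round timings to min/avg/max.
 * Uses integer arithmetic only; the average is truncated toward zero.
 * Returns all zeros when count is 0. */
static bench_stats summarize(const int64_t *samples, size_t count) {
    bench_stats s = {0, 0, 0};
    int64_t sum = 0;
    size_t i;

    if (count == 0) {
        return s;
    }
    s.min = samples[0];
    s.max = samples[0];
    for (i = 0; i < count; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        sum += samples[i];
    }
    s.avg = sum / (int64_t)count;
    return s;
}
```

A wide gap between `min` and `max` is the signal the reviewers want to keep: it suggests that something other than the benchmarked function influenced at least one round.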
97e5264 to
254a014
Compare
1d9d6d0 to
4c9a074
Compare
|
by the way, |
ddeaede to
71dff3f
Compare
|
even though the manual says that I added a line in the README.md for best practices to run the benchmarks. I also tried adding a function to pin the process to a core directly in C, but there's no standard POSIX compliant way to do so. There is |
3e43c75 to
ef9e40e
Compare
The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off. I like the stabilized quartiles idea. |
tbh it scares me a bit, will see what I can do. Maybe in a future PR. |
ef9e40e to
66745f7
Compare
1485450 to
e691474
Compare
|
there appears to be an issue in the CI. @real-or-random can you trigger it again? also there's one last thing I don't like about this PR: having to define |
I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.
I strongly support this approach and I suggest postponing the following changes for follow-up PRs:
- 4f3403e "build: rename executables with prefix", because it touches not only benchmarks but tests as well;
- e691474 "build: addbenchmarks to ctest", because it would be reasonable to consider this after #1760.
I also suggest dropping a93078f "refactor: reorder cmake commands for better readability" as it does the opposite given the style used for all other targets.
src/tests.c (outdated)
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
That's annoying that test sources have to be modified. See #1734 (comment).
The PR author appears to favor simplicity over correctness (see the comment above). I disagree with that approach.
Sure. Here is a branch that addresses some of @purpleKarrot's comments and most of mine: https://github.com/hebasto/secp256k1/commits/251026-pr1732.alt/. However, it doesn't yet include the necessary changes to the Autotools build system. UPD. Asked a question about handling |
This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
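As a rough illustration of the idea in this commit message, the sketch below reads the per-process CPU clock through `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)`. The helper names `cpu_time_us`, `bench_cpu_us` and `spin` are hypothetical and are not taken from the PR; the `_POSIX_C_SOURCE` definition mirrors the one discussed in the review.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed by this process, in microseconds.
 * Returns -1 when no per-process CPU clock is available or the call fails. */
static int64_t cpu_time_us(void) {
#if defined(CLOCK_PROCESS_CPUTIME_ID)
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        return -1;
    }
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
#else
    return -1;
#endif
}

/* Hypothetical helper: CPU microseconds spent calling fn(data) `iters` times,
 * or -1 if the clock could not be read. */
static int64_t bench_cpu_us(void (*fn)(void *), void *data, int iters) {
    int64_t begin, end;
    int i;

    assert(fn != NULL && iters > 0);
    begin = cpu_time_us();
    for (i = 0; i < iters; i++) {
        fn(data);
    }
    end = cpu_time_us();
    return (begin < 0 || end < 0) ? -1 : end - begin;
}

/* Hypothetical workload, present only so the helper can be tried out. */
static void spin(void *data) {
    volatile unsigned long *acc = data;
    unsigned long i;
    for (i = 0; i < 100000UL; i++) {
        *acc += i;
    }
}
```

Calling `bench_cpu_us(spin, &acc, 10)` with an `unsigned long acc` returns the CPU time charged to the process for the loop. Time spent sleeping or blocked on I/O is not counted, which is the property the commit message relies on.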
e691474 to
e14981e
Compare
|
I agree, dropped the 3 unrelated commits from this PR, removed the unnecessary |
|
Just started reviewing it; CMake compilation fails locally.
diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake (revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake (date 1761574207222)
@@ -20,7 +20,7 @@
cmake_push_check_state(RESET)
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN}) |
I can't reproduce it. What CMake version are you using? From CMake docs:
|
Can you provide the error output? my CMake doesn't complain |
Sure.
cmake version 3.22.3 |
Confirming. I'll suggest a fix shortly. UPD. There was a change in CMake 3.26:
|
Here is the minimal diff to fix the error for CMake older than 3.26:
--- a/cmake/FindClockGettime.cmake
+++ b/cmake/FindClockGettime.cmake
@@ -34,7 +34,7 @@ if(${CMAKE_FIND_PACKAGE_NAME}_FOUND)
if(NOT TARGET POSIX::clock_gettime)
add_library(POSIX::clock_gettime INTERFACE IMPORTED)
set_target_properties(POSIX::clock_gettime PROPERTIES
- INTERFACE_COMPILE_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}"
+ INTERFACE_COMPILE_DEFINITIONS _POSIX_C_SOURCE=199309L
INTERFACE_LINK_LIBRARIES "${CMAKE_REQUIRED_LIBRARIES}"
)
endif() |
|
Isn't that going to make |
It is still used by |
Have you considered GetProcessTimes or GetThreadTimes to ignore sleep and waiting times on Windows?
It seems to me that the current approach is semantically different across platforms: on POSIX systems, you get CPU time (if enabled), whereas on Windows you’re using QueryPerformanceCounter, which measures elapsed wall-clock time.
So results will not be comparable across platforms?
#if defined(_WIN32)
    LARGE_INTEGER freq, counter;
    QueryPerformanceFrequency(&freq);
Per https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency, freq should be static and initialized only once. The function's description says:
The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached.
I know, it's not mandatory but recommended. It is redundant to initialize it every time, but I found no other solution since we are not in an OOP environment, and using an if statement to initialize it only the first time doesn't seem optimal.
I see. Using an init_time(void) function could work in that case, but it would require every binary to call this method early in every program’s execution. All good anyway.
Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.
"Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well."
My reasoning is the same here. It would be one extra branch, and a failure in the clock is not a huge deal.
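For reference, the two review suggestions in this thread (query the frequency once, and handle a failed call) could be combined roughly as follows. This is only a sketch: `wall_time_us` is a hypothetical name, the function-local `static` is one possible way to cache the value without an explicit init call, and the non-Windows branch is a coarse stand-in based on `time()` so the example compiles elsewhere.

```c
#include <stdint.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: wall-clock time in microseconds, or -1 on failure. */
static int64_t wall_time_us(void) {
#if defined(_WIN32)
    /* Zero-initialized static: the frequency is queried on first use only. */
    static LARGE_INTEGER freq;
    LARGE_INTEGER counter;

    if (freq.QuadPart == 0 && !QueryPerformanceFrequency(&freq)) {
        return -1; /* the call reported failure */
    }
    if (freq.QuadPart == 0 || !QueryPerformanceCounter(&counter)) {
        return -1;
    }
    /* Split the division so counter * 1000000 cannot overflow. */
    return (int64_t)(counter.QuadPart / freq.QuadPart) * 1000000
         + (int64_t)(counter.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart;
#else
    /* Assumption: second resolution is enough for this stand-in branch. */
    time_t now = time(NULL);
    if (now == (time_t)-1) {
        return -1;
    }
    return (int64_t)now * 1000000;
#endif
}
```

The cost is one predictable branch per call, which is the trade-off the author objects to above. The sketch is not thread-safe on first use, which seems acceptable for single-threaded benchmark binaries.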
I have, but they lack precision. |
That's interesting. What do you think about the semantic difference between the Windows and POSIX approaches? |
We added print statements for that exact reason. On Windows there simply isn't a reliable way to get per-process time. |
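For context on the alternative raised in the review, below is a sketch of what a `GetProcessTimes`-based reading could look like. `process_cpu_us` is a hypothetical name and this is not code from the PR. The Windows branch adds kernel and user time, which `FILETIME` expresses in 100-nanosecond units; the other branch falls back to standard `clock()` so the example builds on non-Windows systems.

```c
#include <stdint.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: CPU time (user + kernel) of this process in
 * microseconds, or -1 on failure. */
static int64_t process_cpu_us(void) {
#if defined(_WIN32)
    FILETIME creation, exit_time, kernel, user;
    ULARGE_INTEGER k, u;

    if (!GetProcessTimes(GetCurrentProcess(), &creation, &exit_time,
                         &kernel, &user)) {
        return -1;
    }
    k.LowPart = kernel.dwLowDateTime;
    k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;
    u.HighPart = user.dwHighDateTime;
    /* FILETIME counts 100 ns ticks, so divide by 10 for microseconds. */
    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
#else
    clock_t c = clock();
    if (c == (clock_t)-1) {
        return -1;
    }
    return (int64_t)c * 1000000 / CLOCKS_PER_SEC;
#endif
}
```

The unit of the returned value says nothing about how often the underlying counter is updated, so the author's precision concern still applies to this approach.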
Goal
This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environment.
This is achieved by replacing the wall-clock timer with a per-process CPU timer when possible.