Commit f0af76d
Replace OS sleep with GPU nanosleep kernel in event timing test (#1285)
* Replace timing-based event test with deterministic elapsed-time check
The previous test attempted to measure a real sleep delay between two
event records, which introduced flakiness (especially on Windows/WDDM)
and tested OS/driver timing behavior rather than the __sub__ implementation
itself.
This change replaces the test with a minimal, deterministic version that:
* records two back-to-back events on the same stream
* synchronizes on the second event to ensure both timestamps are valid
* asserts that cuEventElapsedTime returns a finite, non-negative float
This exercises the success path of Event.__sub__ without depending on
actual GPU/OS timing characteristics or requiring artificial GPU work.
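The success-path check above comes down to a few assertions on the float that Event.__sub__ returns. The sketch below assumes the elapsed time has already been obtained in milliseconds. `is_valid_elapsed_ms` is a hypothetical helper, not part of cuda.core or of this commit:

```python
import math


def is_valid_elapsed_ms(delta_ms):
    """Hypothetical helper mirroring the checks applied to the value
    obtained by subtracting two timing-enabled events."""
    # The elapsed time must be a Python float ...
    if not isinstance(delta_ms, float):
        return False
    # ... that is finite (not NaN or +/-inf) and not negative.
    return math.isfinite(delta_ms) and delta_ms >= 0.0


# Back-to-back records may legitimately report 0.0 ms.
print(is_valid_elapsed_ms(0.0))           # True
print(is_valid_elapsed_ms(0.0032))        # True
print(is_valid_elapsed_ms(float("nan")))  # False
print(is_valid_elapsed_ms(-1.0))          # False
```

Zero is accepted on purpose: two events recorded back to back with no work between them can report no measurable difference.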
* cuda_core/tests/helpers/__init__.py: also use CUDA_HOME
* Revert "cuda_core/tests/helpers/__init__.py: also use CUDA_HOME"
This reverts commit 605f1ef.
* Use nanosleep kernel in test_event_elapsed_time_basic for deterministic timing
Replace the back-to-back event record test with a version that launches a
__nanosleep kernel between the two events. This guarantees an elapsed time
above the assertion threshold (delta_ms > 10) without depending on
OS/driver timing characteristics or requiring artificial GPU work beyond
the minimal nanosleep delay.
The kernel sleeps for 20ms (double the 10ms assertion threshold). This
leaves a large safety margin above the ~0.5 microsecond resolution of
cudaEventElapsedTime, so the test is deterministic and non-flaky across
platforms, including Windows/WDDM.
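The margins quoted above can be checked with simple arithmetic: a 20 ms sleep sits 10 ms above the threshold and spans about 40,000 ticks of a ~0.5 microsecond timer. This is a small sketch; `timing_margins` is a hypothetical helper, and the 0.5 microsecond default is the approximate resolution figure from the text, not a measured value:

```python
def timing_margins(sleep_ms, threshold_ms, resolution_us=0.5):
    """Return (margin_ms, ratio, resolution_steps) for a fixed sleep.

    resolution_us defaults to the approximate ~0.5 microsecond
    event-timer resolution quoted in the commit message.
    """
    margin_ms = sleep_ms - threshold_ms   # headroom above the assertion
    ratio = sleep_ms / threshold_ms       # sleep as a multiple of the threshold
    # Convert ms -> microseconds, then count timer ticks in the sleep.
    resolution_steps = (sleep_ms * 1000.0) / resolution_us
    return margin_ms, ratio, resolution_steps


margin, ratio, steps = timing_margins(sleep_ms=20, threshold_ms=10)
print(margin, ratio, steps)  # 10 2.0 40000.0
```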
* Fix nanosleep kernel to use clock64() loop for guaranteed duration
Replace the single __nanosleep() call with a clock64()-based loop so the
kernel actually waits for the full 20ms. __nanosleep() only suspends the
thread for approximately the requested time, and the CUDA programming
guide caps a single call at roughly 1 millisecond, so one call cannot
produce a 20ms delay. In practice, measured times were about two orders
of magnitude too short (~0.2ms instead of ~20ms).
The new implementation:
- Uses clock64() to measure actual elapsed time
- Loops until 20ms worth of clock cycles have elapsed
- Calls __nanosleep(1000000) inside the loop so the GPU thread sleeps between clock checks instead of busy-spinning
This makes the delta_ms > 10 assertion reliable, so the test passes
deterministically.
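The loop described above can be modeled on the host to show why it always reaches the target, no matter how short each individual sleep turns out to be. This is a Python simulation, not GPU code: a fake counter stands in for clock64(), and the conversion assumes the SM clock runs at a fixed, known rate (the 1.5 GHz figure is an illustrative assumption). Both function names are hypothetical:

```python
def cycles_for_duration(duration_ms, clock_rate_khz):
    """Clock cycles in duration_ms at clock_rate_khz.

    kHz means cycles per millisecond, so the product is a cycle count.
    """
    return int(duration_ms * clock_rate_khz)


def simulate_sleep_loop(target_cycles, cycles_per_iteration):
    """Host-side model of the kernel's clock64() loop.

    Each iteration stands in for one __nanosleep() call followed by a
    fresh clock64() read. Returns (elapsed_cycles, iterations).
    """
    if cycles_per_iteration <= 0:
        raise ValueError("each sleep must advance the clock")
    clock = 0          # fake clock64() value
    start = clock      # start = clock64()
    iterations = 0
    while clock - start < target_cycles:
        clock += cycles_per_iteration  # time that passes during one sleep
        iterations += 1
    return clock - start, iterations


target = cycles_for_duration(20, 1_500_000)      # 20 ms at an assumed 1.5 GHz
print(target)                                    # 30000000
print(simulate_sleep_loop(target, 1_500_000))    # (30000000, 20): ~1 ms sleeps
print(simulate_sleep_loop(target, 300_000))      # (30000000, 100): ~0.2 ms sleeps
```

Shorter sleeps only increase the iteration count; the exit condition depends on elapsed cycles, which is what makes the total duration reliable.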
* clock64() return type is documented as `long long int`:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#time-function
* Use device.arch instead of joining device.compute_capability
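For illustration, this is the manual join that the change replaces. It assumes `device.arch` returns the same digit string (for example "90" for compute capability 9.0). The namedtuple below is a hypothetical stand-in for the real device's compute capability, not the cuda.core type:

```python
from collections import namedtuple

# Hypothetical stand-in for a device's (major, minor) compute capability.
ComputeCapability = namedtuple("ComputeCapability", ["major", "minor"])


def arch_from_capability(cc):
    """Join the major and minor digits, e.g. (9, 0) -> "90"."""
    return f"{cc.major}{cc.minor}"


arch = arch_from_capability(ComputeCapability(9, 0))
print(arch)          # 90
print(f"sm_{arch}")  # sm_90
```

Reading a single property avoids repeating this join at every call site that builds a target such as `sm_90`.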
* Cursor-generated cuda_core/tests/helpers/nanosleep_kernel.py
* Change NanosleepKernel API to sleep_duration_ms
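A helper parameterized by `sleep_duration_ms` might look roughly like the sketch below. This is hypothetical: the class name, the `clock_rate_khz` parameter, and the generated source are illustrations, not the contents of nanosleep_kernel.py. It only builds CUDA source text; compiling and launching are left out, and the cycle count assumes clock64() ticks at a fixed `clock_rate_khz`:

```python
class NanosleepKernelSketch:
    """Hypothetical helper that builds CUDA source for a timed sleep.

    It does not compile or launch anything; a real helper would hand
    the source to a compiler and launch the kernel on a stream.
    """

    def __init__(self, sleep_duration_ms, clock_rate_khz):
        if sleep_duration_ms <= 0:
            raise ValueError("sleep_duration_ms must be positive")
        self.sleep_duration_ms = sleep_duration_ms
        # Cycles to wait, assuming clock64() ticks at clock_rate_khz.
        self.target_cycles = int(sleep_duration_ms * clock_rate_khz)

    @property
    def source(self):
        # clock64() returns long long int, so the comparison uses that type.
        return f"""
extern "C" __global__ void nanosleep_kernel() {{
    long long int start = clock64();
    while (clock64() - start < {self.target_cycles}LL) {{
        __nanosleep(1000000);  // request ~1 ms per iteration
    }}
}}
"""


kernel = NanosleepKernelSketch(sleep_duration_ms=20, clock_rate_khz=1_500_000)
print(kernel.target_cycles)  # 30000000
```

Expressing the API in milliseconds keeps the test readable (`sleep_duration_ms=20` next to an assertion on `delta_ms`), while the cycle conversion stays inside the helper.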
* Rename back to test_timing_success
* Streamline a comment
* Polish comments. Make the code more similar to the existing code.
* Simplify nanosleep_kernel implementation.
1 parent f52c71a, commit f0af76d
2 files changed, +63 -18 lines changed