CUDA events have limited accuracy, and their measurements include the kernel launch overhead. CUPTI, on the other hand, provides a more reliable way to obtain consistent timing measurements.
This request proposes adding an option to use CUPTI instead of CUDA events for timing.
Details
Issues with CUDA events:
Accuracy and stability:
cudaEvent timings can fluctuate by 10-30 us, which makes measurements of small computations unreliable
cudaEvent timings include the kernel launch overhead, which depends on host/CPU execution and/or the driver version
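To make the launch-overhead point concrete, the Python sketch below models a single event-bracketed launch using hypothetical GPU-side timestamps. The class, function names, and numbers are illustrative assumptions, not CUDA or CUPTI APIs. In this model, the start event is processed as soon as it reaches an idle GPU, so the interval between the two events also covers the time the host spends launching the kernel. A kernel-only interval, which is the kind of start/end pair a CUPTI kernel activity record reports, excludes that gap.

```python
from dataclasses import dataclass


@dataclass
class TimedLaunch:
    """Hypothetical GPU-side timestamps (ns) for one event-bracketed launch."""
    start_event_ns: int   # GPU processes the "start" event
    kernel_start_ns: int  # kernel actually begins executing
    kernel_end_ns: int    # kernel finishes
    stop_event_ns: int    # GPU processes the "stop" event


def event_elapsed_ns(t: TimedLaunch) -> int:
    # Event-pair timing: everything between the two events, including the
    # idle gap while the host was still launching the kernel.
    return t.stop_event_ns - t.start_event_ns


def kernel_elapsed_ns(t: TimedLaunch) -> int:
    # Kernel-level start/end timing: execution time only.
    return t.kernel_end_ns - t.kernel_start_ns


def launch_gap_ns(t: TimedLaunch) -> int:
    # Idle time between the start event and the kernel (host launch overhead).
    return t.kernel_start_ns - t.start_event_ns


# Illustrative numbers only: a 12 us launch gap in front of a 5 us kernel.
launch = TimedLaunch(start_event_ns=0, kernel_start_ns=12_000,
                     kernel_end_ns=17_000, stop_event_ns=17_500)
print(event_elapsed_ns(launch))   # 17500
print(kernel_elapsed_ns(launch))  # 5000
print(launch_gap_ns(launch))      # 12000
```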
CUPTI:
Accuracy: ~0.5 us granularity vs. 10-30 us for cudaEvent
Not affected by kernel launch overhead
Consistency: measurements close to those reported by the profiler (nsys)
Efficiency: no need for waiting/delay kernels to hide CPU overhead
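A quick back-of-the-envelope calculation shows why the granularity difference matters most for small kernels. The sketch below divides the timing uncertainty figures quoted in this issue (10-30 us for events, about 0.5 us for CUPTI) by a kernel's true duration. It treats those figures as a worst-case error bound, which is a simplifying assumption rather than a measured result, and the function name is hypothetical.

```python
def relative_error_pct(uncertainty_us: float, kernel_us: float) -> float:
    """Worst-case timing error as a percentage of the kernel's duration."""
    return 100.0 * uncertainty_us / kernel_us


EVENT_JITTER_US = (10.0, 30.0)  # cudaEvent fluctuation range quoted in this issue
CUPTI_GRANULARITY_US = 0.5      # CUPTI granularity quoted in this issue

for kernel_us in (5.0, 50.0, 500.0):
    lo, hi = (relative_error_pct(j, kernel_us) for j in EVENT_JITTER_US)
    cupti = relative_error_pct(CUPTI_GRANULARITY_US, kernel_us)
    print(f"{kernel_us:>6.0f} us kernel: events {lo:.0f}-{hi:.0f}%, CUPTI {cupti:.1f}%")
```

For a 5 us kernel this gives 200-600% for events versus 10% for CUPTI; at 500 us the gap shrinks to 2-6% versus 0.1%.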
We do mitigate many of the issues with events by using blocking_kernels, so it's not quite as bad as it seems. I think this would be a great addition; I'm curious how much it would improve the stability of our results, especially when sync tags are used.
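The blocking-kernel mitigation can be illustrated with a simple timeline model. This is not NVBench code. It is a hypothetical Python model that assumes a fixed per-launch host overhead, a fixed kernel duration, and a single in-order stream. Without a blocking kernel, the GPU can sit idle while each launch is being issued, so the event interval absorbs launch overhead. When a blocking kernel holds the stream until every launch has been enqueued, the kernels run back to back and the interval reduces to the pure execution time.

```python
def event_interval_us(n_kernels: int, launch_overhead_us: float,
                      kernel_us: float, blocking: bool) -> float:
    """Modelled time between start and stop events on one in-order stream."""
    if blocking:
        # Every launch is already queued behind the blocking kernel, so once it
        # is released the kernels execute back to back with no launch gaps.
        return n_kernels * kernel_us
    gpu_free_at = 0.0  # start event is processed immediately on an idle GPU
    for i in range(1, n_kernels + 1):
        issued_at = i * launch_overhead_us   # host finishes launching kernel i
        start = max(issued_at, gpu_free_at)  # wait for the launch and the GPU
        gpu_free_at = start + kernel_us
    return gpu_free_at


# Illustrative numbers: 10 launches, 8 us host overhead each, 5 us kernels.
print(event_interval_us(10, 8.0, 5.0, blocking=False))  # 85.0
print(event_interval_us(10, 8.0, 5.0, blocking=True))   # 50.0
```

In this model, when the launch overhead is shorter than the kernel, only the first launch gap shows up in the unblocked interval (for example, 52.0 us instead of 50.0 us with a 2 us overhead). Small kernels, where the overhead dominates, are the case that benefits most from either a blocking kernel or CUPTI.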