-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First-order CUDA follow-up fix: do not use NVTX #162
base: develop
Are you sure you want to change the base?
Conversation
ac217ae
to
5c37b81
Compare
This make trouble for continuous integration and is apparently not supported on all platforms. Since it is a debug function, it's just as well to remove it from the mainstream tree.
7435c0f
to
abef1d4
Compare
CUDA is always able to provide the timing of CUDA kernels running on the GPU to tools that observe this, either debuggers or performance analyzers. It is usually not able to register the timing and occupancy of threads on the CPU. NVTX is a mechanism that allows CPU code to add timestamps of its running threads into the same system. That makes it possible to discover whether the CPU is working on something while the GPU is working on something else. It is not foolproof because NVTX knows nothing about the CPU's "occupancy" (whether it actually does something or is sleeping). But it can help to understand the communication between CPU and GPU better. NVidia promises that it costs "nearly nothing", and I don't know how it actually compares to something like prof. It does create output for very nice performance analysers. But it is most certainly not necessary to have it in the develop branch for everybody. So while the transition from NVTX (version 2) to NVTX3 is ongoing and leads to cross-platform problems, I'd prefer to remove it entirely from the develop branch. There's probably nobody else than I who uses it anyway. |
Without this PR, PopSift does not build on Linux with CUDA 12.6 any more. |
Description
Remove use of NVTX from the develop branch. This is a profiling feature that is not needed for everybody out there. People who know about it can add the relevant lines themselves.
It is currently very troublesome because the transition from the NVTX library to header-only NVTX3 is complete on Windows, where NVTX has been removed, while NVTX3 does not even exist on Tegra, Jetson and maybe other Arm-based platforms. The removal is OK because NVTX is only relevant for very fine-grained debugging of the interplay between CPU and GPU. Its timing features can also be achieved by cudaEvents.