Enable discovery and loading at run time of NVRTC and nvJitLink libraries in a wheels ecosystem #363

vzhurba01 · 2025-01-08T00:30:31Z

Support discovery of NVRTC and nvJitLink libraries at run time

Wheel-based CTK installations distribute their libraries in dedicated packages:

- nvidia-nvjitlink-cuXX
- nvidia-cuda-nvrtc-cuXX

The path of their libraries relative to cuda-bindings is consistent,
which lets us use relative paths to discover them when loading
at run time.

close #286
close #287
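
A minimal sketch of the relative-path idea described above; it is not the code in this PR. It assumes that cuda-bindings installs under `<site-packages>/cuda/bindings` and that the CUDA wheels install under `<site-packages>/nvidia/<component>/lib`. The helper name `wheel_library_path` and the component directory names are hypothetical and used only for illustration.

```python
from pathlib import Path


def wheel_library_path(bindings_dir, component, lib_name, lib_subdir="lib"):
    """Return the path of a wheel-provided library, or None if it is absent.

    bindings_dir is the directory of the installed cuda.bindings package,
    assumed here to be <site-packages>/cuda/bindings.
    """
    # Two levels up from <site-packages>/cuda/bindings is <site-packages>.
    site_packages = Path(bindings_dir).resolve().parents[1]
    # Assumed layout: <site-packages>/nvidia/<component>/<lib_subdir>/<lib_name>
    candidate = site_packages / "nvidia" / component / lib_subdir / lib_name
    return candidate if candidate.is_file() else None
```

Because both packages live under the same site-packages root, the lookup needs no environment variables: only the location of the bindings package itself.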

copy-pr-bot · 2025-01-08T00:30:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

vzhurba01 · 2025-01-08T00:32:55Z

/ok to test

vzhurba01 · 2025-01-08T00:45:59Z

/ok to test

leofang

Thanks, Vlad! I left two comments below.

I understand this is the 2nd step as discussed offline. For the 1st step, I think it'd be nice to preserve how pip install cuda-python behaves today (not depending on CUDA wheels). So let's add an all optional dependency to facilitate the usage

pip install cuda-python[all]

This can be achieved by adding this section to pyproject.toml (example):

[project.optional-dependencies]
all = [
    "nvidia-cuda-nvrtc-cu12",
    "nvidia-nvjitlink-cu12>=12.3",
]

We should also document this new installation option.
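
As an illustration of what the `all` extra changes for a user, here is a small sketch that reports whether the optional CUDA wheels are present in the current environment. It uses only the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are the ones from the example above.

```python
from importlib import metadata

# Distribution names taken from the pyproject.toml example above.
OPTIONAL_WHEELS = ("nvidia-cuda-nvrtc-cu12", "nvidia-nvjitlink-cu12")


def installed_versions(names=OPTIONAL_WHEELS):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The wheel is not installed, e.g. a plain `pip install cuda-python`.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

With a plain install both entries would be expected to report as not installed; with the extra, pip should have pulled in both wheels.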

cuda_bindings/setup.py

leofang · 2025-01-08T05:37:52Z

btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)

leofang · 2025-01-08T06:28:12Z

Another thing: Since we do not have any Windows GPU runners in the CI, even if I add a new test job targeting this capability there's still one bug that we wouldn't be able to catch, which is that we need a bit more logic to discover the NVRTC DLL location. We've done this for nvJitLink and the same logic was also applied in nvmath-python:

cuda-python/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx

Lines 65 to 79 in e774b32

    
    # Next, check if DLLs are installed via pip
    for sp in get_site_packages():
        mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin")
        if not os.path.isdir(mod_path):
            continue
        os.add_dll_directory(mod_path)
    try:
        handle = win32api.LoadLibraryEx(
            # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
            os.path.join(mod_path, dll_name),
            0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
    except:
        pass
    else:
        break

so @vzhurba01 this will require some manual testing on Windows for the time being...
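
For readers who want to try the same search for NVRTC, the following is a platform-neutral sketch of the lookup step only. It is not the code from either repository: `find_wheel_dll` is a hypothetical helper, the `nvidia/cuda_nvrtc/bin` layout is an assumption about the NVRTC wheel, and `site.getsitepackages()` plus `site.getusersitepackages()` stand in for the `get_site_packages()` helper quoted above, whose implementation is not shown in this thread.

```python
import os
import site


def find_wheel_dll(dll_name, subdirs=("nvidia", "cuda_nvrtc", "bin"), roots=None):
    """Return the absolute path of dll_name under the first matching root, else None."""
    if roots is None:
        # System-wide site-packages directories plus the per-user one.
        roots = list(site.getsitepackages()) + [site.getusersitepackages()]
    for root in roots:
        candidate = os.path.join(root, *subdirs, dll_name)
        if os.path.isfile(candidate):
            # Return an absolute path, since the quoted snippet notes that
            # LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs one.
            return os.path.abspath(candidate)
    return None
```

On Windows the returned path could then be handed to the loader; the actual `LoadLibraryEx` call is left out so the sketch runs on any platform.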

leofang · 2025-01-08T06:31:20Z

cc @kmaehashi @gmarkall for vis

vzhurba01 · 2025-01-08T22:54:49Z

> Thanks, Vlad! I left two comments below.
>
> I understand this is the 2nd step as discussed offline. For the 1st step, I think it'd be nice to preserve how pip install cuda-python behaves today (not depending on CUDA wheels). So let's add an all optional dependency to facilitate the usage
>
> pip install cuda-python[all]
>
> This can be achieved by adding this section to pyproject.toml (example):
>
> [project.optional-dependencies]
> all = [
>     "nvidia-cuda-nvrtc-cu12",
>     "nvidia-nvjitlink-cu12>=12.3",
> ]
>
> We should also document this new installation option.

Done.

Also I've split the new documentation between this PR and a new draft PR #366. That draft would only make sense after the new wheels are posted.

vzhurba01 · 2025-01-08T23:04:32Z

> Another thing: Since we do not have any Windows GPU runners in the CI, even if I add a new test job targeting this capability there's still one bug that we wouldn't be able to catch, which is that we need a bit more logic to discover the NVRTC DLL location. We've done this for nvJitLink and the same logic was also applied in nvmath-python:
>
> cuda-python/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx
>
> Lines 65 to 79 in e774b32
>
>     # Next, check if DLLs are installed via pip
>     for sp in get_site_packages():
>         mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin")
>         if not os.path.isdir(mod_path):
>             continue
>         os.add_dll_directory(mod_path)
>     try:
>         handle = win32api.LoadLibraryEx(
>             # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
>             os.path.join(mod_path, dll_name),
>             0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
>     except:
>         pass
>     else:
>         break
>
> so @vzhurba01 this will require some manual testing on Windows for the time being...

Done.

Note that I found some awkwardness during testing and added a comment with commit: 0e51e7d#diff-368f861093deb2399fd7b59421bef21cace83e3092956aafe62b06fbb4e9f235R80-R83
The consequence of not handling this is tests start throwing:

nvrtc: error: failed to open nvrtc-builtins64_126.dll.
  Make sure that nvrtc-builtins64_126.dll is installed correctly.

because NVRTC APIs return error code NVRTC_ERROR_BUILTIN_OPERATION_FAILURE.

vzhurba01 · 2025-01-08T23:06:23Z

> btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)

No work stealing just yet 😅 I'd be happy to grab it after I first work through my P0s and P1s ✊

vzhurba01 · 2025-01-08T23:06:40Z

/ok to test

leofang · 2025-01-08T23:27:28Z

> The consequence of not handling this is tests start throwing:
>
> nvrtc: error: failed to open nvrtc-builtins64_126.dll.
>   Make sure that nvrtc-builtins64_126.dll is installed correctly.
>
> because NVRTC APIs return error code NVRTC_ERROR_BUILTIN_OPERATION_FAILURE.

Ahhhh yes sorry I forgot about this and wasted your time 😓 Yes, on Windows it's annoying that DLL loading is weird, unlike on Linux where $ORIGIN could help find the additional shared libraries. What's even worse is that add_dll_directory doesn't seem to affect this search behavior...

Here's the treatment I added to nvmath-python:
https://github.com/NVIDIA/nvmath-python/blob/073b168ac0688fa3b84caaa8bb65948bf7db7eae/nvmath/_utils.py#L113-L140
Basically, it was a hack: I force pre-loading of the nvrtc-builtins DLL so that it is already available by the time NVRTC needs it. It seems fine to me to just update PATH as you did, unless there are additional caveats that I'm missing.

bdice · 2025-01-08T23:46:46Z

Is this change targeting only CUDA 12 releases? (nvJitLink is CUDA 12 only, but it might be possible for NVRTC wheels to be used with CUDA 11.)

Also, I am 100% happy with a hard (non-optional) dependency on CUDA wheels. We have moved to having a hard dependency on CUDA wheels in RAPIDS. It helps constrain the space of possible installation types and ensures the necessary libraries are available somehow.

vzhurba01 · 2025-01-09T00:05:49Z

> Basically, it was a hack: I force pre-loading of the nvrtc-builtins DLL so that it is already available by the time NVRTC needs it. It seems fine to me to just update PATH as you did, unless there are additional caveats that I'm missing.

Normally I'd be against messing with the PATH, but because this is done only after we confirm that nvrtc64_120_0.dll both exists and is loadable, this seems OK.
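
The guard described here, touching PATH only once the DLL is known to exist and load, might look like the sketch below. It is an illustration under stated assumptions rather than the code from commit 0e51e7d: `add_dir_to_path_if_loadable` is a hypothetical name, and `environ` and `loader` are parameters so the behavior can be checked without modifying the real environment.

```python
import ctypes
import os


def add_dir_to_path_if_loadable(dll_path, environ=os.environ, loader=ctypes.CDLL):
    """Prepend dll_path's directory to PATH only if the DLL exists and loads."""
    if not os.path.isfile(dll_path):
        return False
    try:
        loader(dll_path)
    except OSError:
        # The file is present but cannot be loaded; leave PATH untouched.
        return False
    dll_dir = os.path.dirname(os.path.abspath(dll_path))
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if dll_dir not in entries:
        environ["PATH"] = os.pathsep.join([dll_dir] + entries)
    return True
```

Calling it twice with the same DLL leaves PATH unchanged the second time, which keeps repeated imports from growing the variable.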

> Is this change targeting only CUDA 12 releases?

My plan was to propagate this change to CUDA 11 as well. This would require a new 11.8.x patch release to make use of it, but I haven't put thought into what else should be bundled and when.

leofang · 2025-01-09T00:07:34Z

> > Is this change targeting only CUDA 12 releases?
>
> My plan was to propagate this change to CUDA 11 as well. This would require a new 11.8.x patch release to make use of it, but I haven't put thought into what else should be bundled and when.

I added the to-be-backported label, so ideally the bot would raise a PR for us once this one is merged. However, the cherry-picking could fail, in which case we'll need to do it manually.

vzhurba01 · 2025-01-09T00:08:48Z

Note that the backport will need cleanup: cu12 has to be changed to cu11.

leofang

LGTM! Thanks!

github-actions · 2025-01-09T00:59:28Z

Backport failed because this pull request contains merge commits. You can either backport this pull request manually, or configure the action to skip merge commits.

leofang · 2025-01-09T02:48:36Z

Looks like the bot is not happy with this PR 🤷 @vzhurba01 would you mind backporting the NVRTC bits manually (so as to avoid asymmetry between 11/12)?

leofang · 2025-01-09T16:40:15Z

> > btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)
>
> No work stealing just yet 😅 I'd be happy to grab it after I first work through my P0s and P1s ✊

Tracked in #367 and ongoing in #368.

leofang · 2025-01-10T02:08:46Z

> backporting the NVRTC bits manually

#369

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in

vzhurba01 self-assigned this Jan 8, 2025

vzhurba01 added the labels to-be-backported (Trigger the bot to raise a backport PR upon merge), P0 (High priority - Must do!), cuda.bindings (Everything related to the cuda.bindings module), enhancement (Any code-related improvements), and packaging (Anything related to wheels or Conda packages) Jan 8, 2025

vzhurba01 added this to the cuda-python 12-next, 11-next milestone Jan 8, 2025

Merge branch 'main' into wheels-lib-loading

a59ecd4

vzhurba01 requested review from leofang and ksimpson-work January 8, 2025 00:47

leofang requested changes Jan 8, 2025

vzhurba01 added 4 commits January 8, 2025 11:13

Support wheels for Windows

400e4ea

Cleanup Windows support after testing

0e51e7d

Simplify dlopen call

0f7c777

Wording

c136130

leofang approved these changes Jan 9, 2025

leofang merged commit 61ef224 into NVIDIA:main Jan 9, 2025
47 checks passed

leofang mentioned this pull request Jan 9, 2025

Test against CUDA wheels #368

Merged

leofang reviewed Jan 10, 2025
