Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable discovery and loading at run time of NVRTC and nvJitLink libraries in a wheels ecosystem #363

Merged
merged 6 commits into from
Jan 9, 2025

Conversation

vzhurba01
Copy link
Collaborator

Support discovery of NVRTC and nvJitLink libraries at run time

CTK installations distribute their libraries using personal packages:

  • nvidia-nvjitlink-cuXX
  • nvidia-cuda-nvrtc-cuXX

The relative path of their libraries to cuda-bindings is consistent,
and allows us to use relative paths to discover them when loading
at run time.

close #286
close #287

CTK installations distribute their libraries using personal packages:
- nvidia-nvjitlink-cuXX
- nvidia-cuda-nvrtc-cuXX

The relative path of their libraries to cuda-bindings is consistent,
and allows us to use relative paths to discover them when loading
at run time.
@vzhurba01 vzhurba01 self-assigned this Jan 8, 2025
Copy link
Contributor

copy-pr-bot bot commented Jan 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@vzhurba01 vzhurba01 added to-be-backported Trigger the bot to raise a backport PR upon merge P0 High priority - Must do! cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements packaging Anything related to wheels or Conda packages labels Jan 8, 2025
@vzhurba01
Copy link
Collaborator Author

/ok to test

@vzhurba01 vzhurba01 added this to the cuda-python 12-next, 11-next milestone Jan 8, 2025
@vzhurba01
Copy link
Collaborator Author

/ok to test

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Vlad! I left two comments below.

I understand this is the 2nd step as discussed offline. For the 1st step, I think it'd be nice to preserve what pip install cuda-python behaves today (not depending on CUDA wheels). So let's add an all optional dependency to facilitate the usage

pip install cuda-python[all]

This can be achieved by adding this section to pyproject.toml (example):

[project.optional-dependencies]
all = [
    "nvidia-cuda-nvrtc-cu12",
    "nvidia-nvjitlink-cu12>=12.3.*",
]

We should also document this new installation option.

cuda_bindings/setup.py Outdated Show resolved Hide resolved
cuda_bindings/setup.py Outdated Show resolved Hide resolved
@leofang
Copy link
Member

leofang commented Jan 8, 2025

btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)

@leofang
Copy link
Member

leofang commented Jan 8, 2025

Another thing: Since we do not have any Windows GPU runners in the CI, even if I add a new test job targeting this capability there's still one bug that we wouldn't be able to catch, which is that we need a bit more logic to discover the NVRTC DLL location. We've done this for nvJitLink and the same logic was also applied in nvmath-python:

# Next, check if DLLs are installed via pip
for sp in get_site_packages():
mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin")
if not os.path.isdir(mod_path):
continue
os.add_dll_directory(mod_path)
try:
handle = win32api.LoadLibraryEx(
# Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
os.path.join(mod_path, dll_name),
0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
except:
pass
else:
break

so @vzhurba01 this will require some manual testing on Windows for the time being...

@leofang
Copy link
Member

leofang commented Jan 8, 2025

cc @kmaehashi @gmarkall for vis

@vzhurba01
Copy link
Collaborator Author

Thanks, Vlad! I left two comments below.

I understand this is the 2nd step as discussed offline. For the 1st step, I think it'd be nice to preserve what pip install cuda-python behaves today (not depending on CUDA wheels). So let's add an all optional dependency to facilitate the usage

pip install cuda-python[all]

This can be achieved by adding this section to pyproject.toml (example):

[project.optional-dependencies]
all = [
    "nvidia-cuda-nvrtc-cu12",
    "nvidia-nvjitlink-cu12>=12.3.*",
]

We should also document this new installation option.

Done.

Also I've split the new documentation between this PR and a new draft PR #366. That draft would only make sense after the new wheels are posted.

@vzhurba01
Copy link
Collaborator Author

Another thing: Since we do not have any Windows GPU runners in the CI, even if I add a new test job targeting this capability there's still one bug that we wouldn't be able to catch, which is that we need a bit more logic to discover the NVRTC DLL location. We've done this for nvJitLink and the same logic was also applied in nvmath-python:

# Next, check if DLLs are installed via pip
for sp in get_site_packages():
mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin")
if not os.path.isdir(mod_path):
continue
os.add_dll_directory(mod_path)
try:
handle = win32api.LoadLibraryEx(
# Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path...
os.path.join(mod_path, dll_name),
0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
except:
pass
else:
break

so @vzhurba01 this will require some manual testing on Windows for the time being...

Done.

Note that I found some awkwardness during testing and added a comment with commit: 0e51e7d#diff-368f861093deb2399fd7b59421bef21cace83e3092956aafe62b06fbb4e9f235R80-R83
The consequence of not handling this is tests start throwing:

nvrtc: error: failed to open nvrtc-builtins64_126.dll.
  Make sure that nvrtc-builtins64_126.dll is installed correctly.

because NVRTC APIs return error code NVRTC_ERROR_BUILTIN_OPERATION_FAILURE.

@vzhurba01
Copy link
Collaborator Author

vzhurba01 commented Jan 8, 2025

btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)

No work stealing just yet 😅 I'd be happy to grab it after I first work through my P0s and P1s ✊

@vzhurba01
Copy link
Collaborator Author

/ok to test

@leofang
Copy link
Member

leofang commented Jan 8, 2025

The consequence of not handling this is tests start throwing:

nvrtc: error: failed to open nvrtc-builtins64_126.dll.
  Make sure that nvrtc-builtins64_126.dll is installed correctly.

because NVRTC APIs return error code NVRTC_ERROR_BUILTIN_OPERATION_FAILURE.

Ahhhh yes sorry I forgot about this and wasted your time 😓 Yes, on Windows it's annoying that DLL loading is weird, unlike on Linux where $ORIGIN could help find the additional shared libraries. What's even worse is that add_dll_directory doesn't seem to affect this search behavior...

Here's the treatment I added to nvmath-python:
https://github.com/NVIDIA/nvmath-python/blob/073b168ac0688fa3b84caaa8bb65948bf7db7eae/nvmath/_utils.py#L113-L140
basically it was a hack in that I force pre-loading the nvrtc-builtin DLL so that by the time it's needed by NVRTC it is already immediately available. It seems fine to me to just update PATH as you did, unless there's some additional caveats that I miss.

@bdice
Copy link

bdice commented Jan 8, 2025

Is this change targeting only CUDA 12 releases? (nvJitLink is CUDA 12 only, but it might be possible for NVRTC wheels to be used with CUDA 11.)

Also, I am 100% happy with a hard (non-optional) dependency on CUDA wheels. We have moved to having a hard dependency on CUDA wheels in RAPIDS. It helps constrain the space of possible installation types and ensures the necessary libraries are available somehow.

@vzhurba01
Copy link
Collaborator Author

basically it was a hack in that I force pre-loading the nvrtc-builtin DLL so that by the time it's needed by NVRTC it is already immediately available. It seems fine to me to just update PATH as you did, unless there's some additional caveats that I miss.

Normally I'd be against messing with the PATH, but because this is done only after we confirm that nvrtc64_120_0.dll both exists and loadable then this seems ok.

Is this change targeting only CUDA 12 releases?

My plan was to propagate this change to CUDA 11 as well. This would require a new 11.8.x patch release to make use of it, but I haven't put thought into what else should be bundled and when.

@leofang
Copy link
Member

leofang commented Jan 9, 2025

Is this change targeting only CUDA 12 releases?

My plan was to propagate this change to CUDA 11 as well. This would require a new 11.8.x patch release to make use of it, but I haven't put thought into what else should be bundled and when.

I added the to-be-backported label, so ideally the bot would raise a PR for us once this one is merged. However, the cherry picking could fail in which case we'll need to do it manually.

@vzhurba01
Copy link
Collaborator Author

Note that the backport will need cleanup by changing the cu12 to cu11

Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@leofang leofang merged commit 61ef224 into NVIDIA:main Jan 9, 2025
47 checks passed
Copy link

github-actions bot commented Jan 9, 2025

Backport failed because this pull request contains merge commits. You can either backport this pull request manually, or configure the action to skip merge commits.

@leofang
Copy link
Member

leofang commented Jan 9, 2025

Looks like the bot is not happy with this PR 🤷 @vzhurba01 would you mind backporting the NVRTC bits manually (so as to avoid asymmetry between 11/12)?

@leofang leofang mentioned this pull request Jan 9, 2025
@leofang
Copy link
Member

leofang commented Jan 9, 2025

btw I'll test this manually before merging. I'll also add a new test workflow to test this new use case (meaning we don't install the mini CTK in the test job), in a separate PR. (Let me know if you are interested in work stealing 😉)

No work stealing just yet 😅 I'd be happy to grab it after I first work through my P0s and P1s ✊

Tracked in #367 and ongoing in #368.

@leofang
Copy link
Member

leofang commented Jan 10, 2025

backporting the NVRTC bits manually

#369

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P0 High priority - Must do! packaging Anything related to wheels or Conda packages to-be-backported Trigger the bot to raise a backport PR upon merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support nvJitLink wheels Support NVRTC wheels
3 participants