Implement release threshold configuration for DeviceMemoryResource performance optimization #875
@copilot please add a release note entry noting the performance optimization in device memory management. No need to be too explicit about what's the optimization we did in this PR.
@copilot please apply the labels from the issue we're fixing to this PR, thanks
I can see that issue #771 has the following labels:
/ok to test 6a2c4f4
/ok to test 2a8b536
    if not device.properties.memory_pools_supported:
        pytest.skip("memory pools not supported")
Is it a (happy) coincidence, or does Copilot actually understand that this is legit code (accessing device properties without calling .set_current() is OK)? Too good to be true!!!
Since Keith already approved, the CI was green, and the last commit was doc-only, let me admin-merge this. Thanks Keith & @copilot!
…rformance optimization (#875)

* Initial plan
* Implement release threshold configuration for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <[email protected]>
* Add performance demo for DeviceMemoryResource release threshold optimization
  Co-authored-by: leofang <[email protected]>
* Fix linting issues and format code with ruff
  Co-authored-by: leofang <[email protected]>
* Remove try-except wrapper and performance demo per code review feedback
  Co-authored-by: leofang <[email protected]>
* Add release note for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <[email protected]>
* Add skip decorator for mempool support check in device memory test
  Co-authored-by: leofang <[email protected]>
* Address code review feedback: move skip logic, add docstring note, update release note
  Co-authored-by: kkraus14 <[email protected]>
* Remove verbose docstring Notes section per code review feedback
  Co-authored-by: leofang <[email protected]>

---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: kkraus14 <[email protected]>
This PR implements a performance optimization for DeviceMemoryResource by configuring the memory pool release threshold, addressing slow allocation performance when there are no active allocations.
Problem
As reported in issue #756, allocation via DeviceMemoryResource can be very slow when there are no active allocations. This occurs because the default memory pool is configured with a release threshold of 0, meaning memory is immediately released back to the OS when there are no active suballocations. Subsequent allocations then require expensive OS memory requests.
Solution
This PR upstreams the workaround from CCCL to cuda-core by modifying DeviceMemoryResource.__init__() to query the pool's current release threshold using cuMemPoolGetAttribute and raise it to the maximum value (0xFFFFFFFFFFFFFFFF) using cuMemPoolSetAttribute.
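To make the effect of the release threshold concrete, here is a toy model in plain Python. It is only a sketch and not how the CUDA driver manages pools: ToyPool, churn, and the rule that trimming happens on every free are all assumptions made for illustration. It counts how often a pool has to go back to the "OS" when a caller repeatedly allocates and frees with nothing else outstanding.

```python
MAX_THRESHOLD = 0xFFFFFFFFFFFFFFFF  # maximum value quoted in the PR description


class ToyPool:
    """Toy model of a memory pool with a release threshold (illustrative only)."""

    def __init__(self, release_threshold=0):
        self.release_threshold = release_threshold  # bytes the pool may keep cached
        self.reserved = 0     # bytes currently held from the "OS"
        self.used = 0         # bytes handed out to callers
        self.os_requests = 0  # number of times the pool had to grow

    def allocate(self, nbytes):
        if self.used + nbytes > self.reserved:
            # Not enough cached memory: grow the pool (the expensive path).
            self.reserved = self.used + nbytes
            self.os_requests += 1
        self.used += nbytes

    def free(self, nbytes):
        self.used -= nbytes
        # Assumption: trim right away. Cached memory above the threshold is
        # released, but memory that is still in use is never released.
        self.reserved = max(self.used, min(self.reserved, self.release_threshold))


def churn(pool, rounds, nbytes):
    """Allocate and free `nbytes` repeatedly; return how many OS requests it took."""
    for _ in range(rounds):
        pool.allocate(nbytes)
        pool.free(nbytes)
    return pool.os_requests


print("threshold 0:  ", churn(ToyPool(0), 100, 1 << 20))
print("threshold max:", churn(ToyPool(MAX_THRESHOLD), 100, 1 << 20))
```

Under these toy assumptions the zero-threshold pool goes back to the OS on every round, while the pool with the maximum threshold does so only once and serves every later allocation from memory it kept.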
Implementation Details
The change is made in DeviceMemoryResource.__init__() in cuda_core/cuda/core/experimental/_memory.py.
The implementation follows the exact same pattern as the CCCL workaround, using identical CUDA driver APIs and constants.
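The get-then-set pattern can be sketched as follows. This is not the code merged in the PR: FakeDriver, Attr, and raise_release_threshold are hypothetical stand-ins so the example runs without a GPU, and the tuple-style return values only mimic how the Python driver bindings are assumed to report errors. The sketch also assumes the threshold is raised only when it is still 0, so that a value configured elsewhere is left alone; the PR text does not spell out that condition.

```python
from enum import Enum

MAX_RELEASE_THRESHOLD = 0xFFFFFFFFFFFFFFFF  # value quoted in the PR description


class Attr(Enum):
    # Stand-in for the driver's release-threshold pool attribute.
    RELEASE_THRESHOLD = "release_threshold"


class FakeDriver:
    """Hypothetical stand-in for the CUDA driver bindings (illustration only)."""

    SUCCESS = 0

    def __init__(self):
        self._attrs = {}  # maps (pool, attr) to the stored value

    def cuMemPoolGetAttribute(self, pool, attr):
        # Unset attributes read back as 0, mirroring the default threshold.
        return self.SUCCESS, self._attrs.get((pool, attr), 0)

    def cuMemPoolSetAttribute(self, pool, attr, value):
        self._attrs[(pool, attr)] = value
        return (self.SUCCESS,)


def raise_release_threshold(driver, pool):
    """Read the pool's release threshold and raise it to the maximum if it is 0."""
    err, current = driver.cuMemPoolGetAttribute(pool, Attr.RELEASE_THRESHOLD)
    if err != driver.SUCCESS:
        raise RuntimeError(f"cuMemPoolGetAttribute failed with {err}")
    if int(current) != 0:
        return int(current)  # assumed: keep a threshold someone already set
    (err,) = driver.cuMemPoolSetAttribute(
        pool, Attr.RELEASE_THRESHOLD, MAX_RELEASE_THRESHOLD
    )
    if err != driver.SUCCESS:
        raise RuntimeError(f"cuMemPoolSetAttribute failed with {err}")
    return MAX_RELEASE_THRESHOLD


driver = FakeDriver()
print(hex(raise_release_threshold(driver, "default-pool")))
```

Passing the driver in as an argument keeps the sketch testable; in cuda-core the equivalent logic would call the real driver bindings on the pool handle owned by the memory resource.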
Example Usage
The optimization is transparent to users and maintains full backward compatibility.
Fixes #771.