Copilot AI (Contributor) commented Aug 21, 2025

This PR implements a performance optimization for DeviceMemoryResource by configuring the memory pool release threshold, addressing slow allocation performance when there are no active allocations.

Problem

As reported in issue #756, allocation via DeviceMemoryResource can be very slow when there are no active allocations. This occurs because the default memory pool is configured with a release threshold of 0, meaning memory is immediately released back to the OS when there are no active suballocations. Subsequent allocations then require expensive OS memory requests.
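The effect described above can be illustrated with a toy model. The sketch below is not CUDA code: `ToyPool`, its `os_requests` counter, and the trimming rule are hypothetical stand-ins that only mimic how a release threshold decides whether freed memory is handed back to the OS. The real driver trims the pool at synchronization points rather than on every free, so treat this as a simplification.

```python
class ToyPool:
    """Toy stand-in for a stream-ordered memory pool (hypothetical, not a CUDA API)."""

    def __init__(self, release_threshold):
        self.release_threshold = release_threshold
        self.reserved = 0      # bytes currently obtained from the "OS"
        self.used = 0          # bytes currently handed out to callers
        self.os_requests = 0   # number of times the pool had to grow

    def allocate(self, size):
        if self.used + size > self.reserved:
            # Not enough cached memory: grow the pool (the expensive path).
            self.reserved = self.used + size
            self.os_requests += 1
        self.used += size

    def free(self, size):
        self.used -= size
        # Trim cached memory above the threshold, but never below what is in use.
        keep = max(self.used, self.release_threshold)
        if self.reserved > keep:
            self.reserved = keep


def count_os_requests(release_threshold, rounds=100, size=1024):
    """Allocate and free one buffer per round; report how often the pool grew."""
    pool = ToyPool(release_threshold)
    for _ in range(rounds):
        pool.allocate(size)
        pool.free(size)
    return pool.os_requests
```

In this model, `count_os_requests(0)` takes the expensive path on every round, while `count_os_requests(0xFFFFFFFFFFFFFFFF)` takes it only once, on the first allocation.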

Solution

This PR upstreams the workaround from CCCL to cuda-core by modifying DeviceMemoryResource.__init__() to:

  1. Check the current release threshold using cuMemPoolGetAttribute
  2. If the threshold is 0 (default), set it to maximum (0xFFFFFFFFFFFFFFFF) using cuMemPoolSetAttribute

With the threshold raised, the pool retains freed memory instead of returning it to the OS, which improves allocation performance in steady-state scenarios.
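The check-then-set logic can be sketched without a GPU as follows. This is a minimal illustration, not the code from the PR: `raise_release_threshold_if_default`, `FakePool`, and the two callables are hypothetical names. In real code the callables would wrap cuMemPoolGetAttribute and cuMemPoolSetAttribute, presumably with the CU_MEMPOOL_ATTR_RELEASE_THRESHOLD attribute.

```python
RELEASE_THRESHOLD_MAX = 0xFFFFFFFFFFFFFFFF  # largest unsigned 64-bit value


def raise_release_threshold_if_default(get_threshold, set_threshold):
    """Raise the threshold to the maximum only if it is still at the default of 0.

    `get_threshold` and `set_threshold` are hypothetical callables standing in
    for the driver's get/set attribute calls. Returns True if a change was made.
    """
    if get_threshold() != 0:
        # A non-zero value means the pool was already configured; leave it alone.
        return False
    set_threshold(RELEASE_THRESHOLD_MAX)
    return True


class FakePool:
    """In-memory stand-in so the logic can be exercised without a GPU."""

    def __init__(self, threshold=0):
        self.threshold = threshold

    def get(self):
        return self.threshold

    def set(self, value):
        self.threshold = value


default_pool = FakePool()         # threshold still at the default of 0
tuned_pool = FakePool(4096)       # threshold already set by someone else
changed_default = raise_release_threshold_if_default(default_pool.get, default_pool.set)
changed_tuned = raise_release_threshold_if_default(tuned_pool.get, tuned_pool.set)
```

Checking for 0 first means a threshold that was deliberately set elsewhere is left untouched; only the default configuration is overridden.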

Implementation Details

  • Core change: 22 lines added to DeviceMemoryResource.__init__() in cuda_core/cuda/core/experimental/_memory.py
  • Error handling: Wrapped in try-except for backward compatibility with older CUDA drivers
  • Documentation: Added comprehensive comments explaining the optimization
  • Testing: Added test case to verify proper initialization
  • Example: Created performance demonstration script

The implementation follows the exact same pattern as the CCCL workaround, using identical CUDA driver APIs and constants.

Example Usage

from cuda.core.experimental import Device, DeviceMemoryResource

device = Device()
device.set_current()  # make this device current before allocating
mr = DeviceMemoryResource(device.device_id)  # Now automatically optimized

# Subsequent allocations will be faster due to memory retention
buffer = mr.allocate(1024)
buffer.close()

The optimization is transparent to users and maintains full backward compatibility.

Fixes #771.



copy-pr-bot bot (Contributor) commented Aug 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Copilot Copilot AI changed the title from "[WIP] Use an increased release threshold for the default async pool when using DeviceMemoryResource" to "Implement release threshold configuration for DeviceMemoryResource performance optimization" Aug 21, 2025
Copilot finished work on behalf of leofang August 21, 2025 02:36
@Copilot Copilot AI requested a review from leofang August 21, 2025 02:36
leofang (Member) left a comment

@github-project-automation github-project-automation bot moved this from Todo to In Progress in CCCL Aug 21, 2025
Copilot finished work on behalf of leofang August 21, 2025 02:50
@Copilot Copilot AI requested a review from leofang August 21, 2025 02:50
leofang (Member) commented Aug 21, 2025

@copilot please add a release note entry noting the performance optimization in device memory management. No need to be too explicit about what's the optimization we did in this PR.

Copilot AI (Contributor, Author) commented Aug 21, 2025

> @copilot please add a release note entry noting the performance optimization in device memory management. No need to be too explicit about what's the optimization we did in this PR.

Added release note entry for the DeviceMemoryResource performance optimization to the 0.X.Y release notes. (6a2c4f4)

Copilot finished work on behalf of leofang August 21, 2025 02:57
leofang (Member) commented Aug 21, 2025

@copilot please apply the labels from the issue we're fixing to this PR, thanks

Copilot AI (Contributor, Author) commented Aug 21, 2025

> @copilot please apply the labels from the issue we're fixing to this PR, thanks

I can see that issue #771 has the following labels: enhancement, P0, and cuda.core. However, I don't have access to GitHub API functions that allow me to modify PR properties like adding labels. Could you please apply these labels to the PR, or let me know if there's another way I can help with this?

Copilot finished work on behalf of leofang August 21, 2025 03:08
@leofang leofang added the enhancement (Any code-related improvements) and P0 (High priority - Must do!) labels Aug 21, 2025
@leofang leofang added the cuda.core (Everything related to the cuda.core module) label Aug 21, 2025
@leofang leofang added this to the cuda.core beta 7 milestone Aug 21, 2025
@leofang leofang requested a review from shwina August 21, 2025 03:10
@leofang leofang marked this pull request as ready for review August 21, 2025 03:10
leofang (Member) commented Aug 21, 2025

/ok to test 6a2c4f4

@Copilot Copilot AI requested a review from leofang August 21, 2025 03:48
Copilot finished work on behalf of leofang August 21, 2025 03:48
@Copilot Copilot AI requested a review from kkraus14 August 21, 2025 04:30
Copilot finished work on behalf of kkraus14 August 21, 2025 04:30
kkraus14 (Collaborator) commented

/ok to test 2a8b536

kkraus14 previously approved these changes Aug 21, 2025
Comment on lines +269 to +270
if not device.properties.memory_pools_supported:
    pytest.skip("memory pools not supported")
Member

Is it a (happy) coincidence, or does Copilot actually understand that this is legit code (accessing device properties without calling .set_current() is OK)? Too good to be true!!!

@Copilot Copilot AI requested a review from leofang August 21, 2025 13:14
Copilot finished work on behalf of leofang August 21, 2025 13:14
leofang (Member) left a comment

Since Keith already approved, the CI was green, and the last commit was doc-only, let me admin-merge this. Thanks Keith & @copilot!

@leofang leofang merged commit 85da64b into main Aug 21, 2025
1 check passed
@leofang leofang deleted the copilot/fix-771 branch August 21, 2025 13:16
@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Aug 21, 2025
github-actions commented

Doc Preview CI
Preview removed because the pull request was closed or merged.

Copilot AI added a commit that referenced this pull request Aug 21, 2025
…rformance optimization (#875)

* Initial plan

* Implement release threshold configuration for DeviceMemoryResource performance optimization

Co-authored-by: leofang <[email protected]>

* Add performance demo for DeviceMemoryResource release threshold optimization

Co-authored-by: leofang <[email protected]>

* Fix linting issues and format code with ruff

Co-authored-by: leofang <[email protected]>

* Remove try-except wrapper and performance demo per code review feedback

Co-authored-by: leofang <[email protected]>

* Add release note for DeviceMemoryResource performance optimization

Co-authored-by: leofang <[email protected]>

* Add skip decorator for mempool support check in device memory test

Co-authored-by: leofang <[email protected]>

* Address code review feedback: move skip logic, add docstring note, update release note

Co-authored-by: kkraus14 <[email protected]>

* Remove verbose docstring Notes section per code review feedback

Co-authored-by: leofang <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: kkraus14 <[email protected]>

Labels

cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P0 (High priority - Must do!)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Use an increased release threshold for the default async pool when using DeviceMemoryResource

3 participants