Implement release threshold configuration for DeviceMemoryResource performance optimization #875

Copilot · 2025-08-21T02:22:15Z

This PR implements a performance optimization for DeviceMemoryResource by configuring the memory pool release threshold, addressing slow allocation performance when there are no active allocations.

Problem

As reported in issue #771, allocation via DeviceMemoryResource can be very slow when there are no active allocations. This occurs because the default memory pool is configured with a release threshold of 0, meaning memory is released back to the OS as soon as there are no active suballocations. Each subsequent allocation then requires an expensive memory request to the OS.
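The effect described above can be illustrated with a toy model. This is only a sketch and not how the CUDA driver manages its pools: `ToyPool`, `count_os_requests`, and the trimming rule are hypothetical names and simplifications introduced here for illustration.

```python
# Toy model of a pool with a release threshold (hypothetical, not a CUDA API).
MAX_THRESHOLD = 0xFFFFFFFFFFFFFFFF  # maximum value named in the PR description


class ToyPool:
    def __init__(self, release_threshold):
        self.release_threshold = release_threshold
        self.reserved = 0      # bytes currently obtained from the "OS"
        self.in_use = 0        # bytes handed out to callers
        self.os_requests = 0   # number of (expensive) requests to the "OS"

    def allocate(self, nbytes):
        available = self.reserved - self.in_use
        if available < nbytes:
            # Not enough cached memory: grow the pool with an OS request.
            self.reserved += nbytes - available
            self.os_requests += 1
        self.in_use += nbytes

    def free(self, nbytes):
        self.in_use -= nbytes
        # Assumed trimming rule: memory held above the threshold is returned.
        if self.reserved > self.release_threshold:
            self.reserved = max(self.in_use, self.release_threshold)


def count_os_requests(release_threshold, cycles=100, nbytes=1024):
    """Run allocate/free cycles with no other live allocations."""
    pool = ToyPool(release_threshold)
    for _ in range(cycles):
        pool.allocate(nbytes)
        pool.free(nbytes)
    return pool.os_requests
```

Under these assumptions, `count_os_requests(0)` returns 100 (one OS request per cycle), while `count_os_requests(MAX_THRESHOLD)` returns 1, because the memory obtained in the first cycle is retained and reused.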

Solution

This PR upstreams the workaround from CCCL to cuda-core by modifying DeviceMemoryResource.__init__() to:

Check the current release threshold using cuMemPoolGetAttribute
If the threshold is 0 (default), set it to maximum (0xFFFFFFFFFFFFFFFF) using cuMemPoolSetAttribute
This retains memory in the pool for better allocation performance in steady-state scenarios
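The check-then-set logic in the steps above can be sketched without a GPU as follows. `FakePool`, `get_release_threshold`, `set_release_threshold`, and `maybe_raise_release_threshold` are hypothetical stand-ins written for this illustration; the actual change calls cuMemPoolGetAttribute and cuMemPoolSetAttribute on the real pool handle, and its exact code is not reproduced here.

```python
# Hardware-free sketch of the "raise only if still at default" logic.
# All names below are hypothetical and not part of cuda.core or cuda.bindings.
RELEASE_THRESHOLD_MAX = 0xFFFFFFFFFFFFFFFF  # value named in the PR description


class FakePool:
    """In-memory stand-in for a driver memory pool handle."""

    def __init__(self, release_threshold=0):
        self.release_threshold = release_threshold


def get_release_threshold(pool):
    # Plays the role of cuMemPoolGetAttribute for the release threshold.
    return pool.release_threshold


def set_release_threshold(pool, value):
    # Plays the role of cuMemPoolSetAttribute for the release threshold.
    pool.release_threshold = value


def maybe_raise_release_threshold(pool):
    """Set the threshold to the maximum only if it is still the default (0).

    Returns True if the threshold was changed, False otherwise.
    """
    if get_release_threshold(pool) == 0:
        set_release_threshold(pool, RELEASE_THRESHOLD_MAX)
        return True
    return False
```

Checking for 0 first means a threshold that was already set to a non-default value is left untouched, which is presumably why the steps above read the attribute before writing it.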

Implementation Details

Core change: 22 lines added to DeviceMemoryResource.__init__() in cuda_core/cuda/core/experimental/_memory.py
Error handling: Wrapped in try-except for backward compatibility with older CUDA drivers
Documentation: Added comprehensive comments explaining the optimization
Testing: Added test case to verify proper initialization
Example: Created performance demonstration script

The implementation follows the exact same pattern as the CCCL workaround, using identical CUDA driver APIs and constants.

Example Usage

from cuda.core.experimental import Device, DeviceMemoryResource

device = Device()
device.set_current()  # make the device current before allocating
mr = DeviceMemoryResource(device.device_id)  # Now automatically optimized

# Subsequent allocations will be faster due to memory retention
buffer = mr.allocate(1024)
buffer.close()

The optimization is transparent to users and maintains full backward compatibility.

Fixes #771.

copy-pr-bot · 2025-08-21T02:22:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang

@copilot

cuda_core/cuda/core/experimental/_memory.py

cuda_core/examples/memory_performance_demo.py

leofang · 2025-08-21T02:54:06Z

@copilot please add a release note entry noting the performance optimization in device memory management. No need to be too explicit about what's the optimization we did in this PR.

Copilot · 2025-08-21T02:56:42Z

Added release note entry for the DeviceMemoryResource performance optimization to the 0.X.Y release notes. (6a2c4f4)

leofang · 2025-08-21T03:06:10Z

@copilot please apply the labels from the issue we're fixing to this PR, thanks

Copilot · 2025-08-21T03:07:25Z

I can see that issue #771 has the following labels: enhancement, P0, and cuda.core. However, I don't have access to GitHub API functions that allow me to modify PR properties like adding labels. Could you please apply these labels to the PR, or let me know if there's another way I can help with this?

leofang · 2025-08-21T03:10:58Z

/ok to test 6a2c4f4

cuda_core/tests/test_memory.py

cuda_core/cuda/core/experimental/_memory.py

cuda_core/docs/source/release/0.X.Y-notes.rst

kkraus14 · 2025-08-21T04:31:14Z

/ok to test 2a8b536

leofang · 2025-08-21T13:09:22Z

cuda_core/tests/test_memory.py

+    if not device.properties.memory_pools_supported:
+        pytest.skip("memory pools not supported")


Is it a (happy) coincidence, or does Copilot actually understand that this is legit code (accessing device properties without calling .set_current() is OK)? Too good to be true!!!

cuda_core/cuda/core/experimental/_memory.py

leofang

Since Keith already approved, the CI was green, and the last commit was doc-only, let me admin-merge this. Thanks Keith & @copilot!

github-actions · 2025-08-21T13:30:15Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

…rformance optimization (#875)

* Initial plan
* Implement release threshold configuration for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <[email protected]>
* Add performance demo for DeviceMemoryResource release threshold optimization
  Co-authored-by: leofang <[email protected]>
* Fix linting issues and format code with ruff
  Co-authored-by: leofang <[email protected]>
* Remove try-except wrapper and performance demo per code review feedback
  Co-authored-by: leofang <[email protected]>
* Add release note for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <[email protected]>
* Add skip decorator for mempool support check in device memory test
  Co-authored-by: leofang <[email protected]>
* Address code review feedback: move skip logic, add docstring note, update release note
  Co-authored-by: kkraus14 <[email protected]>
* Remove verbose docstring Notes section per code review feedback
  Co-authored-by: leofang <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: kkraus14 <[email protected]>

Initial plan

87624a0

Copilot AI assigned Copilot and leofang Aug 21, 2025

github-project-automation bot added this to CCCL Aug 21, 2025

github-project-automation bot moved this to Todo in CCCL Aug 21, 2025

Copilot started work on behalf of leofang August 21, 2025 02:22

Copilot AI and others added 3 commits August 21, 2025 02:30

Implement release threshold configuration for DeviceMemoryResource pe…

e48812e

…rformance optimization Co-authored-by: leofang <[email protected]>

Add performance demo for DeviceMemoryResource release threshold optim…

39ea29c

…ization Co-authored-by: leofang <[email protected]>

Fix linting issues and format code with ruff

547129b

Co-authored-by: leofang <[email protected]>

Copilot AI changed the title from "[WIP] Use an increased release threshold for the default async pool when using DeviceMemoryResource" to "Implement release threshold configuration for DeviceMemoryResource performance optimization" Aug 21, 2025

Copilot finished work on behalf of leofang August 21, 2025 02:36

Copilot AI requested a review from leofang August 21, 2025 02:36

leofang requested changes Aug 21, 2025

cuda_core/cuda/core/experimental/_memory.py (outdated)

cuda_core/examples/memory_performance_demo.py (outdated)

github-project-automation bot moved this from Todo to In Progress in CCCL Aug 21, 2025

Copilot started work on behalf of leofang August 21, 2025 02:45

Remove try-except wrapper and performance demo per code review feedback

b343a50

Co-authored-by: leofang <[email protected]>

Copilot finished work on behalf of leofang August 21, 2025 02:50

Copilot AI requested a review from leofang August 21, 2025 02:50

Copilot started work on behalf of leofang August 21, 2025 02:54

Add release note for DeviceMemoryResource performance optimization

6a2c4f4

Co-authored-by: leofang <[email protected]>

Copilot finished work on behalf of leofang August 21, 2025 02:57

Copilot started work on behalf of leofang August 21, 2025 03:06

Copilot finished work on behalf of leofang August 21, 2025 03:08

leofang added the enhancement (Any code-related improvements) and P0 (High priority - Must do!) labels Aug 21, 2025

leofang added the cuda.core (Everything related to the cuda.core module) label Aug 21, 2025

leofang added this to the cuda.core beta 7 milestone Aug 21, 2025

leofang requested a review from shwina August 21, 2025 03:10

leofang marked this pull request as ready for review August 21, 2025 03:10

leofang reviewed Aug 21, 2025

cuda_core/tests/test_memory.py

Copilot started work on behalf of leofang August 21, 2025 03:43

Add skip decorator for mempool support check in device memory test

3e50d07

Co-authored-by: leofang <[email protected]>

Copilot AI requested a review from leofang August 21, 2025 03:48

Copilot finished work on behalf of leofang August 21, 2025 03:48

kkraus14 reviewed Aug 21, 2025

cuda_core/cuda/core/experimental/_memory.py

cuda_core/docs/source/release/0.X.Y-notes.rst (outdated)

Copilot started work on behalf of kkraus14 August 21, 2025 04:26

Address code review feedback: move skip logic, add docstring note, up…

2a8b536

…date release note Co-authored-by: kkraus14 <[email protected]>

Copilot AI requested a review from kkraus14 August 21, 2025 04:30

Copilot finished work on behalf of kkraus14 August 21, 2025 04:30

kkraus14 previously approved these changes Aug 21, 2025

leofang requested changes Aug 21, 2025

Copilot started work on behalf of leofang August 21, 2025 13:11

Remove verbose docstring Notes section per code review feedback

e801d5e

Co-authored-by: leofang <[email protected]>

Copilot AI dismissed kkraus14’s stale review via e801d5e August 21, 2025 13:13

Copilot AI requested a review from leofang August 21, 2025 13:14

Copilot finished work on behalf of leofang August 21, 2025 13:14

leofang approved these changes Aug 21, 2025

leofang merged commit 85da64b into main Aug 21, 2025
1 check passed

leofang deleted the copilot/fix-771 branch August 21, 2025 13:16

github-project-automation bot moved this from In Progress to Done in CCCL Aug 21, 2025
