Conversation

@brandon-b-miller
Contributor

Closes #151

@copy-pr-bot (bot) commented Aug 18, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@brandon-b-miller
Contributor Author

/ok to test


Contributor

@isVoid left a comment

I see this PR closes #151. The issue suggests that we can pass a cuda.core stream object via the kernel launch interface, but this PR is missing a test for that use case.
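For readers outside the PR: the `__cuda_stream__` protocol referenced in #151 has a stream-like object return a `(version, handle)` tuple when the method is called. A CPU-only sketch of how a launch path might accept such objects (both `ForeignStream` and `extract_stream_handle` are hypothetical illustrations, not numba-cuda API):

```python
class ForeignStream:
    """Stand-in for a cuda.core stream: exposes the __cuda_stream__ protocol."""

    def __init__(self, handle):
        self._handle = handle

    def __cuda_stream__(self):
        # The protocol returns a (version, handle) tuple; version 0 is current.
        return (0, self._handle)


def extract_stream_handle(obj):
    """Hypothetical helper: accept 0 or any object implementing __cuda_stream__."""
    if isinstance(obj, int):
        if obj != 0:
            raise ValueError("only 0 (the default stream) is accepted as an int")
        return 0
    if hasattr(obj, "__cuda_stream__"):
        version, handle = obj.__cuda_stream__()
        if version != 0:
            raise ValueError(f"unsupported __cuda_stream__ version: {version}")
        return handle
    raise TypeError(f"{obj!r} does not expose a CUDA stream")
```

A real test would launch a kernel on a GPU with a `cuda.core` stream; the sketch above only exercises the handle-extraction side that such a test would rely on.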

Comment on lines +3538 to +3539
acceptable stream objects. Acceptable types are
int (0 for default stream), Stream, ExperimentalStream
Member

Is the docstring outdated? `int` is currently not allowed.

Contributor Author

Only for the special value 0, I believe.

Contributor

Should we consider deprecating allowing passing 0 as a Stream? The "default stream" is ambiguous in Python, since PTDS is normally a host compile-time concept. We have an environment variable for controlling it in cuda.bindings / cuda.core, CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM, which I think should generally be used.

It would be great if we could introduce a deprecation warning in some form for passing 0 as a Stream in user-facing APIs.
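A minimal sketch of what such a soft-deprecation path might look like (the `normalize_stream` name and placement are hypothetical, not part of numba-cuda):

```python
import warnings


def normalize_stream(stream):
    """Hypothetical sketch: warn when the literal 0 is passed as a stream."""
    if isinstance(stream, int) and stream == 0:
        warnings.warn(
            "Passing 0 as a stream is deprecated; pass an explicit Stream "
            "object instead (default-stream behavior is controlled by "
            "CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM).",
            DeprecationWarning,
            stacklevel=2,
        )
    return stream
```

Calling `normalize_stream(0)` at each user-facing entry point would surface the warning without changing behavior during the deprecation window.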

Contributor Author

@brandon-b-miller Oct 27, 2025

From the user perspective we're deprecating these APIs fully in #546, so they should be gone entirely. But we should do a sweep and make sure we're being explicit with all of our internal usages of streams.

Contributor

Outside of the DeviceNDArray class, I think streams are also accepted when launching kernels and in the Event APIs; should we handle them properly there as well?

Contributor Author

Launching is tested as part of this PR; events were added in 7df62ce.

"""
Memset on the device.
If stream is 0, the call is synchronous.
If stream is a Stream object, asynchronous mode is used.
Member

There is a bug (or change of behavior) here and elsewhere. stream can be a Stream object from either numba-cuda or cuda.core, but still hold 0 (the default stream) under the hood. However, the call now becomes asynchronous (with respect to the host) instead of synchronous. Just wanted to call it out in case it was not the intention.

Contributor Author

This is a really good catch. As a follow-up to this: is the output here as expected, where dev is a cuda.core.experimental.Device for which set_current() has been called? Should it not be (0, 0)?

>>> dev.default_stream.__cuda_stream__()
(0, 1)

I ask hoping there's a reliable way of detecting this situation based on the passed object.

Contributor Author

After searching around the codebase for a while, I concluded this was at least the original intention, though these are really only used for the deprecated device array API:

        If a CUDA ``stream`` is given, then the transfer will be made
        asynchronously as part of the given stream.  Otherwise, the transfer is
        synchronous: the function returns after the copy is finished.

So AFAICT this PR maintains the above behavior, just with a new stream object. Ultimately, though, I'm not sure we should spend too much time thinking about it, as these will be removed; users performing these types of memory transfers should use either CuPy for a nice array API or cuda.bindings for full control over things like synchronization behavior.

fn(*args)


def device_to_host(dst, src, size, stream=0):
Member

As mentioned below (or above), the stream semantics have changed, which probably has a bigger impact on this method: the copy is now asynchronous, so a stream synchronization is needed before accessing src on the host.
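The hazard can be illustrated without a GPU. The sketch below is a CPU-only mock of the semantics under discussion, not numba-cuda's real code: `MockStream` stands in for a CUDA stream whose enqueued work only completes at synchronization, so a host read of `dst` before `synchronize()` sees stale data.

```python
class MockStream:
    """CPU-only mock: pending work runs only when the stream is synchronized."""

    def __init__(self):
        self._pending = []

    def enqueue(self, work):
        self._pending.append(work)

    def synchronize(self):
        for work in self._pending:
            work()
        self._pending.clear()


def device_to_host(dst, src, size, stream=0):
    """Sketch of the semantics in question: stream == 0 copies synchronously;
    a stream object enqueues the copy, so the caller must synchronize."""

    def copy():
        dst[:size] = src[:size]

    if isinstance(stream, int) and stream == 0:
        copy()                  # synchronous: dst is valid on return
    else:
        stream.enqueue(copy)    # asynchronous: dst is stale until synchronize()
```

With `s = MockStream()`, calling `device_to_host(dst, src, n, stream=s)` leaves `dst` unchanged until `s.synchronize()` runs, which is the synchronization requirement the comment above is pointing out.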

@brandon-b-miller added the "3 - Ready for Review" label Oct 13, 2025
@brandon-b-miller merged commit 39066c7 into NVIDIA:main Oct 27, 2025
70 checks passed
@brandon-b-miller deleted the cuda-core-streams branch October 27, 2025 22:01

Development

Successfully merging this pull request may close these issues.

[FEA] Make cuda.core.Stream recognized by numba-cuda by supporting the __cuda_stream__ protocol

5 participants