Commit 9780a34

add an interoperability page
1 parent f1267cd commit 9780a34

File tree: 2 files changed, +84 −0 lines changed
cuda_core/docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ and other functionalities.
    release.md
    install.md
+   interoperability.rst
    api.rst

cuda_core/docs/source/interoperability.rst

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
.. currentmodule:: cuda.core.experimental

Interoperability
================

``cuda.core`` is designed to be interoperable with other Python GPU libraries. Below
we cover several such scenarios.


Current device/context
----------------------

The :meth:`Device.set_current` method ensures that the calling host thread has
an active CUDA context set to current. This CUDA context can be seen and accessed
by other GPU libraries without any code change. For libraries built on top of
the CUDA runtime (``cudart``), this is as if ``cudaSetDevice`` had been called.

Since CUDA contexts are per-thread constructs, in a multi-threaded program each
host thread should call this method.

Conversely, if any GPU library has already set a device (or context) to current, this
method ensures that the same device/context is picked up by and shared with
``cuda.core``.

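The per-thread pattern described above can be sketched as follows. This is a hedged illustration, not part of the page being diffed: it assumes ``cuda.core`` is installed and a GPU is present, and the import is guarded so the sketch degrades gracefully without either.

```python
# Sketch: each host thread calls Device.set_current() for itself, since CUDA
# contexts are per-thread constructs. The import is guarded so this sketch
# can run (as a no-op) on machines without cuda.core or a GPU.
import threading


def worker(device_id=0):
    try:
        from cuda.core.experimental import Device
    except ImportError:
        return  # cuda.core not installed; nothing to demonstrate
    dev = Device(device_id)
    dev.set_current()  # every host thread needs its own call
    # ... launch work against this thread's current context ...


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that no state needs to be passed between threads: once each thread has made its own ``set_current`` call, other GPU libraries running on that thread see the same context.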

``__cuda_stream__`` protocol
----------------------------

The :class:`~_stream.Stream` class is a vocabulary type representing CUDA streams
in Python. While we encourage new Python projects to start using streams (and other
CUDA types) from ``cuda.core``, we understand that several projects already
expose their own stream types.

To address this issue, we propose the ``__cuda_stream__`` protocol (currently version
0) as follows: any Python object that is meant to be interpreted as a stream
should add a ``__cuda_stream__`` attribute that returns a 2-tuple: the version number
(``0``) and the address of the underlying ``cudaStream_t``:

.. code-block:: python

   class MyStream:

       @property
       def __cuda_stream__(self):
           return (0, self.ptr)

       ...

Such objects can then be understood by ``cuda.core`` anywhere a stream-like object
is needed.

We suggest that all existing Python projects exposing a stream class also support this
protocol wherever a function takes a stream.

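To make the protocol concrete, here is a minimal end-to-end sketch of both sides: a producer class like the one above, and a toy consumer that validates the version and extracts the raw stream address. The helper name ``as_stream_handle`` is hypothetical, chosen for illustration; it is not a ``cuda.core`` API, and the pointer value is a dummy (no real stream is created).

```python
# Producer side: any object exposing __cuda_stream__ per the protocol.
class MyStream:
    def __init__(self, ptr):
        self.ptr = ptr  # address of an underlying cudaStream_t (dummy here)

    @property
    def __cuda_stream__(self):
        # (protocol version, stream address)
        return (0, self.ptr)


# Consumer side (hypothetical helper): extract the raw cudaStream_t address
# from any protocol-compliant object, rejecting unknown protocol versions.
def as_stream_handle(obj):
    version, handle = obj.__cuda_stream__
    if version != 0:
        raise ValueError(f"unsupported __cuda_stream__ version: {version}")
    return handle


s = MyStream(0x1234)
print(hex(as_stream_handle(s)))  # 0x1234
```

Version-checking on the consumer side is what allows the protocol to evolve: a future version could return a different tuple layout without silently breaking older consumers.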

Memory view utilities for CPU/GPU buffers
-----------------------------------------

The Python community has defined protocols such as the CUDA Array Interface (CAI) [1]_ and DLPack
[2]_ (part of the Python array API standard [3]_) to facilitate zero-copy data exchange
between GPU projects. In particular, performance considerations prompted these protocols
to be designed around *stream-ordered* operations so as to avoid unnecessary synchronizations.
While the designs are robust, *implementing* such protocols can be tricky and often requires
a few iterations to ensure correctness.

``cuda.core`` offers an :func:`~utils.args_viewable_as_strided_memory` decorator for
extracting the metadata (such as pointer address, shape, strides, and dtype) from any
Python object supporting either CAI or DLPack, returning a :class:`~utils.StridedMemoryView` object; see the
`strided_memory_view.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/examples/strided_memory_view.py>`_
example. Alternatively, a :class:`~utils.StridedMemoryView` object can be explicitly
constructed without using the decorator. This provides a *concrete implementation* of both
protocols that is **array-library-agnostic**, so that all Python projects can simply rely on it
without re-implementing (the consumer side of) the protocols or tying themselves to any particular
array library.

The :attr:`~utils.StridedMemoryView.is_device_accessible` attribute can be used to check
whether or not the underlying buffer can be accessed on the GPU.

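To illustrate the kind of metadata a strided memory view carries, here is a toy consumer-side reader for the CUDA Array Interface. Everything here is hypothetical scaffolding for illustration: ``FakeGPUArray`` fakes CAI v3 metadata with a dummy pointer (no real GPU memory), and ``view_cai`` is a stand-in name, not a ``cuda.core`` API; real code should use :class:`~utils.StridedMemoryView` rather than re-implementing this.

```python
# Sketch of the consumer side of the CUDA Array Interface (CAI) v3:
# read the metadata dict and collect it into a small view object.
from dataclasses import dataclass


@dataclass
class View:
    ptr: int        # device pointer address
    shape: tuple
    strides: tuple  # in bytes; None means C-contiguous
    typestr: str    # e.g. "<f4" for little-endian float32
    readonly: bool


class FakeGPUArray:
    # Stand-in object exposing CAI v3 metadata (dummy pointer, no GPU memory).
    @property
    def __cuda_array_interface__(self):
        return {
            "data": (0xDEADBEEF, False),  # (pointer, read-only flag)
            "shape": (2, 3),
            "strides": None,              # None => C-contiguous
            "typestr": "<f4",
            "version": 3,
        }


def view_cai(obj):
    """Extract pointer/shape/strides/dtype metadata from a CAI-compliant object."""
    cai = obj.__cuda_array_interface__
    ptr, readonly = cai["data"]
    return View(ptr, tuple(cai["shape"]), cai.get("strides"), cai["typestr"], readonly)


view = view_cai(FakeGPUArray())
print(view.shape, view.typestr)  # (2, 3) <f4
```

Even this toy reader shows why a shared, well-tested implementation is valuable: a correct consumer must also handle the fields omitted here (``stream``, ``mask``, version negotiation), which is exactly the per-project iteration the ``cuda.core`` utilities are meant to avoid.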
.. rubric:: Footnotes

.. [1] https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
.. [2] https://dmlc.github.io/dlpack/latest/python_spec.html
.. [3] https://data-apis.org/array-api/latest/design_topics/data_interchange.html
