@@ -48,7 +48,7 @@ API](https://docs.nvidia.com/cuda/cuda-driver-api/index.html) and
 Python package. In this example, you copy data from the host to the device. You need
 [NumPy](https://numpy.org/doc/stable/contents.html) to store data on the host.
 
-```{code-cell} python
+```python
 from cuda.bindings import driver, nvrtc
 import numpy as np
 ```
@@ -58,7 +58,7 @@ example is provided.
 In a future release, this may automatically raise exceptions using a Python
 object model.
 
-```{code-cell} python
+```python
 def _cudaGetErrorEnum(error):
     if isinstance(error, driver.CUresult):
         err, name = driver.cuGetErrorName(error)
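
For context, the hunk above cuts the helper off after its first few lines. Here is a minimal sketch of how a `checkCudaErrors`-style wrapper over the tuple-returning bindings can be completed; the exact unwrapping policy shown is an assumption, not taken from the diff:

```python
from cuda.bindings import driver, nvrtc


def _cudaGetErrorEnum(error):
    # Map a status code to a readable name for both driver and NVRTC results
    if isinstance(error, driver.CUresult):
        err, name = driver.cuGetErrorName(error)
        return name if err == driver.CUresult.CUDA_SUCCESS else "<unknown>"
    elif isinstance(error, nvrtc.nvrtcResult):
        return nvrtc.nvrtcGetErrorString(error)[1]
    else:
        raise RuntimeError("Unknown error type: {}".format(error))


def checkCudaErrors(result):
    # Each binding call returns a tuple whose first element is the status code;
    # raise on failure and unwrap the remaining return values for convenience
    if result[0].value:
        raise RuntimeError(
            "CUDA error code={}({})".format(result[0].value, _cudaGetErrorEnum(result[0]))
        )
    if len(result) == 1:
        return None
    elif len(result) == 2:
        return result[1]
    return result[1:]
```
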
@@ -86,7 +86,7 @@ Python that requires some understanding of CUDA C++. For more information, see
 [An Even Easier Introduction to
 CUDA](https://developer.nvidia.com/blog/even-easier-introduction-cuda/).
 
-```{code-cell} python
+```python
 saxpy = """\
 extern "C" __global__
 void saxpy(float a, float *x, float *y, float *out, size_t n)
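
The kernel string is also truncated by the hunk; a sketch of the complete source, assuming the standard SAXPY body (`out = a * x + y`, guarded by the element count):

```python
saxpy = """\
extern "C" __global__
void saxpy(float a, float *x, float *y, float *out, size_t n)
{
    // One thread per element, bounds-checked against n
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = a * x[tid] + y[tid];
    }
}
"""
```
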
@@ -108,7 +108,7 @@ In the following code example, the Driver API is initialized so that the NVIDIA
 and GPU are accessible. Next, the GPU is queried for its compute capability. Finally,
 the program is compiled to target our local compute capability architecture with FMAD enabled.
 
-```{code-cell} python
+```python
 # Initialize CUDA Driver API
 checkCudaErrors(driver.cuInit(0))
 
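
The hunk stops after `cuInit`; a sketch of how the device query and NVRTC compilation described above typically continue, assuming the `checkCudaErrors` helper and the `saxpy` source string from the earlier steps:

```python
# Retrieve a handle to device 0 and query its compute capability
cuDevice = checkCudaErrors(driver.cuDeviceGet(0))
major = checkCudaErrors(driver.cuDeviceGetAttribute(
    driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, cuDevice))
minor = checkCudaErrors(driver.cuDeviceGetAttribute(
    driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, cuDevice))

# Create an NVRTC program and compile it for the local architecture with FMAD enabled
prog = checkCudaErrors(nvrtc.nvrtcCreateProgram(str.encode(saxpy), b"saxpy.cu", 0, [], []))
opts = [b"--fmad=true", bytes(f"--gpu-architecture=compute_{major}{minor}", "ascii")]
checkCudaErrors(nvrtc.nvrtcCompileProgram(prog, len(opts), opts))

# Fetch the generated PTX
ptxSize = checkCudaErrors(nvrtc.nvrtcGetPTXSize(prog))
ptx = b" " * ptxSize
checkCudaErrors(nvrtc.nvrtcGetPTX(prog, ptx))
```
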
@@ -138,7 +138,7 @@ context. CUDA contexts are analogous to host processes for the device. In the
 following code example, a handle for compute device 0 is passed to
 `cuCtxCreate` to designate that GPU for context creation.
 
-```{code-cell} python
+```python
 # Create context
 context = checkCudaErrors(driver.cuCtxCreate(0, cuDevice))
 ```
@@ -148,7 +148,7 @@ module. A module is analogous to dynamically loaded libraries for the device.
 After loading the module, extract a specific kernel with
 `cuModuleGetFunction`. It is not uncommon for multiple kernels to reside in PTX.
 
-```{code-cell} python
+```python
 # Load PTX as module data and retrieve function
 ptx = np.char.array(ptx)
 # Note: Incompatible --gpu-architecture would be detected here
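
A sketch of how the module-loading step typically finishes, assuming the `ptx` buffer produced by the compilation step; the name passed to `cuModuleGetFunction` must match the `extern "C"` symbol in the kernel source:

```python
# The np.char.array from the hunk above keeps the PTX buffer alive while loading
module = checkCudaErrors(driver.cuModuleLoadData(ptx.ctypes.data))
kernel = checkCudaErrors(driver.cuModuleGetFunction(module, b"saxpy"))
```
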
@@ -161,7 +161,7 @@ application performance, you can input data on the device to eliminate data
 transfers. For completeness, this example shows how you would transfer data to
 and from the device.
 
-```{code-cell} python
+```python
 NUM_THREADS = 512  # Threads per block
 NUM_BLOCKS = 32768  # Blocks per grid
 
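
A sketch of the host-side data those transfers operate on, assuming `NUM_THREADS` and `NUM_BLOCKS` from the hunk above and single-precision SAXPY inputs:

```python
a = np.array([2.0], dtype=np.float32)                    # scalar multiplier
n = np.array(NUM_THREADS * NUM_BLOCKS, dtype=np.uint32)  # element count
bufferSize = n * a.itemsize                              # bytes per device buffer

hX = np.random.rand(n).astype(dtype=np.float32)
hY = np.random.rand(n).astype(dtype=np.float32)
hOut = np.zeros(n).astype(dtype=np.float32)
```
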
@@ -184,7 +184,7 @@ Python doesn’t have a natural concept of pointers, yet `cuMemcpyHtoDAsync` exp
 `void*`. Therefore, `XX.ctypes.data` retrieves the pointer value associated with
 `XX`.
 
-```{code-cell} python
+```python
 dXclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
 dYclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
 dOutclass = checkCudaErrors(driver.cuMemAlloc(bufferSize))
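
A sketch of the asynchronous host-to-device copies the paragraph above describes, assuming the device buffers allocated in the hunk and the host arrays from the data-setup step; `hX.ctypes.data` supplies the raw host pointer that the `void*` parameter expects:

```python
stream = checkCudaErrors(driver.cuStreamCreate(0))

checkCudaErrors(driver.cuMemcpyHtoDAsync(dXclass, hX.ctypes.data, bufferSize, stream))
checkCudaErrors(driver.cuMemcpyHtoDAsync(dYclass, hY.ctypes.data, bufferSize, stream))
```
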
@@ -209,7 +209,7 @@ Like `cuMemcpyHtoDAsync`, `cuLaunchKernel` expects `void**` in the argument list
 the earlier code example, it creates `void**` by grabbing the `void*` value of each
 individual argument and placing them into a single contiguous block of memory.
 
-```{code-cell} python
+```python
 # The following code example is not intuitive
 # Subject to change in a future release
 dX = np.array([int(dXclass)], dtype=np.uint64)
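
A sketch of the full argument-packing pattern that paragraph explains, assuming the device pointers and host scalars from the earlier steps; each entry of `args` ends up holding the address of one kernel argument:

```python
dX = np.array([int(dXclass)], dtype=np.uint64)
dY = np.array([int(dYclass)], dtype=np.uint64)
dOut = np.array([int(dOutclass)], dtype=np.uint64)

args = [a, dX, dY, dOut, n]
args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
```
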
@@ -222,7 +222,7 @@ args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
 
 Now the kernel can be launched:
 
-```{code-cell} python
+```python
 checkCudaErrors(driver.cuLaunchKernel(
     kernel,
     NUM_BLOCKS,  # grid x dim
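
A sketch of the complete launch call, assuming the handles from the earlier steps; the positional arguments follow `cuLaunchKernel`'s order of grid dims, block dims, shared memory, stream, kernel parameters, and extra options:

```python
checkCudaErrors(driver.cuLaunchKernel(
    kernel,
    NUM_BLOCKS,        # grid x dim
    1,                 # grid y dim
    1,                 # grid z dim
    NUM_THREADS,       # block x dim
    1,                 # block y dim
    1,                 # block z dim
    0,                 # dynamic shared memory
    stream,            # stream
    args.ctypes.data,  # kernel arguments
    0,                 # extra (ignored)
))
```
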
@@ -251,7 +251,7 @@ stream are serialized. After the call to transfer data back to the host is
 executed, `cuStreamSynchronize` is used to halt CPU execution until all operations
 in the designated stream are finished.
 
-```{code-cell} python
+```python
 # Assert values are the same after running the kernel
 hZ = a * hX + hY
 if not np.allclose(hOut, hZ):
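
The transfer back to the host and the synchronization that paragraph refers to sit just before the assertion in the hunk; a minimal sketch, assuming the stream and buffers from the earlier steps:

```python
checkCudaErrors(driver.cuMemcpyDtoHAsync(hOut.ctypes.data, dOutclass, bufferSize, stream))
checkCudaErrors(driver.cuStreamSynchronize(stream))
```
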
@@ -261,7 +261,7 @@ if not np.allclose(hOut, hZ):
 Verify the data to ensure correctness, and finish the code with
 memory cleanup.
 
-```{code-cell} python
+```python
 checkCudaErrors(driver.cuStreamDestroy(stream))
 checkCudaErrors(driver.cuMemFree(dXclass))
 checkCudaErrors(driver.cuMemFree(dYclass))
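
A sketch of how the cleanup typically finishes beyond the lines shown, assuming the module and context handles created earlier:

```python
checkCudaErrors(driver.cuMemFree(dOutclass))
checkCudaErrors(driver.cuModuleUnload(module))
checkCudaErrors(driver.cuCtxDestroy(context))
```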