Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Silent Failure
Component
cuda.core
Describe the bug
Hello!
It looks like cuda-core JIT respects the caching behavior for nvrtc, such that setting the environment variable CUDA_CACHE_DISABLE to 1 causes the ptxas '-v' option to always print register info.
When I set the environment variable and run the script below:
import sys
from cuda.core.experimental import Device, Program, ProgramOptions

cuda_code = r"""
extern "C"
__global__
void
kernel() {
    float x = 3.0f;
    printf("%f\n", x);
}
"""

dev = Device()
dev.set_current()
capability = dev.compute_capability
arch = f"sm_{capability.major}{capability.minor}"

# Step 1: CUDA C++ -> PTX
program_options = ProgramOptions(
    std="c++11",
    arch=arch,
    ptxas_options=['-v'],
)
prog = Program(cuda_code, code_type="c++", options=program_options)
ptx = prog.compile("ptx", logs=sys.stdout).code.decode('utf-8')
print('Compiled PTX')

# Step 2: PTX -> CUBIN
program_options = ProgramOptions(
    arch=arch,
    ptxas_options=['-v'],
)
prog = Program(ptx, code_type="ptx", options=program_options)
cubin = prog.compile("cubin", logs=sys.stdout)
print('Compiled CUBIN')
The ptxas info now prints on every run, but only for the CUDA C++ -> PTX step. Nothing is printed for the PTX -> CUBIN step:
python ptxas_v.py
ptxas info : 4 bytes gmem, 16 bytes cmem[4]
ptxas info : Compiling entry function 'kernel' for 'sm_89'
ptxas info : Function properties for kernel
ptxas . 8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, used 0 barriers, 8 bytes cumulative stack size, 352 bytes cmem[0]
ptxas info : Compile time = 1.209 ms
Compiled PTX
Compiled CUBIN
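To check for the ptxas info programmatically rather than by eye, the log text can be searched for the register line. This is a minimal sketch: `registers_used` is a hypothetical helper (not part of cuda.core), and the sample log is the output pasted above.

```python
import re

# Sample log copied from the run above.
SAMPLE_LOG = """\
ptxas info : 4 bytes gmem, 16 bytes cmem[4]
ptxas info : Compiling entry function 'kernel' for 'sm_89'
ptxas info : Function properties for kernel
ptxas . 8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, used 0 barriers, 8 bytes cumulative stack size, 352 bytes cmem[0]
ptxas info : Compile time = 1.209 ms
"""


def registers_used(log: str) -> dict[str, int]:
    """Map each entry function named in a ptxas -v log to its register count.

    Hypothetical helper for this report; not a cuda.core API.
    """
    result = {}
    current = None
    for line in log.splitlines():
        entry = re.search(r"Compiling entry function '([^']+)'", line)
        if entry:
            current = entry.group(1)
            continue
        used = re.search(r"Used (\d+) registers", line)
        if used and current is not None:
            result[current] = int(used.group(1))
    return result


print(registers_used(SAMPLE_LOG))  # {'kernel': 24}
```

An empty result for the log of the PTX -> CUBIN step would confirm that no verbose output reached the log object.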
I think the ptxas -v information just isn't printing when doing PTX -> CUBIN. It still doesn't print if I change ptxas_options from ['-v'] to ['-O1','-v'] or ['-O2','-v']. However, passing an option that doesn't exist, e.g. ['-O1a', '-v'], causes an nvJitLink error, so the options do seem to reach nvJitLink.
Any ideas?
Thanks!
How to Reproduce
Set CUDA_CACHE_DISABLE=1, run the script above, and vary the ptxas_options used for the PTX -> CUBIN step.
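The reproduction steps could be scripted roughly as follows. This is only a sketch: `try_variants` and the `compile_cubin` callable are hypothetical names, and it assumes both that CUDA_CACHE_DISABLE is read when CUDA initializes (so it is set before cuda.core is imported) and that the logs object receives text.

```python
import io
import os

# Assumption: the variable has to be in the environment before CUDA
# initializes, so set it before importing cuda.core.
os.environ["CUDA_CACHE_DISABLE"] = "1"

# ptxas_options variants tried for the PTX -> CUBIN step in this report.
PTXAS_VARIANTS = [
    ["-v"],
    ["-O1", "-v"],
    ["-O2", "-v"],
    ["-O1a", "-v"],  # not a valid option; reported to raise an nvJitLink error
]


def try_variants(compile_cubin, variants=PTXAS_VARIANTS):
    """Run compile_cubin(opts, logs) for each variant (hypothetical harness).

    Returns a dict mapping each option tuple to True/False (whether
    'ptxas info' showed up in the captured log) or to an error string.
    """
    results = {}
    for opts in variants:
        buf = io.StringIO()
        try:
            compile_cubin(opts, buf)
            results[tuple(opts)] = "ptxas info" in buf.getvalue()
        except Exception as exc:
            results[tuple(opts)] = f"error: {type(exc).__name__}"
    return results


# Example wiring, using the same calls as the script above (needs a GPU):
#
# def compile_cubin(opts, logs):
#     options = ProgramOptions(arch=arch, ptxas_options=opts)
#     Program(ptx, code_type="ptx", options=options).compile("cubin", logs=logs)
#
# print(try_variants(compile_cubin))
```

With the behavior described in this report, the first three variants would map to False and the last one to an error string.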
Expected behavior
The verbose ptxas info messages should also print for the PTX -> CUBIN compilation.
Operating System
CachyOS
nvidia-smi output
Fri Oct 10 11:20:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   47C    P5             37W /  450W |    1754MiB /  24564MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+