
[BUG]: Strange/complex behavior for ptxas_option -v #1124

@capybara-club

Description



Type of Bug

Silent Failure

Component

cuda.core

Describe the bug

Hello!

It looks like the cuda.core JIT respects NVRTC's caching behavior: setting the environment variable CUDA_CACHE_DISABLE to 1 causes the ptxas '-v' option to print register info on every run.
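If exporting the variable in the shell is inconvenient, it can also be set at the top of the script. This is a minimal sketch, assuming CUDA_CACHE_DISABLE is read when the CUDA driver initializes, which would mean it has to be set before cuda.core is imported. That timing is an assumption, not something established in this issue.

```python
import os

# Assumption: CUDA_CACHE_DISABLE is read when the CUDA driver initializes,
# so it is set here, before cuda.core (and therefore the driver) is loaded.
os.environ["CUDA_CACHE_DISABLE"] = "1"

# Import cuda.core only after the environment is prepared, e.g.:
# from cuda.core.experimental import Device, Program, ProgramOptions
```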

When I set the environment variable and run the script below (ptxas_v.py):

```python
import sys
from cuda.core.experimental import Device, Program, ProgramOptions

cuda_code = r"""
extern "C"
__global__
void
kernel() {
    float x = 3.0f;
    printf("%f\n", x);
}
"""

dev = Device()
dev.set_current()
capability = dev.compute_capability
arch = f"sm_{capability.major}{capability.minor}"

# Step 1: CUDA C++ -> PTX. The ptxas -v report is printed for this step.
program_options = ProgramOptions(
    std="c++11",
    arch=arch,
    ptxas_options=['-v'],
)
prog = Program(cuda_code, code_type="c++", options=program_options)
ptx = prog.compile("ptx", logs=sys.stdout).code.decode('utf-8')

print('Compiled PTX')

# Step 2: PTX -> CUBIN. No ptxas -v report is printed for this step.
program_options = ProgramOptions(
    arch=arch,
    ptxas_options=['-v'],
)
prog = Program(ptx, code_type="ptx", options=program_options)
cubin = prog.compile("cubin", logs=sys.stdout)

print('Compiled CUBIN')
```

With caching disabled, the ptxas info prints on every run, but only for the CUDA -> PTX compilation; nothing is printed for PTX -> CUBIN:

```
python ptxas_v.py
ptxas info    : 4 bytes gmem, 16 bytes cmem[4]
ptxas info    : Compiling entry function 'kernel' for 'sm_89'
ptxas info    : Function properties for kernel
ptxas         .     8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 24 registers, used 0 barriers, 8 bytes cumulative stack size, 352 bytes cmem[0]
ptxas info    : Compile time = 1.209 ms
Compiled PTX
Compiled CUBIN
```
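Since the verbose report has a regular line format, one way to check programmatically whether a compile step emitted it is to parse the captured log. A small sketch: parse_ptxas_verbose is a hypothetical helper, and its regular expressions are written against the lines shown above, so they are an assumption about ptxas's output format that may not hold for every CUDA version.

```python
import re
from typing import Dict, List

# Patterns written against the ptxas -v lines quoted in this issue;
# other ptxas versions may format these lines differently.
_ENTRY_RE = re.compile(r"Compiling entry function '([^']+)' for '([^']+)'")
_REGS_RE = re.compile(r"Used (\d+) registers")


def parse_ptxas_verbose(log: str) -> List[Dict[str, object]]:
    """Return one record per entry function reported in a ptxas -v log."""
    kernels: List[Dict[str, object]] = []
    for line in log.splitlines():
        entry = _ENTRY_RE.search(line)
        if entry:
            kernels.append({"name": entry.group(1), "arch": entry.group(2)})
            continue
        regs = _REGS_RE.search(line)
        if regs and kernels:
            # The register line belongs to the most recent entry function.
            kernels[-1]["registers"] = int(regs.group(1))
    return kernels
```

An empty list for a given step means no verbose report was logged for it, which is what happens for the PTX -> CUBIN step here.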

I think the ptxas -v information just isn't being printed for PTX -> CUBIN. Nothing is printed if I change the options from ['-v'] to ['-O1', '-v'] or ['-O2', '-v'], yet passing an option that doesn't exist, e.g. ['-O1a', '-v'], causes an nvJitLink error, so the options do seem to reach nvJitLink.
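The comparison described above can be made repeatable by trying each option set in a loop and capturing the logs in a buffer instead of sys.stdout. A sketch under assumptions: compile_ptx_logs and probe_ptxas_verbose are hypothetical helper names. The first reuses the same Program/ProgramOptions calls as the script; the second takes the compile function as a parameter, so it can be exercised without a GPU. An option set that raises (like ['-O1a', '-v']) is recorded as None.

```python
import io
from typing import Callable, Dict, List, Optional, Sequence


def compile_ptx_logs(ptx: str, arch: str, ptxas_options: Sequence[str]) -> str:
    """PTX -> CUBIN via cuda.core, returning whatever was written to `logs`."""
    # Imported lazily so probe_ptxas_verbose() can be used without a GPU.
    from cuda.core.experimental import Program, ProgramOptions

    buf = io.StringIO()
    options = ProgramOptions(arch=arch, ptxas_options=list(ptxas_options))
    Program(ptx, code_type="ptx", options=options).compile("cubin", logs=buf)
    return buf.getvalue()


def probe_ptxas_verbose(
    compile_fn: Callable[[Sequence[str]], str],
    option_sets: List[List[str]],
) -> Dict[str, Optional[bool]]:
    """Map each option set to True/False (verbose report logged or not),
    or None if compilation raised, e.g. an option nvJitLink rejects."""
    results: Dict[str, Optional[bool]] = {}
    for opts in option_sets:
        key = " ".join(opts)
        try:
            results[key] = "ptxas info" in compile_fn(opts)
        except Exception:
            results[key] = None
    return results


# Usage, with `ptx` and `arch` from the script:
# probe_ptxas_verbose(lambda o: compile_ptx_logs(ptx, arch, o),
#                     [["-v"], ["-O1", "-v"], ["-O2", "-v"], ["-O1a", "-v"]])
```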

Any ideas?

Thanks!

How to Reproduce

Run the script above with CUDA_CACHE_DISABLE=1, then vary the ptxas_options used for the PTX -> CUBIN step.

Expected behavior

The ptxas info verbose report should also be printed for the PTX -> CUBIN compilation.
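The expected versus actual behavior can be stated as a check on the script's combined stdout, using its "Compiled PTX" and "Compiled CUBIN" markers to attribute each log line to a step. verbose_report_per_stage is a hypothetical helper; it only looks for the "ptxas info" prefix seen in the output above.

```python
from typing import Tuple


def verbose_report_per_stage(stdout: str) -> Tuple[bool, bool]:
    """Return (report seen for CUDA -> PTX, report seen for PTX -> CUBIN)."""
    # The script prints "Compiled PTX" after step 1 and "Compiled CUBIN" after
    # step 2, so text between the two markers came from the PTX -> CUBIN compile.
    before, _, after = stdout.partition("Compiled PTX")
    cubin_part = after.partition("Compiled CUBIN")[0]
    return ("ptxas info" in before, "ptxas info" in cubin_part)
```

With the output reported in this issue the result is (True, False); the expected behavior corresponds to (True, True).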

Operating System

CachyOS

nvidia-smi output

```
Fri Oct 10 11:20:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   47C    P5             37W /  450W |    1754MiB /  24564MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

Metadata


Assignees

No one assigned

Labels

awaiting-response (Further information is requested), bug (Something isn't working), cuda.core (Everything related to the cuda.core module)

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
