Is your feature request related to a problem? Please describe.
In the nvmath device API we are adding support for a library that uses architecture/family-specific features. Basically, we have a device function in LTO-IR that was built with specific a/f features and we want to link it into a Numba kernel.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#feature-availability
Describe the solution you'd like
Some API to enable architecture/family-specific features for kernels, device functions, and cuda.declare_device.
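For reference, a minimal sketch of the target-string convention involved. The `a` (architecture-specific) and `f` (family-specific) suffixes follow the CUDA programming guide linked above; the helper name itself is hypothetical, just to illustrate what such an API would have to produce:

```python
# Hypothetical helper: builds the ptxas/nvJitLink target string for a
# compute capability, optionally requesting architecture-specific ("a")
# or family-specific ("f") features. The suffix convention follows the
# CUDA C++ Programming Guide's "Feature Availability" section.
def target_string(cc, variant=""):
    major, minor = cc
    if variant not in ("", "a", "f"):
        raise ValueError("variant must be '', 'a', or 'f'")
    return f"sm_{major}{minor}{variant}"

print(target_string((9, 0)))       # plain Hopper target: sm_90
print(target_string((9, 0), "a"))  # architecture-specific target: sm_90a
```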
Describe alternatives you've considered
Numba automatically detects whether an a/f-specific feature was used and sets the appropriate compiler/linker flag.
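A rough sketch of what such detection might look like. The instruction list and function are purely illustrative (and incomplete); a real implementation would presumably consult ptxas/ISA metadata rather than a hard-coded list:

```python
import re

# Illustrative only: a tiny scanner that flags PTX using instructions
# that are only available on architecture-specific targets (e.g. the
# tensormap.replace instruction from the example below requires sm_90a).
ARCH_SPECIFIC_PREFIXES = ("tensormap.replace", "wgmma.")  # incomplete, for illustration

def needs_arch_specific(ptx: str) -> bool:
    # Match each known prefix at an instruction boundary in the PTX text.
    return any(re.search(rf"\b{re.escape(p)}", ptx) for p in ARCH_SPECIFIC_PREFIXES)
```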
Additional context
Basically I want this to work:
from numba import cuda
from numba.core import types
import numpy as np
import cffi
set_desc = cuda.CUSource("""
#include <cuda_fp16.h>

extern "C" __device__
int set_descriptor(int *out, int *smem) {
    unsigned usmem = __cvta_generic_to_shared(smem);
    asm volatile("tensormap.replace.tile.rank.shared::cta.b1024.b32 [%0], 2;" :: "r"(usmem));
    return 0;
}
""")
set_descriptor = cuda.declare_device("set_descriptor", types.int32(types.CPointer(types.int32)), link=[set_desc])
ffi = cffi.FFI()
@cuda.jit
def kernel(a):
    sm = cuda.shared.array(1, dtype=np.int32)
    data_ptr = ffi.from_buffer(sm)
    set_descriptor(data_ptr)
    # just to prevent optimization:
    sm[0] = 2
    cuda.syncthreads()
    a[0] = sm[0]
a = np.ones(1, dtype=np.int32)
kernel[1, 1](a)
assert a[0] == 2

Currently on sm_90 this returns an error:
numba_cuda/numba/cuda/dispatcher.py:700: in __call__
return self.dispatcher.call(
numba_cuda/numba/cuda/dispatcher.py:1022: in call
kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
numba_cuda/numba/cuda/dispatcher.py:1030: in _compile_for_args
return self.compile(tuple(argtypes))
.venv/lib/python3.10/site-packages/numba/core/compiler_lock.py:35: in _acquire_compile_lock
return func(*args, **kwargs)
numba_cuda/numba/cuda/dispatcher.py:1298: in compile
kernel.bind()
numba_cuda/numba/cuda/dispatcher.py:331: in bind
cufunc = self._codelibrary.get_cufunc()
numba_cuda/numba/cuda/codegen.py:339: in get_cufunc
cubin = self.get_cubin(cc=device.compute_capability)
numba_cuda/numba/cuda/codegen.py:318: in get_cubin
cubin = linker.complete()
numba_cuda/numba/cuda/cudadrv/driver.py:3084: in complete
result = self.linker.link("cubin")
.venv/lib/python3.10/site-packages/cuda/core/experimental/_linker.py:448: in link
_nvjitlink.complete(self._mnff.handle)
cuda/bindings/nvjitlink.pyx:156: in cuda.bindings.nvjitlink.complete
???
cuda/bindings/nvjitlink.pyx:166: in cuda.bindings.nvjitlink.complete
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E cuda.bindings.nvjitlink.nvJitLinkError: ERROR_PTX_COMPILE (4)
E Linker error log: ptxas application ptx input, line 35; error : Instruction 'tensormap.replace' not supported on .target 'sm_90'
E ptxas fatal : Ptx assembly aborted due to errors
E ERROR NVJITLINK_ERROR_PTX_COMPILE: JIT the PTX (ltoPtx)
E
cuda/bindings/nvjitlink.pyx:66: nvJitLinkError
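For context, this failure is expected given the current behaviour: tensormap.replace is an architecture-specific instruction, so ptxas only accepts it for the sm_90a target, while Numba links for plain sm_90. A sketch of the option plumbing an opt-in API would need to change ("-arch=sm_90a" and "-lto" are real nvJitLink option spellings; the helper itself is hypothetical):

```python
# Hypothetical sketch: the nvJitLink options Numba would need to emit
# when architecture-specific features are requested for a device.
def nvjitlink_options(cc, arch_specific=False):
    major, minor = cc
    suffix = "a" if arch_specific else ""
    return [f"-arch=sm_{major}{minor}{suffix}", "-lto"]

print(nvjitlink_options((9, 0)))                      # current behaviour: sm_90
print(nvjitlink_options((9, 0), arch_specific=True))  # what the example above needs: sm_90a
```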
nvlcambier

Metadata
Assignees
Labels: feature request (New feature or request)