Labels: P1 (Medium priority - Should do), cuda.core (Everything related to the cuda.core module), feature (New feature or request)
Description
Initializing a TMA descriptor through the driver APIs (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html) is really tedious and error-prone. We need a way to abstract it, which aligns well with the mission of cuda.core. It would also make it easier for JIT compilers to consume TMA descriptors and incorporate them into their compilation pipelines.
In my understanding there are two (implicit?) requirements for this to be useful:
- Creating/initializing a TMA object on the host
- Passing the object to the cuda.core.launch() API as a kernel argument
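To make the first requirement more concrete, here is a rough sketch of what a host-side object could check before anything reaches the driver. Everything in it is hypothetical: `TiledTensorMapSpec` and its fields are illustrative names, not an existing or proposed cuda.core API, and the limits encode only a subset of the `cuTensorMapEncodeTiled` constraints as I understand them from the driver documentation linked above. It assumes a densely packed tensor and does not call the driver or build the actual `CUtensorMap`.

```python
from dataclasses import dataclass
from typing import Tuple

# Limits as I read them from the cuTensorMapEncodeTiled documentation
# (assumptions; please verify against the driver API reference).
MAX_RANK = 5
MAX_BOX_DIM = 256
MAX_ELEMENT_STRIDE = 8
ALIGNMENT = 16  # bytes


@dataclass(frozen=True)
class TiledTensorMapSpec:
    """Hypothetical host-side description of a tiled TMA descriptor."""

    global_address: int                       # device pointer as an integer
    element_size: int                         # bytes per element
    global_dim: Tuple[int, ...]               # tensor extents, innermost first
    box_dim: Tuple[int, ...]                  # tile extents, innermost first
    element_strides: Tuple[int, ...] = ()     # defaults to 1 per dimension

    def __post_init__(self):
        rank = len(self.global_dim)
        if not 1 <= rank <= MAX_RANK:
            raise ValueError(f"rank must be in [1, {MAX_RANK}], got {rank}")
        if len(self.box_dim) != rank:
            raise ValueError("box_dim must have one entry per dimension")
        strides = tuple(self.element_strides) or (1,) * rank
        if len(strides) != rank:
            raise ValueError("element_strides must have one entry per dimension")
        # The dataclass is frozen, so normalize the default via object.__setattr__.
        object.__setattr__(self, "element_strides", strides)

        if self.global_address % ALIGNMENT:
            raise ValueError(f"global_address must be {ALIGNMENT}-byte aligned")
        if any(d < 1 for d in self.global_dim):
            raise ValueError("global_dim entries must be positive")
        if any(not 1 <= b <= MAX_BOX_DIM for b in self.box_dim):
            raise ValueError(f"box_dim entries must be in [1, {MAX_BOX_DIM}]")
        if any(not 1 <= s <= MAX_ELEMENT_STRIDE for s in strides):
            raise ValueError(
                f"element_strides entries must be in [1, {MAX_ELEMENT_STRIDE}]"
            )
        if any(s % ALIGNMENT for s in self.global_strides):
            raise ValueError(f"global strides must be multiples of {ALIGNMENT} bytes")

    @property
    def global_strides(self) -> Tuple[int, ...]:
        """Byte strides for dimensions 1..rank-1 of a densely packed tensor.

        The innermost stride is implied by element_size, so the result has
        rank - 1 entries, matching the layout the driver call expects.
        """
        strides = []
        pitch = self.element_size
        for extent in self.global_dim[:-1]:
            pitch *= extent
            strides.append(pitch)
        return tuple(strides)
```

For example, `TiledTensorMapSpec(global_address=ptr, element_size=2, global_dim=(1024, 512), box_dim=(64, 64))` would describe 64x64 tiles of a 1024x512 half-precision tensor, and a misaligned pointer or an oversized box would fail on the host with a `ValueError` instead of surfacing as a driver error. How such an object is turned into a `CUtensorMap` and handed to `cuda.core.launch()` is exactly the open design question of this issue.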