Suggestion Description
The flashinfer-ai/flashinfer project provides an implementation of a batched attention kernel (https://github.com/flashinfer-ai/flashinfer/blob/main/csrc/batch_attention.cu).
See flashinfer-ai#1137. It is worth exploring whether such a feature is viable on ROCm and whether it would benefit end users.
Operating System
No response
GPU
No response
ROCm Component
No response