Efficient attention not working with Whisper

**Describe the bug**
None of the memory-efficient attention kernels work with the Whisper implementation.

**To Reproduce**
Run Whisper with paged_attention or flash attention. Splash attention runs, but it internally falls back to vanilla attention.