Why Doesn't FlashAttention3 Allow KV and O to Share Memory Space? #1396

ziyuhuang123 · 2024-12-18T11:13:02Z

Looking at kernel_traits in FA3, I noticed that Q and K each keep their own fixed space in memory, while V and O reuse the same space. But isn't Q the only tensor that has to stay fixed? As our block keeps moving to the right, Q stays in place while K and V are continuously updated.

Why not let K and V both share memory space with O (using a union)? Is it because O occupies so little space that such a modification would be unnecessary?
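To make the question concrete, here is a minimal C++ sketch of the layouts being compared. It is not the actual FA3 code: the struct names, tile sizes, stage count, and 16-bit element type are all assumptions picked for illustration, and it only compares buffer sizes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tile parameters, chosen only for illustration.
constexpr std::size_t kBlockM  = 128;  // query rows per tile
constexpr std::size_t kBlockN  = 128;  // key/value rows per tile
constexpr std::size_t kHeadDim = 128;  // head dimension
constexpr std::size_t kStages  = 2;    // assumed K/V pipeline depth

using Element = std::uint16_t;         // stand-in for a 16-bit type such as half

template <std::size_t N>
struct Tile { Element data[N]; };

using TileQ  = Tile<kBlockM * kHeadDim>;
using TileKV = Tile<kBlockN * kHeadDim * kStages>;
using TileO  = Tile<kBlockM * kHeadDim>;

// No sharing at all: every tensor has its own buffer.
struct StorageNoUnion {
    TileQ  q;
    TileKV k;
    TileKV v;
    TileO  o;
};

// Layout described in the question: Q and K are separate, V and O overlap.
struct StorageUnionVO {
    TileQ  q;
    TileKV k;
    union {
        TileKV v;
        TileO  o;
    };
};

// Layout proposed in the question: O overlaps with both K and V.
struct StorageUnionKVO {
    TileQ q;
    union {
        struct { TileKV k; TileKV v; } kv;
        TileO o;
    };
};

constexpr std::size_t kBytesNoUnion  = sizeof(StorageNoUnion);
constexpr std::size_t kBytesUnionVO  = sizeof(StorageUnionVO);
constexpr std::size_t kBytesUnionKVO = sizeof(StorageUnionKVO);
```

With these assumed sizes, both union layouts come to 163840 bytes versus 196608 bytes with no sharing, because O already fits inside V's buffer, so also overlapping it with K saves nothing more. This is only size arithmetic under the stated assumptions; it does not account for synchronization between the K/V loads and the write of O.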
