Hi, thanks for your valuable work!
Do you have any plans to adapt Flash Attention to run on Volta hardware? The improvements may not be as large as on Ampere-based cards, but people with Volta-based cards can no longer run any LLM that depends on Flash Attention 2. It would be very valuable to have a package that makes FA2 compatible with Volta cards, even if that means accepting a less optimised attention computation.
Thanks!