Replies: 1 comment 1 reply
- FlashAttention 2 is already integrated.
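  The discussion does not say which library the integration refers to; as a minimal sketch, assuming the Hugging Face Transformers-style API, FlashAttention 2 can typically be enabled at model load time like this (the model name and dtype are illustrative, not taken from this thread):

  ```python
  # Minimal sketch, assuming the Hugging Face Transformers integration of FlashAttention 2.
  # The model id and dtype below are illustrative examples, not from this discussion.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical example model

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.float16,                # FlashAttention 2 requires fp16 or bf16
      attn_implementation="flash_attention_2",  # opt in to the FlashAttention 2 kernels
      device_map="auto",
  )

  inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```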
1 reply
- They are new techniques for optimizing LLM inference performance.