I am trying to implement FlashAttention locally, in a setup similar to zhuzilin/ring-flash-attention#24.
However, I have found a significant discrepancy between the attention output computed through the FlashAttention interface and a native attention computation. Could you please explain why this happens? An error of this size is unacceptable for inference.
This issue has been troubling me for a long time, and I look forward to your reply.
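For reference, here is a minimal sketch of the kind of comparison I am running. This is only an illustration, assuming `flash_attn_func` and `flash_attn_varlen_func` from the flash-attn package and a plain fp32 softmax attention as the native baseline; the shapes and sizes are placeholders.

```python
import torch
from flash_attn import flash_attn_func, flash_attn_varlen_func

torch.manual_seed(0)
batch, seqlen, nheads, headdim = 2, 1024, 8, 64
device, dtype = "cuda", torch.float16

q = torch.randn(batch, seqlen, nheads, headdim, device=device, dtype=dtype)
k = torch.randn(batch, seqlen, nheads, headdim, device=device, dtype=dtype)
v = torch.randn(batch, seqlen, nheads, headdim, device=device, dtype=dtype)

def native_attention(q, k, v):
    # Plain softmax attention computed in fp32 as the reference.
    q32, k32, v32 = [t.transpose(1, 2).float() for t in (q, k, v)]  # (b, h, s, d)
    scores = q32 @ k32.transpose(-2, -1) / (q32.shape[-1] ** 0.5)
    probs = torch.softmax(scores, dim=-1)
    out = probs @ v32
    return out.transpose(1, 2).to(q.dtype)  # back to (b, s, h, d)

native_output = native_attention(q, k, v)

# FlashAttention, fixed-length interface.
flash_attn_output = flash_attn_func(q, k, v, causal=False)

# FlashAttention, varlen interface: flatten the batch and pass cumulative sequence lengths.
q_flat, k_flat, v_flat = [t.reshape(batch * seqlen, nheads, headdim) for t in (q, k, v)]
cu_seqlens = torch.arange(0, (batch + 1) * seqlen, seqlen, device=device, dtype=torch.int32)
flash_attn_varlen_output = flash_attn_varlen_func(
    q_flat, k_flat, v_flat,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=seqlen, max_seqlen_k=seqlen,
    causal=False,
).reshape(batch, seqlen, nheads, headdim)

for name, out in [("flash_attn_output", flash_attn_output),
                  ("flash_attn_varlen_output", flash_attn_varlen_output)]:
    diff = (out - native_output).abs()
    print(f"[{name}, native_output] diff max: {diff.max().item()}")
    print(f"[{name}, native_output] diff mean: {diff.mean().item()}")
```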
The printed output is:
[flash_attn_output, native_output] diff max: 0.294921875
[flash_attn_output, native_output] diff mean: 0.01226806640625
[flash_attn_varlen_output, native_output] diff max: 0.326171875
[flash_attn_varlen_output, native_output] diff mean: 0.0130615234375