Do FP8 rowwise bias addition in higher precision (#4095)
Summary:
X-link: facebookresearch/FBGEMM#1179
Previously, when bias was used in our FP8 rowwise kernel, it was added to the accumulator in its native precision. For example, if the bias is bf16, we would do a bf16 + bf16 addition. However, it's a bit more efficient and a bit more accurate to leave the accumulator in fp32, cast the bias to fp32, then do an fp32 addition.
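A minimal sketch of the idea (not the actual CUTLASS epilogue; the kernel name `bias_epilogue` and its layout are hypothetical): keep the fp32 accumulator, upcast the bf16 bias to fp32, add in fp32, and round to bf16 only once on output, rather than rounding the accumulator to bf16 first and doing a bf16 + bf16 add.

```cuda
#include <cuda_bf16.h>

// Hypothetical per-element epilogue illustrating the precision change.
__global__ void bias_epilogue(const float* acc,          // fp32 accumulator (M x N)
                              const __nv_bfloat16* bias, // bf16 row-wise bias (length N)
                              __nv_bfloat16* out,        // bf16 output (M x N)
                              int M, int N) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= M * N) return;
  int col = idx % N;

  // Higher-precision path: fp32 accumulator + fp32(bias), single rounding to bf16.
  float result = acc[idx] + __bfloat162float(bias[col]);
  out[idx] = __float2bfloat16(result);

  // Previous (lower-precision) path, for comparison:
  //   __nv_bfloat16 r = __hadd(__float2bfloat16(acc[idx]), bias[col]);
  // which rounds the accumulator to bf16 before the add, losing precision.
}
```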
Reviewed By: jianyuh
Differential Revision: D74408348
Changed file: fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_common.cuh