Commit 52487ea

jwfromm authored and facebook-github-bot committed
Do FP8 rowwise bias addition in higher precision (#4095)
Summary:
X-link: facebookresearch/FBGEMM#1179

Previously, when a bias was used in our FP8 rowwise kernel, it was added to the accumulator in its native precision. For example, if the bias is bf16, we would do a bf16 + bf16 addition. However, it is a bit more efficient and a bit more accurate to leave the accumulator in fp32, cast the bias to fp32, and then do an fp32 addition.

Reviewed By: jianyuh

Differential Revision: D74408348
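To see why the single fp32 addition loses less precision, here is a small host-side illustration (not the FBGEMM/CUTLASS kernel code) that emulates bf16 round-to-nearest-even on plain floats. The to_bf16 helper and the sample values are hypothetical, chosen only to make the two rounding paths visible:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical illustration, not the FBGEMM/CUTLASS kernel code.
// Emulate rounding a float to the nearest bf16 value (round-to-nearest-even),
// returned as a float so the two paths below are easy to compare on the host.
float to_bf16(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits += 0x7fffu + ((bits >> 16) & 1u);  // round to nearest, ties to even
  bits &= 0xffff0000u;                    // keep bf16's 8 mantissa bits
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

int main() {
  float accum = 0.33333334f;          // fp32 GEMM accumulator
  float bias  = to_bf16(0.0012345f);  // bias as stored, in bf16

  // Old epilogue: round the accumulator to bf16 first, then do a
  // bf16 + bf16 add, i.e. two roundings before the result is produced.
  float old_path = to_bf16(to_bf16(accum) + bias);

  // New epilogue: keep the accumulator in fp32, add the (upcast) bias
  // in fp32, and round to the bf16 output exactly once.
  float new_path = to_bf16(accum + bias);

  printf("old: %.8f  new: %.8f  exact: %.8f\n", old_path, new_path,
         accum + bias);
  return 0;
}

The printed values depend on the inputs, but the new path can never do worse: rounding the full-precision sum once yields the bf16 value nearest that sum, whereas rounding the accumulator before the add can land on a more distant one.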
1 parent 2c0e915 commit 52487ea

File tree

1 file changed: +3 -3 lines


fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_common.cuh

Lines changed: 3 additions & 3 deletions
@@ -138,7 +138,7 @@ at::Tensor f8f8bf16_rowwise_impl(
       0,
       TileShape,
       ElementBias,
-      ElementBias,
+      ElementComputeEpilogue,
       cute::Stride<cute::Int<1>, cute::Int<0>, cute::Int<0>>>;
 
   using Accum = cutlass::epilogue::fusion::Sm90AccFetch;
@@ -156,7 +156,7 @@ at::Tensor f8f8bf16_rowwise_impl(
       cutlass::multiplies,
       cute::conditional_t< // Second stage output type.
           USE_BIAS,
-          ElementBias,
+          ElementComputeEpilogue,
           ElementOutput>,
       ElementComputeEpilogue, // Second stage input types.
       cutlass::FloatRoundStyle::round_to_nearest>;
@@ -167,7 +167,7 @@ at::Tensor f8f8bf16_rowwise_impl(
   using ComputeBias = cutlass::epilogue::fusion::Sm90Compute<
       cutlass::plus,
       ElementOutput, // Final (optional) stage output type.
-      ElementBias, // Final stage input types.
+      ElementComputeEpilogue, // Final stage input types.
       cutlass::FloatRoundStyle::round_to_nearest>;
 
   using EVTComputeBias =
