Is your feature request related to a problem? Please describe.
Currently, converting from f32 to tf32 with round-to-nearest dispatches to a PTX cvt
instruction only for sm90.
Describe the solution you'd like
If we allow rna rounding (round to nearest, ties away from zero), we can dispatch to cvt.rna.tf32.f32, which works for sm80.
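For reference, here is a minimal host-side sketch of what rna rounding means at the bit level for finite inputs. The helper names f32_to_tf32_rna_bits and bits_to_f32 are hypothetical and not part of CUTLASS, and passing NaN and infinity through unchanged is an assumption; the hardware instruction may treat special values differently.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUTLASS API): reinterpret a 32-bit pattern as a float.
inline float bits_to_f32(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Hypothetical helper (not a CUTLASS API): emulate f32 -> tf32 conversion with
// round to nearest, ties away from zero. tf32 keeps the sign, the 8 exponent
// bits, and the top 10 mantissa bits, so the low 13 bits of the f32 are dropped.
inline std::uint32_t f32_to_tf32_rna_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    if (!std::isfinite(x)) {
        return bits;  // assumption: leave NaN and infinity untouched
    }
    // Adding half a tf32 ULP (bit 12) to the sign-magnitude pattern and then
    // truncating rounds the magnitude to nearest, with ties going away from zero.
    bits += 0x1000u;
    return bits & 0xFFFFE000u;
}
```

With this emulation, the sample input -0.45466f used in this issue rounds to -0.45458984375 (that is, -1862/4096), which prints as -0.454590 with printf's default precision.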
Describe alternatives you've considered
N/A
Additional context
A simple code sample is given below:
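#include <cstdint>
#include <cstdio>
#include <cute/tensor.hpp>
#include <cutlass/numeric_conversion.h>

__global__ void f2tfK() {
    constexpr float x = -0.45466f;
    uint32_t d = 0;
    constexpr auto f2tf = cutlass::NumericConverter<cutlass::tfloat32_t, float>{};
    // Convert directly with the PTX instruction (rna: round to nearest, ties away from zero).
    asm volatile("cvt.rna.tf32.f32 %0, %1;" : "=r"(d) : "f"(x));
    const auto res = cutlass::tfloat32_t::bitcast(d);
    // Convert with the CUTLASS converter for comparison.
    const auto cRes = f2tf(x);
    printf("Intrinsic: "); cute::print(res); printf("\n");
    printf("Other: "); cute::print(cRes); printf("\n");
    printf("isEqual? %s\n", cRes == res ? "yes" : "no");
}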
// Output:
// Intrinsic: -0.454590
// Other: -0.454590
// isEqual? yes