
[FEA] CVT F32 -> TF32 PTX for sm80 #2254

@osayamenja

Description


Is your feature request related to a problem? Please describe.
Currently, converting from f32 to tf32 with round-to-nearest (ties to even) dispatches to a PTX cvt instruction only on sm90.

Describe the solution you'd like
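If we allow rna rounding (round to nearest, ties away from zero), we can dispatch to cvt.rna.tf32.f32, which is supported on sm80. The two modes can only differ when a value lies exactly halfway between two representable tf32 values.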
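A minimal sketch of what such a dispatch could look like, assuming a free function rather than a CUTLASS NumericConverter specialization. The names f32_to_tf32_rna and TF32_HD are hypothetical. On sm80+ device code it uses the cvt.rna.tf32.f32 instruction from the sample; elsewhere it falls back to an equivalent bit-level rounding, so it also compiles with a plain host compiler.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical macro: lets the same function build under nvcc (host + device)
// and under a plain host compiler.
#ifdef __CUDACC__
#define TF32_HD __host__ __device__
#else
#define TF32_HD
#endif

// Hypothetical helper: returns the tf32 bit pattern of x rounded to nearest,
// ties away from zero (PTX .rna). The low 13 mantissa bits are always zero.
TF32_HD inline uint32_t f32_to_tf32_rna(float x) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  uint32_t d;
  asm("cvt.rna.tf32.f32 %0, %1;" : "=r"(d) : "f"(x));  // sm80+ hardware path
  return d;
#else
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // bit-cast without undefined behavior
  if ((bits & 0x7f800000u) == 0x7f800000u) {
    // Inf passes through; NaN gets its quiet bit set so it stays NaN after
    // truncation (software fallback only; hardware NaN payloads may differ).
    return ((bits & 0x007fffffu) ? (bits | 0x00400000u) : bits) & 0xffffe000u;
  }
  // f32 is sign-magnitude, so adding half a tf32 ulp (bit 12) to the raw bits
  // rounds the magnitude half away from zero; a mantissa carry bumps the exponent.
  return (bits + 0x1000u) & 0xffffe000u;
#endif
}
```

On the host (or a pre-sm80 device) the software branch runs. For example, f32_to_tf32_rna(-0.45466f) yields 0xbee8c000, i.e. -0.45458984375, which is the -0.454590 printed by the sample.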

Describe alternatives you've considered
N/A

Additional context
A simple code sample comparing cvt.rna.tf32.f32 with CUTLASS's default float -> tfloat32_t converter is given below:

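```cpp
#include <cstdio>
#include <cstdint>
#include <cutlass/numeric_conversion.h>
#include <cute/tensor.hpp>  // cute::print

__global__ void f2tfK() {
    constexpr float x = -0.45466f;
    uint32_t d = 0;
    // CUTLASS default converter (FloatRoundStyle::round_to_nearest).
    constexpr auto f2tf = cutlass::NumericConverter<cutlass::tfloat32_t, float>{};
    // sm80+: round to nearest, ties away from zero.
    asm volatile("cvt.rna.tf32.f32 %0, %1;" : "=r"(d) : "f"(x));
    const auto res = cutlass::tfloat32_t::bitcast(d);
    const auto cRes = f2tf(x);
    printf("Intrinsic: "); cute::print(res); printf("\n");
    printf("Other: "); cute::print(cRes); printf("\n");
    printf("isEqual? %s\n", cRes == res ? "yes" : "no");
}

int main() {
    f2tfK<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel and flush device-side printf
}
// Output:
// Intrinsic: -0.454590
// Other: -0.454590
// isEqual? yes
```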
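Note that -0.45466f is not a rounding tie (its 13 discarded mantissa bits are 0x932, well below the halfway value 0x1000), so the sample agreeing does not exercise the case where the two modes diverge. The host-only sketch below emulates both roundings on raw f32 bits to show that case; the helper names are hypothetical and, for brevity, only finite inputs are handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, finite inputs only. tf32 keeps 10 of f32's 23 mantissa
// bits, so the low 13 bits are rounded away and then cleared.
inline uint32_t f32_bits(float x) {
  uint32_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

// Round to nearest, ties to even (as PTX .rn).
inline uint32_t tf32_rn(float x) {
  uint32_t b = f32_bits(x);
  uint32_t lsb = (b >> 13) & 1u;  // lowest kept mantissa bit
  return (b + 0x0fffu + lsb) & 0xffffe000u;
}

// Round to nearest, ties away from zero (as PTX .rna).
inline uint32_t tf32_rna(float x) {
  return (f32_bits(x) + 0x1000u) & 0xffffe000u;
}

// True when the discarded 13 bits are exactly half a tf32 ulp.
inline bool is_tf32_tie(float x) {
  return (f32_bits(x) & 0x1fffu) == 0x1000u;
}
```

For 1.00048828125f (1 + 2^-11, a tie whose kept LSB is even) tf32_rn returns 0x3f800000 (1.0) while tf32_rna returns 0x3f802000 (1 + 2^-10). For 1.00146484375f (a tie whose kept LSB is odd) both round up to the same value, so allowing rna only changes results on ties with an even LSB.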
