Open
Description
Does anyone know the correct way to define fma for BFloat16? Based on my understanding of the double-rounding theorems, I think it should be correct to simply cast to Float32 and back:
    @inline Base.fma(x::BFloat16, y::BFloat16, z::BFloat16) =
        BFloat16(fma(Float32(x), Float32(y), Float32(z)))
But I've observed that this sometimes returns different results from BFloat16(Float32(x) * Float32(y) + Float32(z)). That puzzles me, because I think the two ought to be equivalent in round-to-nearest-even, provided that Float32 has at least 2p + 2 bits of precision, where p is the precision of BFloat16. That condition holds: BFloat16 has p = 8 significand bits and Float32 has 24, and 24 ≥ 2·8 + 2 = 18.
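One place the two expressions can disagree is near the edge of the exponent range, since BFloat16 and Float32 share the same 8-bit exponent. Here is a minimal sketch of that, using only Float32 values that are exact powers of two (and so also representable in BFloat16). The specific values of x, y and z are my own choice for illustration, and I haven't confirmed this is the case behind the mismatches described above:

```julia
# Powers of two, so each value is exactly representable in both
# BFloat16 and Float32 (the two formats have the same exponent range).
x = ldexp(1f0, 64)     # 2^64
y = ldexp(1f0, 64)     # 2^64
z = -ldexp(1f0, 127)   # -2^127

# Fused: x*y + z is computed exactly and rounded once.
# The exact value is 2^128 - 2^127 = 2^127, which is finite in Float32.
fused = fma(x, y, z)

# Unfused: x*y is rounded to Float32 first. 2^128 exceeds floatmax(Float32),
# so the product overflows before z is added.
unfused = x * y + z

println("fused   = ", fused)
println("unfused = ", unfused)
println("equal?    ", fused == unfused)
```

With IEEE 754 arithmetic I'd expect the fused result to be finite and the unfused one to be infinite, and converting either to BFloat16 would keep that difference. So the mantissa-width argument may not be enough on its own: the intermediate product also has to stay inside the Float32 exponent range.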