[ARM][AArch64] Vector intrinsics do not match hardware behavior for NaN, subnormals #128006
@llvm/issue-subscribers-backend-aarch64 Author: Oliver Stannard (ostannard)
The ARM/AArch64 vector intrinsics are defined as having the exact same behaviour as the hardware instructions.
For MVE:
For AdvSIMD:
However, clang does constant folding which doesn't always match the hardware's exact behaviour in cases like NaNs or subnormals.
For example, the MVE instructions always use a "default NaN" of 0x7fc00000 (for single-precision) when the result of the instruction is any NaN, but we constant-fold this code down to return the input NaN value of 0xffffff42:
#include <arm_mve.h>
uint32x4_t foo() {
float32x4_t nan = vreinterpretq_f32_u32(vdupq_n_u32(0xffffff42));
float32x4_t nan_plus_nan = vaddq_f32(nan, nan);
return vreinterpretq_u32_f32(nan_plus_nan);
}
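To make the mismatch concrete without MVE hardware, here is a minimal host-side sketch of one lane of such an add. It is only a model under stated assumptions: the helper names are hypothetical, it assumes the architectural single-precision default NaN is sign 0, exponent all ones, top fraction bit set (0x7fc00000), and it ignores subnormal flushing entirely.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float and back (memcpy avoids aliasing issues). */
static float f32_from_bits(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t f32_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* A pattern is a NaN when the exponent is all ones and the fraction is non-zero. */
static int is_nan_bits(uint32_t b) {
    return (b & 0x7f800000u) == 0x7f800000u && (b & 0x007fffffu) != 0;
}

/* Hypothetical scalar model of one lane of an add in default-NaN mode:
 * any NaN result is replaced by the default NaN, so the input payload is lost.
 * Subnormal flushing is deliberately not modelled here. */
static uint32_t model_add_lane_default_nan(uint32_t a, uint32_t b) {
    uint32_t r = f32_to_bits(f32_from_bits(a) + f32_from_bits(b));
    return is_nan_bits(r) ? 0x7fc00000u : r;
}
```

Under this model, adding 0xffffff42 to itself yields 0x7fc00000, whereas a fold that follows generic IEEE-style NaN propagation would typically keep the input payload, which is the discrepancy described above.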
For subnormals, MVE instructions always flush input subnormal values to zero, but we optimise this code as if that was not the case, so the result gets rounded up to 1.0f:
#include <arm_mve.h>
float32x4_t bar() {
float32x4_t smallest_subnormal = vreinterpretq_f32_u32(vdupq_n_u32(1));
float32x4_t round_up = vrndpq_f32(smallest_subnormal);
return round_up;
}
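The subnormal case can be sketched the same way on a host machine. The helpers below are hypothetical and only model one lane of a round-toward-plus-infinity operation, with a switch for whether subnormal inputs are flushed first. The sketch assumes the host itself does not flush subnormals, and as a simplification it always returns +0.0 for a zero result.

```c
#include <stdint.h>
#include <string.h>

static float f32_from_bits(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t f32_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Replace a subnormal pattern (zero exponent, non-zero fraction) by a zero of the same sign. */
static uint32_t flush_subnormal(uint32_t b) {
    int subnormal = (b & 0x7f800000u) == 0 && (b & 0x007fffffu) != 0;
    return subnormal ? (b & 0x80000000u) : b;
}

/* Round toward +infinity without libm. Values with magnitude >= 2^23,
 * infinities and NaNs are already integral or special and pass through.
 * Simplification: a zero result is always +0.0. */
static float round_toward_plus_inf(float x) {
    if (!(x > -8388608.0f && x < 8388608.0f)) return x;
    float t = (float)(int32_t)x;   /* truncates toward zero */
    if (t < x) t += 1.0f;          /* step up when truncation landed below x */
    return t;
}

/* Hypothetical model of one lane: flush_inputs != 0 mimics the flushing
 * behaviour described above, flush_inputs == 0 mimics the constant folder. */
static uint32_t model_rndp_lane(uint32_t a, int flush_inputs) {
    if (flush_inputs) a = flush_subnormal(a);
    return f32_to_bits(round_toward_plus_inf(f32_from_bits(a)));
}
```

With the smallest subnormal (bit pattern 1) as input, the flushing variant returns 0x00000000 while the non-flushing variant returns 0x3f800000 (1.0f), matching the difference between the hardware and the folded result reported here.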
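For AArch64 AdvSIMD, the rounding mode and subnormal flushing behaviour are configurable with the FPCR register, but we also emit code which constant-folds these operations.
I think it would be reasonable to deviate from the ACLE here, and allow these optimisations depending on the floating-point options (e.g. -ffp-model=), but none of these options seem to have any effect on vector intrinsics.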
As for vector intrinsics not respecting strictfp, a few people are working on that in a target-independent context, trying to change the way "constrained" FP is represented; I don't have time to dig up the references right now. 32-bit NEON/MVE in particular is weird because it doesn't respect the floating-point control word; see #16648, #106909, etc.
The ARM/AArch64 vector intrinsics are defined as having the exact same behaviour as the hardware instructions.
For MVE:
For AdvSIMD:
However, clang does constant folding which doesn't always match the hardware's exact behaviour in cases like NaNs or subnormals.
For example, the MVE instructions always use a "default NaN" of 0x7fc00000 (for single-precision) when the result of the instruction is any NaN, but we constant-fold this code down to return the input NaN value of 0xffffff42:
For subnormals, MVE instructions always flush input subnormal values to zero, but we optimise this code as if that was not the case, so the result gets rounded up to 1.0f:
For AArch64 AdvSIMD, the rounding mode and subnormal flushing behaviour are configurable with the FPCR register, but we also emit code which constant-folds these operations.
I think it would be reasonable to deviate from the ACLE here, and allow these optimisations depending on the floating-point options (e.g. -ffp-model=), but none of these options seem to have any effect on vector intrinsics.
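As a purely illustrative sketch of what "depending on the floating-point options" could mean, the snippet below encodes one conceivable folding policy. Everything in it is an assumption: the enum, the struct, and the function are hypothetical names, the three modes only loosely mirror the strict/precise/fast values of -ffp-model=, and nothing here describes what clang does or what the issue specifies.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the floating-point model selected by the user. */
enum fold_fp_model { MODEL_STRICT, MODEL_PRECISE, MODEL_FAST };

/* Hypothetical facts about a candidate fold of one intrinsic call. */
struct fold_query {
    bool result_is_nan;      /* the folded result would be a NaN */
    bool touches_subnormal;  /* an input or the result is subnormal */
    bool result_inexact;     /* the result depends on the rounding mode */
};

/* One possible policy (an assumption, not existing behaviour):
 * - fast:    fold freely;
 * - precise: fold unless the hardware's NaN or subnormal handling could differ;
 * - strict:  additionally refuse folds that depend on the dynamic rounding mode. */
static bool may_constant_fold(enum fold_fp_model m, struct fold_query q) {
    switch (m) {
    case MODEL_FAST:
        return true;
    case MODEL_PRECISE:
        return !(q.result_is_nan || q.touches_subnormal);
    case MODEL_STRICT:
        return !(q.result_is_nan || q.touches_subnormal || q.result_inexact);
    }
    return false;
}
```

Under this hypothetical policy, both examples from this issue (a NaN result and a subnormal input) would be left unfolded unless the fast model is selected.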