-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[AArch64] Improve lowering of scalar abs(sub(a, b)). #151180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Ricardo Jesus (rj-jesus) ChangesThis patch avoids a comparison against zero when lowering abs(sub(a, b)) patterns, instead reusing the condition codes generated by a subs of the operands directly. For example, currently:
becomes:
Whilst this doesn't decrease the number of instructions, the new version should expose more ILP and use ``cheaper'' instructions having lower latency and/or higher throughput. This patch also includes a somewhat orthogonal change in Together with #151177, this should handle the remaining patterns in #118413 (although I believe now some of the Full diff: https://github.com/llvm/llvm-project/pull/151180.diff 5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 7b49754ee7e1f..8b515fb649ee7 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7118,12 +7118,21 @@ SDValue AArch64TargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const {
return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABS_MERGE_PASSTHRU);
SDLoc DL(Op);
- SDValue Neg = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT),
- Op.getOperand(0));
- // Generate SUBS & CSEL.
- SDValue Cmp = DAG.getNode(AArch64ISD::SUBS, DL, DAG.getVTList(VT, FlagsVT),
- Op.getOperand(0), DAG.getConstant(0, DL, VT));
- return DAG.getNode(AArch64ISD::CSEL, DL, VT, Op.getOperand(0), Neg,
+ SDValue Val = Op.getOperand(0);
+ SDValue Neg = DAG.getNegative(Val, DL, VT);
+ SDValue Cmp;
+
+ // For abs(sub(lhs, rhs)), we can compare lhs and rhs directly. This allows
+ // reusing the subs operation for the calculation and comparison.
+ if (Val.getOpcode() == ISD::SUB)
+ Cmp = DAG.getNode(AArch64ISD::SUBS, DL, DAG.getVTList(VT, FlagsVT),
+ Val.getOperand(0), Val.getOperand(1));
+ else
+ // Otherwise, compare with zero.
+ Cmp = DAG.getNode(AArch64ISD::SUBS, DL, DAG.getVTList(VT, FlagsVT), Val,
+ DAG.getConstant(0, DL, VT));
+
+ return DAG.getNode(AArch64ISD::CSEL, DL, VT, Val, Neg,
DAG.getConstant(AArch64CC::PL, DL, MVT::i32),
Cmp.getValue(1));
}
@@ -20835,11 +20844,17 @@ static SDValue performNegCSelCombine(SDNode *N, SelectionDAG &DAG) {
if (!isNegatedInteger(N0) && !isNegatedInteger(N1))
return SDValue();
- SDValue N0N = getNegatedInteger(N0, DAG);
- SDValue N1N = getNegatedInteger(N1, DAG);
-
SDLoc DL(N);
EVT VT = CSel.getValueType();
+
+ // If the operands are negations of each other, reverse them.
+ if ((isNegatedInteger(N0) && N0.getOperand(1) == N1) ||
+ (isNegatedInteger(N1) && N1.getOperand(1) == N0))
+ return DAG.getNode(AArch64ISD::CSEL, DL, VT, N1, N0, CSel.getOperand(2),
+ CSel.getOperand(3));
+
+ SDValue N0N = getNegatedInteger(N0, DAG);
+ SDValue N1N = getNegatedInteger(N1, DAG);
return DAG.getNode(AArch64ISD::CSEL, DL, VT, N0N, N1N, CSel.getOperand(2),
CSel.getOperand(3));
}
diff --git a/llvm/test/CodeGen/AArch64/abds-neg.ll b/llvm/test/CodeGen/AArch64/abds-neg.ll
index 432ffc30eec5e..a7c27894e6125 100644
--- a/llvm/test/CodeGen/AArch64/abds-neg.ll
+++ b/llvm/test/CodeGen/AArch64/abds-neg.ll
@@ -8,9 +8,9 @@
define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -25,9 +25,9 @@ define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -42,9 +42,9 @@ define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -59,9 +59,9 @@ define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
define i16 @abd_ext_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = sext i16 %a to i64
@@ -94,9 +94,9 @@ define i16 @abd_ext_i16_i32(i16 %a, i32 %b) nounwind {
define i16 @abd_ext_i16_undef(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = sext i16 %a to i64
diff --git a/llvm/test/CodeGen/AArch64/abds.ll b/llvm/test/CodeGen/AArch64/abds.ll
index ed1e6077948ee..3e1d8522aa9d8 100644
--- a/llvm/test/CodeGen/AArch64/abds.ll
+++ b/llvm/test/CodeGen/AArch64/abds.ll
@@ -8,9 +8,9 @@
define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -24,9 +24,9 @@ define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -40,9 +40,9 @@ define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = sext i8 %a to i64
@@ -56,9 +56,9 @@ define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
define i16 @abd_ext_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = sext i16 %a to i64
@@ -88,9 +88,9 @@ define i16 @abd_ext_i16_i32(i16 %a, i32 %b) nounwind {
define i16 @abd_ext_i16_undef(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = sext i16 %a to i64
@@ -220,9 +220,9 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
define i8 @abd_minmax_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_minmax_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%min = call i8 @llvm.smin.i8(i8 %a, i8 %b)
@@ -234,9 +234,9 @@ define i8 @abd_minmax_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_minmax_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_minmax_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%min = call i16 @llvm.smin.i16(i16 %a, i16 %b)
@@ -294,9 +294,9 @@ define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
define i8 @abd_cmp_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_cmp_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp sgt i8 %a, %b
@@ -309,9 +309,9 @@ define i8 @abd_cmp_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_cmp_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_cmp_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp sge i16 %a, %b
@@ -517,11 +517,11 @@ define i64 @vector_legalized(i16 %a, i16 %b) {
; CHECK-LABEL: vector_legalized:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: addp d0, v0.2d
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w8, w8, mi
+; CHECK-NEXT: addp d0, v0.2d
; CHECK-NEXT: fmov x9, d0
; CHECK-NEXT: add x0, x9, x8
; CHECK-NEXT: ret
@@ -542,9 +542,9 @@ define i64 @vector_legalized(i16 %a, i16 %b) {
define i8 @abd_select_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_select_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxtb w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxtb w8, w1
+; CHECK-NEXT: sxtb w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp slt i8 %a, %b
@@ -557,9 +557,9 @@ define i8 @abd_select_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_select_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_select_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: sxth w8, w0
-; CHECK-NEXT: sub w8, w8, w1, sxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: sxth w8, w1
+; CHECK-NEXT: sxth w9, w0
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp sle i16 %a, %b
diff --git a/llvm/test/CodeGen/AArch64/abdu-neg.ll b/llvm/test/CodeGen/AArch64/abdu-neg.ll
index 8fb106e92866e..a83323650330b 100644
--- a/llvm/test/CodeGen/AArch64/abdu-neg.ll
+++ b/llvm/test/CodeGen/AArch64/abdu-neg.ll
@@ -8,9 +8,9 @@
define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -25,9 +25,9 @@ define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -42,9 +42,9 @@ define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -59,9 +59,9 @@ define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
define i16 @abd_ext_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = zext i16 %a to i64
@@ -94,9 +94,9 @@ define i16 @abd_ext_i16_i32(i16 %a, i32 %b) nounwind {
define i16 @abd_ext_i16_undef(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, pl
; CHECK-NEXT: ret
%aext = zext i16 %a to i64
diff --git a/llvm/test/CodeGen/AArch64/abdu.ll b/llvm/test/CodeGen/AArch64/abdu.ll
index 4585de96c848f..a058121236a5b 100644
--- a/llvm/test/CodeGen/AArch64/abdu.ll
+++ b/llvm/test/CodeGen/AArch64/abdu.ll
@@ -8,9 +8,9 @@
define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -24,9 +24,9 @@ define i8 @abd_ext_i8(i8 %a, i8 %b) nounwind {
define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -40,9 +40,9 @@ define i8 @abd_ext_i8_i16(i8 %a, i16 %b) nounwind {
define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_ext_i8_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = zext i8 %a to i64
@@ -56,9 +56,9 @@ define i8 @abd_ext_i8_undef(i8 %a, i8 %b) nounwind {
define i16 @abd_ext_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = zext i16 %a to i64
@@ -88,9 +88,9 @@ define i16 @abd_ext_i16_i32(i16 %a, i32 %b) nounwind {
define i16 @abd_ext_i16_undef(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_ext_i16_undef:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%aext = zext i16 %a to i64
@@ -224,9 +224,9 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
define i8 @abd_minmax_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_minmax_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%min = call i8 @llvm.umin.i8(i8 %a, i8 %b)
@@ -238,9 +238,9 @@ define i8 @abd_minmax_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_minmax_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_minmax_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%min = call i16 @llvm.umin.i16(i16 %a, i16 %b)
@@ -300,9 +300,9 @@ define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
define i8 @abd_cmp_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_cmp_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp ugt i8 %a, %b
@@ -315,9 +315,9 @@ define i8 @abd_cmp_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_cmp_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_cmp_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp uge i16 %a, %b
@@ -382,11 +382,11 @@ define i64 @vector_legalized(i16 %a, i16 %b) {
; CHECK-LABEL: vector_legalized:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v0.2d, #0000000000000000
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
-; CHECK-NEXT: addp d0, v0.2d
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w8, w8, mi
+; CHECK-NEXT: addp d0, v0.2d
; CHECK-NEXT: fmov x9, d0
; CHECK-NEXT: add x0, x9, x8
; CHECK-NEXT: ret
@@ -407,9 +407,9 @@ define i64 @vector_legalized(i16 %a, i16 %b) {
define i8 @abd_select_i8(i8 %a, i8 %b) nounwind {
; CHECK-LABEL: abd_select_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xff
-; CHECK-NEXT: sub w8, w8, w1, uxtb
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xff
+; CHECK-NEXT: and w9, w0, #0xff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp ult i8 %a, %b
@@ -422,9 +422,9 @@ define i8 @abd_select_i8(i8 %a, i8 %b) nounwind {
define i16 @abd_select_i16(i16 %a, i16 %b) nounwind {
; CHECK-LABEL: abd_select_i16:
; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, #0xffff
-; CHECK-NEXT: sub w8, w8, w1, uxth
-; CHECK-NEXT: cmp w8, #0
+; CHECK-NEXT: and w8, w1, #0xffff
+; CHECK-NEXT: and w9, w0, #0xffff
+; CHECK-NEXT: subs w8, w9, w8
; CHECK-NEXT: cneg w0, w8, mi
; CHECK-NEXT: ret
%cmp = icmp ule i16 %a, %b
|
Is there some reason to avoid generating |
That would be another option, but I believe that on Neoverse V2 and a few other cores, extended ADDS/SUBS have higher latency (2 vs 1) and lower throughput (2 vs 3) than the non-extended versions, and by doing the extension separately we may expose more ILP. |
I did some digging and, with this patch, we should generate the combined extended forms when compiling for code size. For example, given: #include <stdint.h>
typedef uint8_t u8;
typedef int8_t i8;
u8 src_u8(u8 a, u8 b) {
return (a > b) ? a - b : b - a;
}
i8 src_i8(i8 a, i8 b) {
return (a > b) ? a - b : b - a;
} With
With
Perhaps the restrictions in |
This patch avoids a comparison against zero when lowering abs(sub(a, b)) patterns, instead reusing the condition codes generated by a subs of the operands directly. For example, currently: ``` sxtb w8, w0 sub w8, w8, w1, sxtb cmp w8, #0 cneg w0, w8, pl ``` becomes: ``` sxtb w8, w0 sxtb w9, w1 subs w8, w9, w8 cneg w0, w8, gt ``` Whilst this doesn't decrease the number of instructions utilised, the new version exposes more ILP and uses ``cheaper'' instructions, typically having lower latency and/or higher throughput. This patch also includes a somewhat orthogonal change in performNegCSelCombine, which I included to avoid a code generation regression in CodeGen/AArch64/abd[su]-neg.ll due to the combine negating the operands of the csel, leading to an extra sub instruction. If preferable, I can open a separate PR for this.
Instead of reversing the operands when they are negations of each other, peek through already existing negations. This has the same effect when the operands are negations of each other but should be more generally useful.
ce097ff
to
cba370b
Compare
I've rebased this to update tests that had changed with #151177. I've also simplified the changes in |
By combining the original SUB node to the SUBS created when folding subs(sub(a,b), 0) -> subs(a,b), we no longer create ``swapped'' SUB nodes, such as sub(b,a), after performNegCSelCombine. Therefore, the changes to avoid code gen regressions in some cases are no longer required.
This patch avoids a comparison against zero when lowering abs(sub(a, b)) patterns, instead reusing the condition codes generated by a subs of the operands directly.
For example, currently:
becomes:
Together with #151177, this should handle the remaining patterns in #118413 (although I believe now some of the
tgt
cases in the issue should generate slightly worse code than thesrc
cases, presumably because the former are not combined as abd[us] nodes, but I believe these can be taken on separately).