[MachineLICM] Let targets decide if copy-like instructions are cheap #146599
Conversation
When checking whether it is profitable to hoist an instruction, the pass may override a target's ruling because it assumes that all COPY instructions are cheap, which may not be the case for every micro-architecture (especially when copying between different register classes). On AArch64 there's 0% difference in performance in LLVM's test-suite with this change. Additionally, very few tests were affected, which shows how little value there is in keeping the special case.
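With this change, the cheapness decision for copies rests entirely with the target's isAsCheapAsAMove hook. As a rough sketch of what a backend could now express (MyTargetInstrInfo and isCrossClassCopy are hypothetical names, not part of this patch), a target that considers cross-register-class copies expensive might override the hook like this:

// Sketch only: MyTargetInstrInfo and isCrossClassCopy are hypothetical.
bool MyTargetInstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
  // MachineLICM no longer forces COPYs to be treated as cheap, so the
  // target can report expensive ones (e.g. GPR<->FPR moves) here.
  if (MI.isCopy())
    return !isCrossClassCopy(MI);
  return TargetInstrInfo::isAsCheapAsAMove(MI);
}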
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes: When checking whether it is profitable to hoist an instruction, the pass may override a target's ruling because it assumes that all COPY instructions are cheap, which may not be the case for every micro-architecture (especially when copying between different register classes). On AArch64 there's 0% difference in performance in LLVM's test-suite with this change. Additionally, very few tests were affected, which shows how little value there is in keeping the special case.

Patch is 41.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146599.diff

9 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index c9079170ca575..70a178f642fb0 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -1219,7 +1219,7 @@ bool MachineLICMImpl::HasHighOperandLatency(MachineInstr &MI, unsigned DefIdx,
/// Return true if the instruction is marked "cheap" or the operand latency
/// between its def and a use is one or less.
bool MachineLICMImpl::IsCheapInstruction(MachineInstr &MI) const {
- if (TII->isAsCheapAsAMove(MI) || MI.isCopyLike())
+ if (TII->isAsCheapAsAMove(MI))
return true;
bool isCheap = false;
diff --git a/llvm/test/CodeGen/AArch64/dag-combine-concat-vectors.ll b/llvm/test/CodeGen/AArch64/dag-combine-concat-vectors.ll
index 53126a08db86f..2bd04ac30509e 100644
--- a/llvm/test/CodeGen/AArch64/dag-combine-concat-vectors.ll
+++ b/llvm/test/CodeGen/AArch64/dag-combine-concat-vectors.ll
@@ -18,14 +18,28 @@ define fastcc i8 @allocno_reload_assign(ptr %p) {
; CHECK-NEXT: fmov w8, s0
; CHECK-NEXT: movi v0.2d, #0000000000000000
; CHECK-NEXT: mvn w8, w8
+; CHECK-NEXT: uunpklo z1.h, z0.b
+; CHECK-NEXT: uunpkhi z2.h, z0.b
; CHECK-NEXT: sbfx x8, x8, #0, #1
; CHECK-NEXT: whilelo p0.b, xzr, x8
+; CHECK-NEXT: uunpklo z3.s, z1.h
+; CHECK-NEXT: uunpkhi z4.s, z1.h
+; CHECK-NEXT: uunpklo z6.s, z2.h
+; CHECK-NEXT: uunpkhi z16.s, z2.h
; CHECK-NEXT: punpklo p1.h, p0.b
; CHECK-NEXT: punpkhi p0.h, p0.b
; CHECK-NEXT: punpklo p2.h, p1.b
; CHECK-NEXT: punpkhi p4.h, p1.b
+; CHECK-NEXT: uunpklo z1.d, z3.s
+; CHECK-NEXT: uunpkhi z2.d, z3.s
; CHECK-NEXT: punpklo p6.h, p0.b
+; CHECK-NEXT: uunpklo z3.d, z4.s
+; CHECK-NEXT: uunpkhi z4.d, z4.s
; CHECK-NEXT: punpkhi p0.h, p0.b
+; CHECK-NEXT: uunpklo z5.d, z6.s
+; CHECK-NEXT: uunpkhi z6.d, z6.s
+; CHECK-NEXT: uunpklo z7.d, z16.s
+; CHECK-NEXT: uunpkhi z16.d, z16.s
; CHECK-NEXT: punpklo p1.h, p2.b
; CHECK-NEXT: punpkhi p2.h, p2.b
; CHECK-NEXT: punpklo p3.h, p4.b
@@ -35,28 +49,14 @@ define fastcc i8 @allocno_reload_assign(ptr %p) {
; CHECK-NEXT: punpklo p7.h, p0.b
; CHECK-NEXT: punpkhi p0.h, p0.b
; CHECK-NEXT: .LBB0_1: // =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: uunpklo z1.h, z0.b
-; CHECK-NEXT: uunpklo z2.s, z1.h
-; CHECK-NEXT: uunpkhi z1.s, z1.h
-; CHECK-NEXT: uunpklo z3.d, z2.s
-; CHECK-NEXT: uunpkhi z2.d, z2.s
-; CHECK-NEXT: st1b { z3.d }, p1, [z0.d]
+; CHECK-NEXT: st1b { z1.d }, p1, [z0.d]
; CHECK-NEXT: st1b { z2.d }, p2, [z0.d]
-; CHECK-NEXT: uunpklo z2.d, z1.s
-; CHECK-NEXT: uunpkhi z1.d, z1.s
-; CHECK-NEXT: st1b { z2.d }, p3, [z0.d]
-; CHECK-NEXT: uunpkhi z2.h, z0.b
-; CHECK-NEXT: uunpklo z3.s, z2.h
-; CHECK-NEXT: uunpkhi z2.s, z2.h
-; CHECK-NEXT: st1b { z1.d }, p4, [z0.d]
-; CHECK-NEXT: uunpklo z1.d, z3.s
-; CHECK-NEXT: st1b { z1.d }, p5, [z0.d]
-; CHECK-NEXT: uunpkhi z1.d, z3.s
-; CHECK-NEXT: st1b { z1.d }, p6, [z0.d]
-; CHECK-NEXT: uunpklo z1.d, z2.s
-; CHECK-NEXT: st1b { z1.d }, p7, [z0.d]
-; CHECK-NEXT: uunpkhi z1.d, z2.s
-; CHECK-NEXT: st1b { z1.d }, p0, [z0.d]
+; CHECK-NEXT: st1b { z3.d }, p3, [z0.d]
+; CHECK-NEXT: st1b { z4.d }, p4, [z0.d]
+; CHECK-NEXT: st1b { z5.d }, p5, [z0.d]
+; CHECK-NEXT: st1b { z6.d }, p6, [z0.d]
+; CHECK-NEXT: st1b { z7.d }, p7, [z0.d]
+; CHECK-NEXT: st1b { z16.d }, p0, [z0.d]
; CHECK-NEXT: str p8, [x0]
; CHECK-NEXT: b .LBB0_1
br label %1
diff --git a/llvm/test/CodeGen/PowerPC/vsx-fma-m-early.ll b/llvm/test/CodeGen/PowerPC/vsx-fma-m-early.ll
index 9cb2d4444b974..6cfb8b0e73f7c 100644
--- a/llvm/test/CodeGen/PowerPC/vsx-fma-m-early.ll
+++ b/llvm/test/CodeGen/PowerPC/vsx-fma-m-early.ll
@@ -1,17 +1,84 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
;; Tests that the ppc-vsx-fma-mutate pass with the schedule-ppc-vsx-fma-mutation-early pass does not hoist xxspltiw out of loops.
; RUN: llc -verify-machineinstrs -mcpu=pwr10 -disable-ppc-vsx-fma-mutation=false \
; RUN: -ppc-asm-full-reg-names -schedule-ppc-vsx-fma-mutation-early \
-; RUN: -mtriple powerpc64-ibm-aix < %s | FileCheck --check-prefixes=CHECK64,AIX64 %s
+; RUN: -mtriple powerpc64-ibm-aix < %s | FileCheck --check-prefixes=AIX64 %s
; RUN: llc -verify-machineinstrs -mcpu=pwr10 -disable-ppc-vsx-fma-mutation=false \
; RUN: -ppc-asm-full-reg-names -schedule-ppc-vsx-fma-mutation-early \
-; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck --check-prefixes=CHECK64,LINUX64 %s
+; RUN: -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck --check-prefixes=LINUX64 %s
; RUN: llc -verify-machineinstrs -mcpu=pwr10 -disable-ppc-vsx-fma-mutation=false \
; RUN: -ppc-asm-full-reg-names -schedule-ppc-vsx-fma-mutation-early \
; RUN: -mtriple powerpc-ibm-aix < %s | FileCheck --check-prefix=CHECK32 %s
define void @bar(ptr noalias nocapture noundef writeonly %__output_a, ptr noalias nocapture noundef readonly %var1321In_a, ptr noalias nocapture noundef readonly %n) {
+; AIX64-LABEL: bar:
+; AIX64: # %bb.0: # %entry
+; AIX64-NEXT: lwz r5, 0(r5)
+; AIX64-NEXT: cmpwi r5, 1
+; AIX64-NEXT: bltlr cr0
+; AIX64-NEXT: # %bb.1: # %for.body.preheader
+; AIX64-NEXT: xxspltiw vs0, 1069066811
+; AIX64-NEXT: xxspltiw vs1, 1170469888
+; AIX64-NEXT: mtctr r5
+; AIX64-NEXT: li r5, 0
+; AIX64-NEXT: .align 5
+; AIX64-NEXT: L..BB0_2: # %for.body
+; AIX64-NEXT: #
+; AIX64-NEXT: lxvx vs2, r4, r5
+; AIX64-NEXT: xvmaddmsp vs2, vs0, vs1
+; AIX64-NEXT: stxvx vs2, r3, r5
+; AIX64-NEXT: addi r5, r5, 16
+; AIX64-NEXT: bdnz L..BB0_2
+; AIX64-NEXT: # %bb.3: # %for.end
+; AIX64-NEXT: blr
+;
+; LINUX64-LABEL: bar:
+; LINUX64: # %bb.0: # %entry
+; LINUX64-NEXT: lwz r5, 0(r5)
+; LINUX64-NEXT: cmpwi r5, 1
+; LINUX64-NEXT: bltlr cr0
+; LINUX64-NEXT: # %bb.1: # %for.body.preheader
+; LINUX64-NEXT: xxspltiw vs0, 1069066811
+; LINUX64-NEXT: xxspltiw vs1, 1170469888
+; LINUX64-NEXT: mtctr r5
+; LINUX64-NEXT: li r5, 0
+; LINUX64-NEXT: .p2align 5
+; LINUX64-NEXT: .LBB0_2: # %for.body
+; LINUX64-NEXT: #
+; LINUX64-NEXT: lxvx vs2, r4, r5
+; LINUX64-NEXT: xvmaddmsp vs2, vs0, vs1
+; LINUX64-NEXT: stxvx vs2, r3, r5
+; LINUX64-NEXT: addi r5, r5, 16
+; LINUX64-NEXT: bdnz .LBB0_2
+; LINUX64-NEXT: # %bb.3: # %for.end
+; LINUX64-NEXT: blr
+;
+; CHECK32-LABEL: bar:
+; CHECK32: # %bb.0: # %entry
+; CHECK32-NEXT: lwz r5, 0(r5)
+; CHECK32-NEXT: cmpwi r5, 0
+; CHECK32-NEXT: blelr cr0
+; CHECK32-NEXT: # %bb.1: # %for.body.preheader
+; CHECK32-NEXT: xxspltiw vs0, 1069066811
+; CHECK32-NEXT: xxspltiw vs1, 1170469888
+; CHECK32-NEXT: li r6, 0
+; CHECK32-NEXT: li r7, 0
+; CHECK32-NEXT: .align 4
+; CHECK32-NEXT: L..BB0_2: # %for.body
+; CHECK32-NEXT: #
+; CHECK32-NEXT: slwi r8, r7, 4
+; CHECK32-NEXT: addic r7, r7, 1
+; CHECK32-NEXT: addze r6, r6
+; CHECK32-NEXT: lxvx vs2, r4, r8
+; CHECK32-NEXT: xvmaddmsp vs2, vs0, vs1
+; CHECK32-NEXT: stxvx vs2, r3, r8
+; CHECK32-NEXT: xor r8, r7, r5
+; CHECK32-NEXT: or. r8, r8, r6
+; CHECK32-NEXT: bne cr0, L..BB0_2
+; CHECK32-NEXT: # %bb.3: # %for.end
+; CHECK32-NEXT: blr
entry:
%0 = load i32, ptr %n, align 4
%cmp11 = icmp sgt i32 %0, 0
@@ -28,7 +95,7 @@ for.body:
%add.ptr.val = load <4 x float>, ptr %add.ptr, align 1
%2 = tail call contract <4 x float> @llvm.fma.v4f32(<4 x float> %add.ptr.val, <4 x float> <float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000>, <4 x float> <float 6.270500e+03, float 6.270500e+03, float 6.270500e+03, float 6.270500e+03>)
%add.ptr6 = getelementptr inbounds float, ptr %__output_a, i64 %1
- store <4 x float> %2, ptr %add.ptr6, align 1
+ store <4 x float> %2, ptr %add.ptr6, align 1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond.not, label %for.end, label %for.body
@@ -38,6 +105,74 @@ for.end:
}
define void @foo(i1 %cmp97) #0 {
+; AIX64-LABEL: foo:
+; AIX64: # %bb.0: # %entry
+; AIX64-NEXT: andi. r3, r3, 1
+; AIX64-NEXT: bclr 4, gt, 0
+; AIX64-NEXT: # %bb.1: # %for.body.preheader
+; AIX64-NEXT: xxlxor f0, f0, f0
+; AIX64-NEXT: xxlxor f2, f2, f2
+; AIX64-NEXT: xxmrghd vs1, vs0, vs0
+; AIX64-NEXT: xvcvdpsp vs34, vs1
+; AIX64-NEXT: xxlxor vs1, vs1, vs1
+; AIX64-NEXT: .align 4
+; AIX64-NEXT: L..BB1_2: # %for.body
+; AIX64-NEXT: #
+; AIX64-NEXT: xxmrghd vs2, vs2, vs0
+; AIX64-NEXT: xvcvdpsp vs35, vs2
+; AIX64-NEXT: xxspltiw vs2, 1170469888
+; AIX64-NEXT: vmrgew v3, v3, v2
+; AIX64-NEXT: xvcmpgtsp vs3, vs1, vs35
+; AIX64-NEXT: xvmaddasp vs2, vs35, vs1
+; AIX64-NEXT: xxland vs2, vs3, vs2
+; AIX64-NEXT: xscvspdpn f2, vs2
+; AIX64-NEXT: b L..BB1_2
+;
+; LINUX64-LABEL: foo:
+; LINUX64: # %bb.0: # %entry
+; LINUX64-NEXT: andi. r3, r3, 1
+; LINUX64-NEXT: bclr 4, gt, 0
+; LINUX64-NEXT: # %bb.1: # %for.body.preheader
+; LINUX64-NEXT: xxlxor f0, f0, f0
+; LINUX64-NEXT: xxlxor f2, f2, f2
+; LINUX64-NEXT: xxspltd vs1, vs0, 0
+; LINUX64-NEXT: xvcvdpsp vs34, vs1
+; LINUX64-NEXT: xxlxor vs1, vs1, vs1
+; LINUX64-NEXT: .p2align 4
+; LINUX64-NEXT: .LBB1_2: # %for.body
+; LINUX64-NEXT: #
+; LINUX64-NEXT: xxmrghd vs2, vs0, vs2
+; LINUX64-NEXT: xvcvdpsp vs35, vs2
+; LINUX64-NEXT: xxspltiw vs2, 1170469888
+; LINUX64-NEXT: vmrgew v3, v2, v3
+; LINUX64-NEXT: xvcmpgtsp vs3, vs1, vs35
+; LINUX64-NEXT: xvmaddasp vs2, vs35, vs1
+; LINUX64-NEXT: xxland vs2, vs3, vs2
+; LINUX64-NEXT: xxsldwi vs2, vs2, vs2, 3
+; LINUX64-NEXT: xscvspdpn f2, vs2
+; LINUX64-NEXT: b .LBB1_2
+;
+; CHECK32-LABEL: foo:
+; CHECK32: # %bb.0: # %entry
+; CHECK32-NEXT: andi. r3, r3, 1
+; CHECK32-NEXT: bclr 4, gt, 0
+; CHECK32-NEXT: # %bb.1: # %for.body.preheader
+; CHECK32-NEXT: lwz r3, L..C0(r2) # %const.0
+; CHECK32-NEXT: xxlxor f1, f1, f1
+; CHECK32-NEXT: xxlxor vs0, vs0, vs0
+; CHECK32-NEXT: xscvdpspn vs35, f1
+; CHECK32-NEXT: lxv vs34, 0(r3)
+; CHECK32-NEXT: .align 4
+; CHECK32-NEXT: L..BB1_2: # %for.body
+; CHECK32-NEXT: #
+; CHECK32-NEXT: xscvdpspn vs36, f1
+; CHECK32-NEXT: xxspltiw vs1, 1170469888
+; CHECK32-NEXT: vperm v4, v4, v3, v2
+; CHECK32-NEXT: xvcmpgtsp vs2, vs0, vs36
+; CHECK32-NEXT: xvmaddasp vs1, vs36, vs0
+; CHECK32-NEXT: xxland vs1, vs2, vs1
+; CHECK32-NEXT: xscvspdpn f1, vs1
+; CHECK32-NEXT: b L..BB1_2
entry:
br i1 %cmp97, label %for.body, label %for.end
@@ -57,122 +192,7 @@ for.end: ; preds = %entry
}
; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
-declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
+declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <4 x i32> @llvm.ppc.vsx.xvcmpgtsp(<4 x float>, <4 x float>)
-
-; CHECK64: bar:
-; CHECK64: # %bb.0: # %entry
-; CHECK64-NEXT: lwz r5, 0(r5)
-; CHECK64-NEXT: cmpwi r5, 1
-; CHECK64-NEXT: bltlr cr0
-; CHECK64-NEXT: # %bb.1: # %for.body.preheader
-; CHECK64-NEXT: xxspltiw vs0, 1069066811
-; CHECK64-NEXT: xxspltiw vs1, 1170469888
-; CHECK64-NEXT: mtctr r5
-; CHECK64-NEXT: li r5, 0
-; CHECK64-NEXT: {{.*}}align 5
-; CHECK64-NEXT: [[L2_bar:.*]]: # %for.body
-; CHECK64-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK64-NEXT: lxvx vs2, r4, r5
-; CHECK64-NEXT: xvmaddmsp vs2, vs0, vs1
-; CHECK64-NEXT: stxvx vs2, r3, r5
-; CHECK64-NEXT: addi r5, r5, 16
-; CHECK64-NEXT: bdnz [[L2_bar]]
-; CHECK64-NEXT: # %bb.3: # %for.end
-; CHECK64-NEXT: blr
-
-; AIX64: .foo:
-; AIX64-NEXT: # %bb.0: # %entry
-; AIX64-NEXT: andi. r3, r3, 1
-; AIX64-NEXT: bclr 4, gt, 0
-; AIX64-NEXT: # %bb.1: # %for.body.preheader
-; AIX64-NEXT: xxlxor f0, f0, f0
-; AIX64-NEXT: xxlxor vs1, vs1, vs1
-; AIX64-NEXT: xxlxor f2, f2, f2
-; AIX64-NEXT: .align 4
-; AIX64-NEXT: L..BB1_2: # %for.body
-; AIX64-NEXT: # =>This Inner Loop Header: Depth=1
-; AIX64-NEXT: xxmrghd vs2, vs2, vs0
-; AIX64-NEXT: xvcvdpsp vs34, vs2
-; AIX64-NEXT: xxmrghd vs2, vs0, vs0
-; AIX64-NEXT: xvcvdpsp vs35, vs2
-; AIX64-NEXT: xxspltiw vs2, 1170469888
-; AIX64-NEXT: vmrgew v2, v2, v3
-; AIX64-NEXT: xvcmpgtsp vs3, vs1, vs34
-; AIX64-NEXT: xvmaddasp vs2, vs34, vs1
-; AIX64-NEXT: xxland vs2, vs3, vs2
-; AIX64-NEXT: xscvspdpn f2, vs2
-; AIX64-NEXT: b L..BB1_2
-
-; LINUX64: foo: # @foo
-; LINUX64-NEXT: .Lfunc_begin1:
-; LINUX64-NEXT: .cfi_startproc
-; LINUX64-NEXT: # %bb.0: # %entry
-; LINUX64-NEXT: andi. r3, r3, 1
-; LINUX64-NEXT: bclr 4, gt, 0
-; LINUX64-NEXT: # %bb.1: # %for.body.preheader
-; LINUX64-NEXT: xxlxor f0, f0, f0
-; LINUX64-NEXT: xxlxor vs1, vs1, vs1
-; LINUX64-NEXT: xxlxor f2, f2, f2
-; LINUX64-NEXT: .p2align 4
-; LINUX64-NEXT: .LBB1_2: # %for.body
-; LINUX64-NEXT: # =>This Inner Loop Header: Depth=1
-; LINUX64-NEXT: xxmrghd vs2, vs0, vs2
-; LINUX64-NEXT: xvcvdpsp vs34, vs2
-; LINUX64-NEXT: xxspltd vs2, vs0, 0
-; LINUX64-NEXT: xvcvdpsp vs35, vs2
-; LINUX64-NEXT: xxspltiw vs2, 1170469888
-; LINUX64-NEXT: vmrgew v2, v3, v2
-; LINUX64-NEXT: xvcmpgtsp vs3, vs1, vs34
-; LINUX64-NEXT: xvmaddasp vs2, vs34, vs1
-; LINUX64-NEXT: xxland vs2, vs3, vs2
-; LINUX64-NEXT: xxsldwi vs2, vs2, vs2, 3
-; LINUX64-NEXT: xscvspdpn f2, vs2
-; LINUX64-NEXT: b .LBB1_2
-
-; CHECK32: .bar:
-; CHECK32-NEXT: # %bb.0: # %entry
-; CHECK32-NEXT: lwz r5, 0(r5)
-; CHECK32-NEXT: cmpwi r5, 0
-; CHECK32-NEXT: blelr cr0
-; CHECK32-NEXT: # %bb.1: # %for.body.preheader
-; CHECK32-NEXT: xxspltiw vs0, 1069066811
-; CHECK32-NEXT: xxspltiw vs1, 1170469888
-; CHECK32-NEXT: li r6, 0
-; CHECK32-NEXT: li r7, 0
-; CHECK32-NEXT: .align 4
-; CHECK32-NEXT: [[L2_foo:.*]]: # %for.body
-; CHECK32-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK32-NEXT: slwi r8, r7, 4
-; CHECK32-NEXT: addic r7, r7, 1
-; CHECK32-NEXT: addze r6, r6
-; CHECK32-NEXT: lxvx vs2, r4, r8
-; CHECK32-NEXT: xvmaddmsp vs2, vs0, vs1
-; CHECK32-NEXT: stxvx vs2, r3, r8
-; CHECK32-NEXT: xor r8, r7, r5
-; CHECK32-NEXT: or. r8, r8, r6
-; CHECK32-NEXT: bne cr0, [[L2_foo]]
-
-; CHECK32: .foo:
-; CHECK32-NEXT: # %bb.0: # %entry
-; CHECK32-NEXT: andi. r3, r3, 1
-; CHECK32-NEXT: bclr 4, gt, 0
-; CHECK32-NEXT: # %bb.1: # %for.body.preheader
-; CHECK32-NEXT: lwz r3, L..C0(r2) # %const.0
-; CHECK32-NEXT: xxlxor f1, f1, f1
-; CHECK32-NEXT: xxlxor vs0, vs0, vs0
-; CHECK32-NEXT: xscvdpspn vs35, f1
-; CHECK32-NEXT: lxv vs34, 0(r3)
-; CHECK32-NEXT: .align 4
-; CHECK32-NEXT: L..BB1_2: # %for.body
-; CHECK32-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK32-NEXT: xscvdpspn vs36, f1
-; CHECK32-NEXT: xxspltiw vs1, 1170469888
-; CHECK32-NEXT: vperm v4, v4, v3, v2
-; CHECK32-NEXT: xvcmpgtsp vs2, vs0, vs36
-; CHECK32-NEXT: xvmaddasp vs1, vs36, vs0
-; CHECK32-NEXT: xxland vs1, vs2, vs1
-; CHECK32-NEXT: xscvspdpn f1, vs1
-; CHECK32-NEXT: b L..BB1_2
diff --git a/llvm/test/CodeGen/X86/break-false-dep.ll b/llvm/test/CodeGen/X86/break-false-dep.ll
index 6943622fac7f2..a6ad3018e052c 100644
--- a/llvm/test/CodeGen/X86/break-false-dep.ll
+++ b/llvm/test/CodeGen/X86/break-false-dep.ll
@@ -472,17 +472,17 @@ define dso_local void @loopdep3() {
; SSE-WIN-NEXT: movaps %xmm6, (%rsp) # 16-byte Spill
; SSE-WIN-NEXT: .seh_savexmm %xmm6, 0
; SSE-WIN-NEXT: .seh_endprologue
-; SSE-WIN-NEXT: xorl %eax, %eax
-; SSE-WIN-NEXT: leaq v(%rip), %rcx
-; SSE-WIN-NEXT: leaq x(%rip), %rdx
-; SSE-WIN-NEXT: leaq y(%rip), %r8
-; SSE-WIN-NEXT: leaq z(%rip), %r9
-; SSE-WIN-NEXT: leaq w(%rip), %r10
+; SSE-WIN-NEXT: leaq v(%rip), %rax
+; SSE-WIN-NEXT: leaq x(%rip), %rcx
+; SSE-WIN-NEXT: leaq y(%rip), %rdx
+; SSE-WIN-NEXT: leaq z(%rip), %r8
+; SSE-WIN-NEXT: leaq w(%rip), %r9
+; SSE-WIN-NEXT: xorl %r10d, %r10d
; SSE-WIN-NEXT: .p2align 4
; SSE-WIN-NEXT: .LBB8_1: # %for.cond1.preheader
; SSE-WIN-NEXT: # =>This Loop Header: Depth=1
; SSE-WIN-NEXT: # Child Loop BB8_2 Depth 2
-; SSE-WIN-NEXT: movq %rcx, %r11
+; SSE-WIN-NEXT: movq %rax, %r11
; SSE-WIN-NEXT: xorl %esi, %esi
; SSE-WIN-NEXT: .p2align 4
; SSE-WIN-NEXT: .LBB8_2: # %for.body3
@@ -490,10 +490,10 @@ define dso_local void @loopdep3() {
; SSE-WIN-NEXT: # => This Inner Loop Header: Depth=2
; SSE-WIN-NEXT: xorps %xmm0, %xmm0
; SSE-WIN-NEXT: cvtsi2sdl (%r11), %xmm0
+; SSE-WIN-NEXT: mulsd (%rsi,%rcx), %xmm0
; SSE-WIN-NEXT: mulsd (%rsi,%rdx), %xmm0
; SSE-WIN-NEXT: mulsd (%rsi,%r8), %xmm0
-; SSE-WIN-NEXT: mulsd (%rsi,%r9), %xmm0
-; SSE-WIN-NEXT: movsd %xmm0, (%rsi,%r10)
+; SSE-WIN-NEXT: movsd %xmm0, (%rsi,%r9)
; SSE-WIN-NEXT: #APP
; SSE-WIN-NEXT: #NO_APP
; SSE-WIN-NEXT: addq $8, %rsi
@@ -502,8 +502,8 @@ define dso_local void @loopdep3() {
; SSE-WIN-NEXT: jne .LBB8_2
; SSE-WIN-NEXT: # %bb.3: # %for.inc14
; SSE-WIN-NEXT: # in Loop: Header=BB8_1 Depth=1
-; SSE-WIN-NEXT: incl %eax
-; SSE-WIN-NEXT: cmpl $100000, %eax # imm = 0x186A0
+; SSE-WIN-NEXT: incl %r10d
+; SSE-WIN-NEXT: cmpl $100000, %r10d # imm = 0x186A0
; SSE-WIN-NEXT: jne .LBB8_1
; SSE-WIN-NEXT: # %bb.4: # %for.end16
; SSE-WIN-NEXT: movaps (%rsp), %xmm6 # 16-byte Reload
@@ -550,17 +550,17 @@ define dso_local void @loopdep3() {
; AVX-NEXT: vmovaps %xmm6, (%rsp) # 16-byte Spill
; AVX-NEXT: .seh_savexmm %xmm6, 0
; AVX-NEXT: .seh_endprologue
-; AVX-NEXT: xorl %eax, %eax
-; AVX-NEXT: leaq v(%rip), %rcx
-; AVX-NEXT: leaq x(%rip), %rdx
-; AVX-NEXT: leaq y(%rip), %r8
-; AVX-NEXT: leaq z(%rip), %r9
-; AVX-NEXT: leaq w(%rip), %r10
+; AVX-NEXT: leaq v(%rip), %rax
+; AVX-NEXT: leaq x(%rip), %rcx
+; AVX-NEXT: leaq y(%rip), %rdx
+; AVX-NEXT: leaq z(%rip), %r8
+; AVX-NEXT: leaq w(%rip), %r9
+; AVX-NEXT: xorl %r10d, %r10d
; AVX-NEXT: .p2align 4
; AVX-NEXT: .LBB8_1: # %for.cond1.preheader
; AVX-NEXT: # =>This Loop Header: Depth=1
; AVX-NEXT: # Child Loop BB8_2 Depth 2
-; AVX-NEXT: movq %rcx, %r11
+; AVX-NEXT: movq %rax, %r11
; AVX-NEXT: xorl %esi, %esi
; AVX-NEXT: .p2align 4
; AVX-NEXT: .LBB8_2: # %for.body3
@@ -568,10 +568,10 @@ define dso_local void @loopdep3() {
; AVX-NEXT: # => This Inner Loop Header: Depth=2
; AVX-NEXT: vxorps %xmm5, %xmm5, %xmm5
; AVX-NEXT: vcvtsi2sdl (%r11), %xmm5, %xmm0
+; AVX-NEXT: vmulsd (%rsi,%rcx), %xmm0, %xmm0
; AVX-NEXT: vmulsd (%rsi,%rdx), %xmm0, %xmm0
; AVX-NEXT: vmulsd (%rsi,%r8), %xmm0, %xmm0
-; AVX-NEXT: vmulsd (%rsi,%r9), %xmm0, %xmm0
-; AVX-NEXT: vmovsd %xmm0, (%rsi,%r10)
+; AVX-NEXT: vmovsd %xmm0, (%rsi,%r9)
; AVX-NEXT: #APP
; AVX-NEXT: #NO_APP
; AVX-NEXT: addq $8, %rsi
@@ -580,8 +580,8 @@ define dso_local void @loopdep3() {
; AVX-NEXT: jne .LBB8_2
; AVX-NEXT: # %bb.3: # %for.inc14
; AVX-NEXT: # in Loop: Header=BB8_1 Depth=1
-; AVX-NEXT: incl %eax
-; AVX-NEXT: cmpl $100000, %eax # imm = 0x186A0
+; AVX-NEXT: incl %r10d
+; AVX-NEXT: cmpl $100000, %r10d # imm = 0x186A0
; AVX-NEXT: jne .LBB8_1
; AVX-NEXT: ...
[truncated]
@@ -1219,7 +1219,7 @@ bool MachineLICMImpl::HasHighOperandLatency(MachineInstr &MI, unsigned DefIdx,
 /// Return true if the instruction is marked "cheap" or the operand latency
 /// between its def and a use is one or less.
 bool MachineLICMImpl::IsCheapInstruction(MachineInstr &MI) const {
-  if (TII->isAsCheapAsAMove(MI) || MI.isCopyLike())
+  if (TII->isAsCheapAsAMove(MI))
isCopyLike includes isSubregToReg, which is always cheap. Adding it would prevent many changes in X86.
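For reference, MachineInstr::isCopyLike() is a thin wrapper over both cases (quoted from memory of MachineInstr.h, so the exact wording in-tree may differ):

/// Return true if the instruction behaves like a copy.
/// This does not include native copy instructions.
bool isCopyLike() const { return isCopy() || isSubregToReg(); }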
It does not seem to have a negative impact on the test cases (dag-update-nodetomatch.ll even looks a tiny bit more optimized). I've added x86 numbers as well; performance is either neutral or slightly better with this change.
It's counterintuitive. Did you compare with and without || MI.isSubregToReg()?
The comparison is just the diff introduced in this PR. Why is it counterintuitive? It seems to me that even though this instruction is cheap, it's not free, and executing it just once may help performance.
That makes sense, but it doesn't address the prior question either: SUBREG_TO_REG is not just cheap, it's free, at least on X86.
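For context, a minimal illustration of what such an instruction looks like in MIR on x86-64 (register numbers invented): SUBREG_TO_REG emits no machine code of its own, only recording that the upper bits of the wider register are already defined (here, known to be zero), so it typically lowers to no instruction at all:

%1:gr64 = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit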
Maybe hoisting a SUBREG_TO_REG permitted the hoisting of other instructions, eventually leading to more optimized code? Just like in the test case here.
Which test do you mean? I can only see the code-size increase (a 64-bit register has a longer encoding). Besides, it also increases memory size when spilling and reloading.
x86 performance is slightly better (though that may just be noise) in an A/B comparison of five iterations of LLVM's test-suite (Ryzen 5950X on Ubuntu):