-
Notifications
You must be signed in to change notification settings - Fork 15.7k
[SLP]Better copyable vectorization for stores with non-instructions #174249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SLP]Better copyable vectorization for stores with non-instructions #174249
Conversation
Created using spr 1.3.7
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Alexey Bataev (alexey-bataev) ChangesIf the stored values contain non-instructions, better to put them in the Fixes #53238 Full diff: https://github.com/llvm/llvm-project/pull/174249.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 5748d7ca752d2..9f7c3b9ef2d8a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -27338,21 +27338,24 @@ bool SLPVectorizerPass::vectorizeStoreChains(BoUpSLP &R) {
V2->getValueOperand()->getType()->getScalarSizeInBits())
return false;
// UndefValues are compatible with all other values.
- if (auto *I1 = dyn_cast<Instruction>(V->getValueOperand()))
- if (auto *I2 = dyn_cast<Instruction>(V2->getValueOperand())) {
- DomTreeNodeBase<llvm::BasicBlock> *NodeI1 =
- DT->getNode(I1->getParent());
- DomTreeNodeBase<llvm::BasicBlock> *NodeI2 =
- DT->getNode(I2->getParent());
- assert(NodeI1 && "Should only process reachable instructions");
- assert(NodeI2 && "Should only process reachable instructions");
- assert((NodeI1 == NodeI2) ==
- (NodeI1->getDFSNumIn() == NodeI2->getDFSNumIn()) &&
- "Different nodes should have different DFS numbers");
- if (NodeI1 != NodeI2)
- return NodeI1->getDFSNumIn() < NodeI2->getDFSNumIn();
- return I1->getOpcode() < I2->getOpcode();
- }
+ auto *I1 = dyn_cast<Instruction>(V->getValueOperand());
+ auto *I2 = dyn_cast<Instruction>(V2->getValueOperand());
+ if (I1 && I2) {
+ DomTreeNodeBase<llvm::BasicBlock> *NodeI1 = DT->getNode(I1->getParent());
+ DomTreeNodeBase<llvm::BasicBlock> *NodeI2 = DT->getNode(I2->getParent());
+ assert(NodeI1 && "Should only process reachable instructions");
+ assert(NodeI2 && "Should only process reachable instructions");
+ assert((NodeI1 == NodeI2) ==
+ (NodeI1->getDFSNumIn() == NodeI2->getDFSNumIn()) &&
+ "Different nodes should have different DFS numbers");
+ if (NodeI1 != NodeI2)
+ return NodeI1->getDFSNumIn() < NodeI2->getDFSNumIn();
+ return I1->getOpcode() < I2->getOpcode();
+ }
+ if (I1 && !I2)
+ return true;
+ if (!I1 && I2)
+ return false;
return V->getValueOperand()->getValueID() <
V2->getValueOperand()->getValueID();
};
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/copyable-non-inst-in-stores.ll b/llvm/test/Transforms/SLPVectorizer/X86/copyable-non-inst-in-stores.ll
index 659a3b07bb938..efdbdb2ea867e 100644
--- a/llvm/test/Transforms/SLPVectorizer/X86/copyable-non-inst-in-stores.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/copyable-non-inst-in-stores.ll
@@ -5,21 +5,10 @@ define void @test(i32 %a, ptr %out) {
; CHECK-LABEL: define void @test(
; CHECK-SAME: i32 [[A:%.*]], ptr [[OUT:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: store i32 [[A]], ptr [[OUT]], align 4
-; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[A]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = lshr <4 x i32> [[TMP1]], <i32 1, i32 2, i32 3, i32 4>
-; CHECK-NEXT: [[ARRAYIDX9_1:%.*]] = getelementptr inbounds nuw i8, ptr [[OUT]], i64 4
-; CHECK-NEXT: store <4 x i32> [[TMP2]], ptr [[ARRAYIDX9_1]], align 4
-; CHECK-NEXT: [[SHR_5:%.*]] = lshr i32 [[A]], 5
-; CHECK-NEXT: [[ARRAYIDX9_5:%.*]] = getelementptr inbounds nuw i8, ptr [[OUT]], i64 20
-; CHECK-NEXT: store i32 [[SHR_5]], ptr [[ARRAYIDX9_5]], align 4
-; CHECK-NEXT: [[SHR_6:%.*]] = lshr i32 [[A]], 6
-; CHECK-NEXT: [[ARRAYIDX9_6:%.*]] = getelementptr inbounds nuw i8, ptr [[OUT]], i64 24
-; CHECK-NEXT: store i32 [[SHR_6]], ptr [[ARRAYIDX9_6]], align 4
-; CHECK-NEXT: [[SHR_7:%.*]] = lshr i32 [[A]], 7
-; CHECK-NEXT: [[ARRAYIDX9_7:%.*]] = getelementptr inbounds nuw i8, ptr [[OUT]], i64 28
-; CHECK-NEXT: store i32 [[SHR_7]], ptr [[ARRAYIDX9_7]], align 4
+; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[TMP1]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: store <8 x i32> [[TMP2]], ptr [[OUT]], align 4
; CHECK-NEXT: ret void
;
entry:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves the SLP vectorizer's handling of store chains that contain non-instruction values (like constants or arguments). The optimization reorders stores to place non-instructions at the end, enabling better detection of copyable elements and resulting in more efficient vectorization.
Key changes:
- Modified store comparison logic to prioritize instructions over non-instructions
- Enhanced vectorization from 4-element to 8-element vectors in the test case
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | Updated store comparison logic to handle non-instruction values by placing them after instructions |
| llvm/test/Transforms/SLPVectorizer/X86/copyable-non-inst-in-stores.ll | Updated expected test output showing improved vectorization from partial to full 8-element vectors |
Comments suppressed due to low confidence (1)
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1
- These conditions determine ordering when one value is an instruction and the other is not, but lack explanation of the design choice. Add a comment explaining why instructions are ordered before non-instructions (e.g., to enable better copyable element detection in the vectorization algorithm).
//===- SLPVectorizer.cpp - A bottom up SLP Vectorizer ---------------------===//
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
RKSimon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If the stored values contain non-instructions, better to put them in the
tail of the values to allow the algorithm for copyable elements to
detect them.
Fixes #53238