[tensor-descriptor]: Extend support when tensor descriptor created in control flow #4152
…ad/descriptor_store operation that uses it Signed-off-by: Tiotto, Ettore <[email protected]>
The code LGTM, but I'm not sure I understand why we need this PR. If the conversion from `tensor_descriptor` to `block_pointer` is one of the first passes to run at the `ttir` level, how can we have `tensor_descriptor` ops with different layouts, given that encodings are assigned to ops when lowering from the `ttir` level to the `ttgir` level?
As I understand it, we are not supposed to have `tensor_descriptor` ops after translating to the `ttgir` dialect, no?
BTW FYI, the existing TMA lowering expects the tensor descriptor to always have a layout.
I have updated the code quite a bit to change the logic used to do the translation. The pass is now simpler and able to handle tensor descriptors created in control flow (i.e. a branch) inside a loop.
No performance degradation in Triton benchmarks: https://github.com/intel-sandbox/applications.python.intel-xpu-backend-for-triton.infrastructure/actions/runs/15142551794/job/42570177577
Pull Request Overview
Enhance layout propagation and tensor descriptor lowering to support cases where descriptors or pointers are created within control flow constructs.
- Add `updateAdvanceOpChain` to recursively update chains of `AdvanceOp` users.
- Refactor `rewriteStoreOp` to trace back through `AdvanceOp` chains before creating new `MakeTensorPtrOp`.
- Rewrite the TensorDesc-to-block-pointer pass to drop old descriptor lookup, always find or create `MakeTensorPtrOp`, and handle loop-carried block pointer types.
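The first two items can be pictured with a toy model. The sketch below is not the PR's code and does not use the MLIR API: `Op`, `Kind`, `connect`, `traceToMakeTensorPtrSketch`, and `updateAdvanceChainSketch` are hypothetical names introduced only for illustration, and the `layout` string stands in for whatever encoding the real pass attaches to a pointer type. It shows, under those assumptions, how one might walk backwards from a store through a chain of advance ops to the op that created the pointer, and how a new layout could be pushed forward recursively through every advance user.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an operation; NOT mlir::Operation (assumption for illustration).
enum class Kind { MakeTensorPtr, Advance, Store };

struct Op {
  explicit Op(Kind k) : kind(k) {}
  Kind kind;
  Op *source = nullptr;     // op defining the pointer operand (Advance / Store only)
  std::vector<Op *> users;  // ops that consume this op's result
  std::string layout;       // hypothetical encoding of the result pointer type
};

// Record that `user` takes its pointer operand from `def`.
inline void connect(Op &def, Op &user) {
  user.source = &def;
  def.users.push_back(&user);
}

// Walk backwards through Advance ops until the pointer's creator is reached.
// Returns nullptr if the chain does not end at a MakeTensorPtr in this toy model
// (for example, if it started at a value with no defining op).
inline Op *traceToMakeTensorPtrSketch(Op *op) {
  while (op && op->kind == Kind::Advance)
    op = op->source;
  return (op && op->kind == Kind::MakeTensorPtr) ? op : nullptr;
}

// Give `def` the new layout, then recurse into every Advance user so the
// whole chain stays type-consistent. Non-Advance users are left untouched.
inline void updateAdvanceChainSketch(Op &def, const std::string &layout) {
  def.layout = layout;
  for (Op *user : def.users)
    if (user->kind == Kind::Advance)
      updateAdvanceChainSketch(*user, layout);
}
```

In this model, a store fed by `make -> advance -> advance` traces back to `make`, and updating `make` relabels both advances while leaving the store's own record alone. The real pass also has to deal with pointers carried through loop arguments, which this sketch deliberately leaves out.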
Reviewed Changes
Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp | Added recursive chain update, improved `rewriteStoreOp`, and added verification asserts after each transformation stage. |
third_party/intel/lib/Dialect/Triton/Transforms/TensorDescToBlockPointer.cpp | Removed legacy descriptor lookup, consolidated pointer creation via `findOrCreateMakeTensorPtr`, replaced offset logic, and updated loop argument types. |
Files not reviewed (3)
- test/Triton/Intel/TensorDescToBlockPointer/basic.mlir: Language not supported
- test/Triton/Intel/TensorDescToBlockPointer/loop.mlir: Language not supported
- test/TritonIntelGPU/backward_combine_dpas_dot_layout.mlir: Language not supported
Comments suppressed due to low confidence (1)
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp:793
- The variable `value` is undefined in this scope, leading to a compile error. It should reference the store's value (e.g., `storeOp.getValue()`) or the converted value extracted earlier.
  `Value dataToStore = getValueAs(value, encoding);`
@whitneywhtsang I have split this pull request and moved the part that deals with removing layout conversions for store operations that use a block pointer updated by a `tt.advance` operation into PR #4277.
Thanks, this part of the code LGTM.