Skip to content

[tensor-descriptor]: Extend support when tensor descriptor created in control flow #4152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

etiotto
Copy link
Contributor

@etiotto etiotto commented May 9, 2025

Enhance layout propagation and tensor descriptor lowering to support cases where descriptors or pointers are created within control flow constructs.

…ad/descriptor_store operation that uses it

Signed-off-by: Tiotto, Ettore <[email protected]>
@etiotto etiotto requested a review from Copilot May 9, 2025 18:07
@etiotto etiotto self-assigned this May 9, 2025
Copilot

This comment was marked as outdated.

@etiotto etiotto requested review from alexbaden, whitneywhtsang, chengjunlu and a team May 9, 2025 18:24
@etiotto etiotto marked this pull request as ready for review May 9, 2025 18:28
Copy link
Contributor

@mfrancepillois mfrancepillois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, but I'm not sure to understand why we need this PR if the conversion from tensor_descriptor to block_pointer is one of the first passes to run at the ttir level, how can we have tensor_descriptor ops with different layouts given the encodings are assigned to ops when lowering from the ttir level to the ttgir level.
As I understand it, we are not supposed to have tensor_descriptor ops after translating to ttgir dialect, no?

whitneywhtsang
whitneywhtsang previously approved these changes May 12, 2025
@kurapov-peter
Copy link
Contributor

BTW FYI, the existing TMA lowering expects the tensor descriptor to always have a layout.

@etiotto etiotto changed the title Created block_ptr with the layout of the descriptor load/store operation [tensor-descriptor]: Extend support when tensor descriptor created in control flow May 15, 2025
@etiotto
Copy link
Contributor Author

etiotto commented May 15, 2025

The code LGTM, but I'm not sure to understand why we need this PR if the conversion from tensor_descriptor to block_pointer is one of the first passes to run at the ttir level, how can we have tensor_descriptor ops with different layouts given the encodings are assigned to ops when lowering from the ttir level to the ttgir level. As I understand it, we are not supposed to have tensor_descriptor ops after translating to ttgir dialect, no?

I have updated the code quite a bit to change the loginc used to do the translation. The pass is now simpler and able to handle tesnor descriptors created in control flow (i.e. a branch) inside a loop.

@etiotto etiotto requested a review from whitneywhtsang May 15, 2025 21:09
@etiotto
Copy link
Contributor Author

etiotto commented May 20, 2025

@etiotto etiotto requested a review from Copilot May 20, 2025 16:27
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Enhance layout propagation and tensor descriptor lowering to support cases where descriptors or pointers are created within control flow constructs.

  • Add updateAdvanceOpChain to recursively update chains of AdvanceOp users.
  • Refactor rewriteStoreOp to trace back through AdvanceOp chains before creating new MakeTensorPtrOp.
  • Rewrite the TensorDesc-to-block-pointer pass to drop old descriptor lookup, always find or create MakeTensorPtrOp, and handle loop-carried block pointer types.

Reviewed Changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 2 comments.

File Description
third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp Added recursive chain update, improved rewriteStoreOp, and added verification asserts after each transformation stage.
third_party/intel/lib/Dialect/Triton/Transforms/TensorDescToBlockPointer.cpp Removed legacy descriptor lookup, consolidated pointer creation via findOrCreateMakeTensorPtr, replaced offset logic, and updated loop argument types.
Files not reviewed (3)
  • test/Triton/Intel/TensorDescToBlockPointer/basic.mlir: Language not supported
  • test/Triton/Intel/TensorDescToBlockPointer/loop.mlir: Language not supported
  • test/TritonIntelGPU/backward_combine_dpas_dot_layout.mlir: Language not supported
Comments suppressed due to low confidence (1)

third_party/intel/lib/TritonIntelGPUTransforms/RemoveLayoutConversions.cpp:793

  • The variable value is undefined in this scope, leading to a compile error. It should reference the store's value (e.g., storeOp.getValue()) or the converted value extracted earlier.
Value dataToStore = getValueAs(value, encoding);

@etiotto
Copy link
Contributor Author

etiotto commented May 22, 2025

@whitneywhtsang I have split this pull request and move the part that deals with removing layout conversion for store operation that use a block ptr updated by a tt.advance operation in PR #4277

@etiotto etiotto requested a review from whitneywhtsang May 22, 2025 18:28
@whitneywhtsang
Copy link
Contributor

@whitneywhtsang I have split this pull request and move the part that deals with removing layout conversion for store operation that use a block ptr updated by a tt.advance operation in PR #4277

Thanks, this part of the code LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[tensor-descriptor]: Improve translation for make_tensor_ptr operations in control flow
5 participants