
Fix regression issue in flex decoding. #3999


Merged: 1 commit merged into main on Apr 25, 2025

Conversation

chengjunlu (Contributor)

The `tt.store` operation with a `BlockPointer` falls back to a scatter store if the block shape or the value layout is not supported by the 2D block IO. The lowering code transforms the `BlockPointer` into plain pointers and masks.

The scatter store should AND the `maskElems` into the store predicate even when `llMask` does not exist.
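
For illustration, here is a minimal, self-contained C++ model of that predicate logic (hypothetical names and plain standard-library types, not the actual MLIR lowering): the per-lane store predicate must always include the boundary-check `maskElems` derived from the block pointer, and AND in the user mask only when one exists.

```cpp
#include <cstddef>
#include <cstdio>
#include <optional>
#include <vector>

// Hypothetical model of the scatter-store predicate computation.
// maskElems: per-lane boundary-check masks derived from the block pointer.
// llMask:    optional user-supplied mask; it is absent (null) in the
//            BlockPointer case this PR fixes.
std::vector<bool>
computeStorePredicates(const std::vector<bool> &maskElems,
                       const std::optional<std::vector<bool>> &llMask) {
  std::vector<bool> pred(maskElems.size());
  for (std::size_t i = 0; i < maskElems.size(); ++i) {
    // The boundary mask must apply unconditionally; before the fix, a
    // missing llMask caused maskElems to be skipped, so out-of-bounds
    // lanes could be stored.
    pred[i] = maskElems[i] && (!llMask || (*llMask)[i]);
  }
  return pred;
}

int main() {
  std::vector<bool> maskElems = {true, true, false, false}; // last two lanes OOB
  auto pred = computeStorePredicates(maskElems, std::nullopt);
  for (bool p : pred)
    std::printf("%d ", int(p)); // prints: 1 1 0 0
  std::printf("\n");
  return 0;
}
```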

@chengjunlu chengjunlu requested review from Copilot and alexbaden April 24, 2025 05:33
Copilot AI left a comment


Pull Request Overview

This PR fixes a regression issue in flex decoding by updating block pointer handling and prefetch operations. Key changes include:

  • Bounding the 2D prefetch shape calculation by the tensor shape (see the sketch after this list).
  • Replacing an unreachable branch in the prefetch conversion with a regular pointer prefetch handler.
  • Adjusting mask handling in both the prefetch and the load/store conversions.
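
As referenced in the first item, here is a minimal sketch of what bounding a 2D prefetch shape by the tensor shape could look like. This is a standalone model with hypothetical names (`boundPrefetchShape` is not from the PR), assuming the bound simply clamps each dimension:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: clamp each dimension of a 2D prefetch block so it
// never exceeds the tensor extent, modeling the "tensor shape bound" change.
std::array<std::int64_t, 2>
boundPrefetchShape(std::array<std::int64_t, 2> prefetchShape,
                   std::array<std::int64_t, 2> tensorShape) {
  for (int d = 0; d < 2; ++d)
    prefetchShape[d] = std::min(prefetchShape[d], tensorShape[d]);
  return prefetchShape;
}

int main() {
  // A 32x32 prefetch block against a 1x64 tensor is bounded to 1x32.
  auto shape = boundPrefetchShape({32, 32}, {1, 64});
  std::printf("%lld x %lld\n", (long long)shape[0], (long long)shape[1]);
  return 0;
}
```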
Comments suppressed due to low confidence (1)

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp:565

  • [nitpick] There are multiple debug print statements (e.g., 'johnlu debug:') in production-oriented code; consider removing or gating these under a debug flag to avoid cluttering output during normal execution.
llvm::outs() << "johnlu debug: operands " << (opIdx == DpasEncodingAttr::OpIdx::OperandA ? "A" : "B") << "\n";
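
For reference, LLVM's standard way to gate such output is the `LLVM_DEBUG` macro from `llvm/Support/Debug.h`, which only prints when the tool is run with `-debug-only=<DEBUG_TYPE>` in an assertions-enabled build. A minimal sketch, assuming an LLVM build environment (the `DEBUG_TYPE` tag and helper name are hypothetical):

```cpp
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

#define DEBUG_TYPE "load-store-to-llvm" // hypothetical tag for -debug-only=

// The message from the quoted line, minus the ad-hoc "johnlu debug:" prefix,
// gated so it prints only under -debug-only=load-store-to-llvm.
[[maybe_unused]] static void dumpOperandSide(bool isOperandA) {
  LLVM_DEBUG(llvm::dbgs() << "operands " << (isOperandA ? "A" : "B") << "\n");
}
```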

@intel intel deleted a comment from Copilot AI Apr 24, 2025
@alexbaden (Contributor) left a comment


We need to add a unit test. Maybe in test_core or test_block_loads in the Intel directory?

@etiotto (Contributor) left a comment


Please add a lit test (unit test) to cover this bug.

@chengjunlu chengjunlu force-pushed the chengjun/fix_regression_in_flex_decoding branch 2 times, most recently from 5b9df4f to d055a4f on April 25, 2025 02:11
@alexbaden (Contributor) left a comment


Ok for now. I think I prefer a Python test here so we can compare to some known-good output, versus the lit test, which can be hard to parameterize or generalize to lots of different masks. If you have ideas for improving test_block_load.py to test this condition, I would be very interested in hearing them!

@chengjunlu chengjunlu force-pushed the chengjun/fix_regression_in_flex_decoding branch 3 times, most recently from cbf6610 to 1225835 on April 25, 2025 02:19
@chengjunlu (Contributor, Author)

> Please add a lit test (unit test) to cover this bug.

Enhanced the LIT test case to cover the boundary check condition.

@chengjunlu (Contributor, Author)

> Ok for now. I think I prefer a Python test here so we can compare to some known-good output, versus the lit test, which can be hard to parameterize or generalize to lots of different masks. If you have ideas for improving test_block_load.py to test this condition, I would be very interested in hearing them!

I will add the unit test in another PR #3876

@chengjunlu chengjunlu merged commit 935ded3 into main Apr 25, 2025
9 checks passed
@chengjunlu chengjunlu deleted the chengjun/fix_regression_in_flex_decoding branch April 25, 2025 05:06
…nter. There are `maskElems` values if the `llMask` is null for the BlockPointer case.

Signed-off-by: Lu,Chengjun <[email protected]>
Successfully merging this pull request may close these issues.

[FlexAttention] Accuracy issues during running FlexDecoding UT
4 participants