
Fix regression issue in flex decoding. #3999


Merged: 1 commit merged into main on Apr 25, 2025

Conversation

chengjunlu (Contributor)

The `tt.store` operation with a `BlockPointer` falls back to a scatter store if the block shape or the value layout is not supported by the 2D block IO. The lowering code transforms the `BlockPointer` into plain pointers and masks.

The scatter store should AND the `maskElems` into the store predicate even when `llMask` does not exist.
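
For illustration, here is a minimal, self-contained C++ model of that predicate logic (hypothetical names and plain standard-library types, not the actual MLIR lowering): the per-lane store predicate must always include the boundary-check `maskElems` derived from the block pointer, and AND in the user mask only when one exists.

```cpp
#include <cstddef>
#include <cstdio>
#include <optional>
#include <vector>

// Hypothetical model of the scatter-store predicate computation.
// maskElems: per-lane boundary-check masks derived from the block pointer.
// llMask:    optional user-supplied mask; it is absent (null) in the
//            BlockPointer case this PR fixes.
std::vector<bool>
computeStorePredicates(const std::vector<bool> &maskElems,
                       const std::optional<std::vector<bool>> &llMask) {
  std::vector<bool> pred(maskElems.size());
  for (std::size_t i = 0; i < maskElems.size(); ++i) {
    // The boundary mask must apply unconditionally; before the fix, a
    // missing llMask caused maskElems to be skipped, so out-of-bounds
    // lanes could be stored.
    pred[i] = maskElems[i] && (!llMask || (*llMask)[i]);
  }
  return pred;
}

int main() {
  std::vector<bool> maskElems = {true, true, false, false}; // last two lanes OOB
  auto pred = computeStorePredicates(maskElems, std::nullopt);
  for (bool p : pred)
    std::printf("%d ", int(p)); // prints: 1 1 0 0
  std::printf("\n");
  return 0;
}
```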

@chengjunlu chengjunlu requested review from Copilot and alexbaden April 24, 2025 05:33
Copilot AI left a comment


Pull Request Overview

This PR fixes a regression issue in flex decoding by updating block pointer handling and prefetch operations. Key changes include:

  • Bounding the 2D prefetch shape calculation by the tensor shape (see the sketch after this list).
  • Replacing an unreachable branch in the prefetch conversion with a regular pointer prefetch handler.
  • Adjusting mask handling in both the prefetch and the load/store conversions.
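
As referenced in the first item, here is a minimal sketch of what bounding a 2D prefetch shape by the tensor shape could look like. This is a standalone model with hypothetical names (`boundPrefetchShape` is not from the PR), assuming the bound simply clamps each dimension:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: clamp each dimension of a 2D prefetch block so it
// never exceeds the tensor extent, modeling the "tensor shape bound" change.
std::array<std::int64_t, 2>
boundPrefetchShape(std::array<std::int64_t, 2> prefetchShape,
                   std::array<std::int64_t, 2> tensorShape) {
  for (int d = 0; d < 2; ++d)
    prefetchShape[d] = std::min(prefetchShape[d], tensorShape[d]);
  return prefetchShape;
}

int main() {
  // A 32x32 prefetch block against a 1x64 tensor is bounded to 1x32.
  auto shape = boundPrefetchShape({32, 32}, {1, 64});
  std::printf("%lld x %lld\n", (long long)shape[0], (long long)shape[1]);
  return 0;
}
```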
Comments suppressed due to low confidence (1)

third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp:565

  • [nitpick] There are multiple debug print statements (e.g., 'johnlu debug:') in production-oriented code; consider removing or gating these under a debug flag to avoid cluttering output during normal execution.
llvm::outs() << "johnlu debug: operands " << (opIdx == DpasEncodingAttr::OpIdx::OperandA ? "A" : "B") << "\n";
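
For reference, LLVM's standard way to gate such output is the `LLVM_DEBUG` macro from `llvm/Support/Debug.h`, which only prints when the tool is run with `-debug-only=<DEBUG_TYPE>` in an assertions-enabled build. A minimal sketch, assuming an LLVM build environment (the `DEBUG_TYPE` tag and helper name are hypothetical):

```cpp
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

#define DEBUG_TYPE "load-store-to-llvm" // hypothetical tag for -debug-only=

// The message from the quoted line, minus the ad-hoc "johnlu debug:" prefix,
// gated so it prints only under -debug-only=load-store-to-llvm.
[[maybe_unused]] static void dumpOperandSide(bool isOperandA) {
  LLVM_DEBUG(llvm::dbgs() << "operands " << (isOperandA ? "A" : "B") << "\n");
}
```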

@intel intel deleted a comment from Copilot AI Apr 24, 2025
@alexbaden (Contributor) left a comment


We need to add a unit test. Maybe in test_core or test_block_loads in the Intel directory?

@etiotto (Contributor) left a comment


Please add a lit test (unit test) to cover this bug.

@chengjunlu chengjunlu force-pushed the chengjun/fix_regression_in_flex_decoding branch 2 times, most recently from 5b9df4f to d055a4f on April 25, 2025 02:11
@alexbaden (Contributor) left a comment


Ok for now. I think I prefer a Python test here so we can compare to some known-good output, versus the lit test, which can be hard to parameterize or generalize to lots of different masks. If you have ideas for improving test_block_load.py to test this condition, I would be very interested in hearing them!

@chengjunlu chengjunlu force-pushed the chengjun/fix_regression_in_flex_decoding branch 3 times, most recently from cbf6610 to 1225835 on April 25, 2025 02:19
@chengjunlu (Contributor, Author)

> Please add a lit test (unit test) to cover this bug.

Enhanced the LIT test case to cover the boundary check condition.

@chengjunlu (Contributor, Author)

> Ok for now. I think I prefer a Python test here so we can compare to some known-good output, versus the lit test, which can be hard to parameterize or generalize to lots of different masks. If you have ideas for improving test_block_load.py to test this condition, I would be very interested in hearing them!

I will add the unit test in another PR #3876

@chengjunlu chengjunlu merged commit 935ded3 into main Apr 25, 2025
9 checks passed
@chengjunlu chengjunlu deleted the chengjun/fix_regression_in_flex_decoding branch April 25, 2025 05:06
…nter. There are `maskElems` values if the `llMask` is null for the BlockPointer case.

Signed-off-by: Lu,Chengjun <[email protected]>
Successfully merging this pull request may close these issues.

[FlexAttention] Accuracy issues during running FlexDecoding UT
4 participants