[FlexAttention] FlexDecoding accuracy discrepancy between XPU and CUDA while compiling torch.ops.higher_order.flex_attention
#3588
Describe the bug
Issue Description:
While running the XPU FlexDecoding unit test, we observed a test failure caused by a tensor mismatch.
We captured the compiled results on both XPU and CUDA, unified their outputs, and found that running the Triton code generated by the unit test produces different results on the two backends.
triton-code.zip
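For context, a mismatch like this is typically flagged by an rtol/atol comparison (as `torch.testing.assert_close` does). The helper below is a hypothetical, dependency-free sketch of that check, not part of the attached unit test; the tolerance values are illustrative assumptions:

```python
def max_mismatch(xpu_out, cuda_out, rtol=1e-3, atol=1e-5):
    """Return the worst element-wise violation of the rtol/atol bound.

    Mirrors the assert_close criterion: values a, b "match" when
    |a - b| <= atol + rtol * |b|. Returns 0.0 if every element matches.
    """
    worst = 0.0
    for a, b in zip(xpu_out, cuda_out):
        allowed = atol + rtol * abs(b)
        excess = abs(a - b) - allowed
        if excess > worst:
            worst = excess
    return worst

# Illustrative data (not from the failing test):
ref = [0.1234, -0.5678, 0.9012]
close = [0.1234, -0.5678, 0.9013]  # off by 1e-4, within rtol * |b| here
far = [0.1234, -0.5678, 1.0000]    # off by ~0.1, a genuine mismatch

print(max_mismatch(close, ref))  # 0.0
print(max_mismatch(far, ref) > 0.0)  # True
```

In the actual report, the mismatch showed up as this check failing on XPU against the CUDA reference, which is what motivated diffing the generated Triton kernels directly.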
Environment details
You can refer to this issue to set up a reproducing environment:
#3518