[DEBUG] Try run `FlexAttn` in parallel #4376

anmyachev · 2025-05-30T16:52:19Z

PyTorch CI:

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15351655191 (-n 16)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15362112329 (-n 8 --reruns 2, py3.9)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15366085901 (n 8 --reruns 2, py3.10)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15395523812 (-n 32 then usual run)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15416558343 (-n 16 then usual run)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15420897482 (-n 16 --reruns 2)
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15424438097 ( n 16 --reruns 2 with new PyTorch pin)

Seems quite fast (without decoding). Before this, the tests were running in one process, apparently.

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
===== 101 failed, 496 passed, 58 skipped, 1 xfailed in 2611.01s (0:43:31) ======

Although I see a lot of errors, perhaps due to parallelism. We can experiment and choose the most successful combination. (It might also be good to enable restart in case of errors)

FYI @pbchekin @vlad-penkin @alexbaden @chengjunlu @whitneywhtsang @etiotto

Signed-off-by: Anatoly Myachev <[email protected]>

pbchekin · 2025-05-30T19:59:07Z

Good results! We can add a dedicated workflow for Flex Attention. The number of workers needs to be a parameter to adjust it for client GPUs. Also it would be nice to identify the root cause for failures, looks like accuracy errors in most cases, not sure if it is due to parallelism.

anmyachev · 2025-06-02T12:10:19Z

Part of these problems I see in our usual run https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15373802137/job/43256311378#step:8:14641:

The following tests failed consistently: ['test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_xpu']

whitneywhtsang · 2025-06-02T14:24:23Z

Part of these problems I see in our usual run https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15373802137/job/43256311378#step:8:14641:

The following tests failed consistently: ['test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s0_v_s0_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s1_v_s1_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s2_v_s2_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s0_k_s3_v_s3_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s0_v_s0_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s1_v_s1_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s2_v_s2_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s0_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s1_xpu_float16', 'test/inductor/test_flex_attention.py::TestFlexAttentionXPU::test_strided_inputs_q_s1_k_s3_v_s3_do_s2_xpu_float16', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_xpu', 'test/inductor/test_flex_attention.py::TestLearnableBiasesXPU::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_xpu']

@chengjunlu investigated above failures before, and believe they are issues in SYCL.
TestFlexAttentionXPU::test_strided_inputs failures are fixed in PyTorch main.

scripts/test-pytorch.sh

anmyachev · 2025-06-03T13:26:54Z

Looks much better with last PyTorch pin update (https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15416558343/job/43380600584):

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
====== 7 failed, 633 passed, 15 skipped, 1 xfailed in 2571.39s (0:42:51) =======

etiotto · 2025-06-03T14:58:53Z

If you get intermittent failures you can try to rerun the failed tests (in parallel again, or perhaps sequentially). If you rerun in parallel you may have to rerun again...

…el#4376) There was no test coverage for this. I discovered this while implementing the CPU backend.

Try run FlexAttn in parallel

1108d68

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev changed the title ~~Try run FlexAttn in parallel~~ [DEBUG] Try run FlexAttn in parallel May 30, 2025

Update test-pytorch.sh

eb73f52

anmyachev commented Jun 2, 2025

View reviewed changes

scripts/test-pytorch.sh Outdated Show resolved Hide resolved

anmyachev added 3 commits June 2, 2025 16:53

Update scripts/test-pytorch.sh

d51cb83

Merge branch 'main' into amyachev/flex-attn

f2a75f6

Update test-pytorch.sh

59e078d

anmyachev added 2 commits June 3, 2025 17:02

Update test-pytorch.sh

58e5690

Merge branch 'main' into amyachev/flex-attn

9b52253

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DEBUG] Try run `FlexAttn` in parallel #4376

[DEBUG] Try run `FlexAttn` in parallel #4376

Uh oh!

anmyachev commented May 30, 2025 •

edited

Loading

Uh oh!

pbchekin commented May 30, 2025

Uh oh!

anmyachev commented Jun 2, 2025

Uh oh!

whitneywhtsang commented Jun 2, 2025 •

edited

Loading

Uh oh!

Uh oh!

anmyachev commented Jun 3, 2025

Uh oh!

etiotto commented Jun 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

[DEBUG] Try run FlexAttn in parallel #4376

Are you sure you want to change the base?

[DEBUG] Try run FlexAttn in parallel #4376

Uh oh!

Conversation

anmyachev commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pbchekin commented May 30, 2025

Uh oh!

anmyachev commented Jun 2, 2025

Uh oh!

whitneywhtsang commented Jun 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

anmyachev commented Jun 3, 2025

Uh oh!

etiotto commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

[DEBUG] Try run `FlexAttn` in parallel #4376

[DEBUG] Try run `FlexAttn` in parallel #4376

anmyachev commented May 30, 2025 •

edited

Loading

whitneywhtsang commented Jun 2, 2025 •

edited

Loading

etiotto commented Jun 3, 2025 •

edited

Loading