Fix Kineto+PTI profiling on BMG #4244
Conversation
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit fd8b689.
Signed-off-by: Anatoly Myachev <[email protected]>
FYI @etiotto @whitneywhtsang: the gemm benchmark with tensor of pointers doesn't work correctly on BMG: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15122978565/job/42509360103
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit a8f0d09.
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -141,7 +142,10 @@ jobs:
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-triton-report.csv --benchmark gemm-legacy --compiler triton --param_cols "B,M,K,N" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-xetla-report.csv --benchmark gemm-legacy --compiler xetla --param_cols "B,M,K,N" --tflops_col XeTLA-TFlops --hbm_col "XeTLA-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-onednn-report.csv --benchmark gemm-legacy --compiler onednn --param_cols "B,M,K,N" --tflops_col OneDNN-TFlops --hbm_col "OneDNN-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-cutlass-report.csv --benchmark gemm-legacy --compiler cutlass --param_cols "B,M,K,N" --tflops_col CUTLASS-TFlops --hbm_col "CUTLASS-GB/s" --tag $TAG
if [[ "${{ inputs.runner_label }}" = "max1550" ]]; then
cutlass on BMG currently doesn't work: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15121382503/job/42504328424
Tracked through #4254.
Thanks @sommerlukas!
This reverts commit 2a6ca23.
This reverts commit 17d2a5d.
Signed-off-by: Anatoly Myachev <[email protected]>
…-triton into amyachev/issue4172 Signed-off-by: Anatoly Myachev <[email protected]>
Force-pushed from 0356811 to e03e521.
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -141,7 +148,10 @@ jobs:
source ../../scripts/capture-hw-details.sh
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-triton-report.csv --benchmark gemm --compiler triton --param_cols "B,M,K,N" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-onednn-report.csv --benchmark gemm --compiler onednn --param_cols "B,M,K,N" --tflops_col OneDNN-TFlops --hbm_col "OneDNN-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-cutlass-report.csv --benchmark gemm --compiler cutlass --param_cols "B,M,K,N" --tflops_col CUTLASS-TFlops --hbm_col "CUTLASS-GB/s" --tag $TAG
if [[ "${{ inputs.runner_label }}" = "max1550" ]]; then
Note that `inputs.runner_label` is not set by default, so most likely the condition is not met on max1550 (please double-check the last run). You potentially need something like `${{ inputs.runner_label || 'max1550' }}`.
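The suggested fallback can be sketched in plain bash. This is only an illustration: GitHub Actions substitutes `${{ inputs.runner_label || 'max1550' }}` before the shell runs, and `RUNNER_LABEL` below is a hypothetical stand-in for that substitution, not a variable from this workflow.

```shell
# Emulate the GHA expression ${{ inputs.runner_label || 'max1550' }}
# with a shell default; RUNNER_LABEL is a hypothetical stand-in.
RUNNER_LABEL="${RUNNER_LABEL:-max1550}"
if [[ "$RUNNER_LABEL" = "max1550" ]]; then
  # The XeTLA/CUTLASS report steps would run only on max1550 runners.
  echo "label=$RUNNER_LABEL"
fi
```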
@@ -74,7 +76,7 @@ jobs:
timeout-minutes: 720
defaults:
run:
shell: bash -noprofile --norc -eo pipefail -c "source /opt/intel/oneapi/setvars.sh > /dev/null; source {0}"
shell: bash -noprofile --norc -eo pipefail -c "source /opt/intel/oneapi/setvars.sh > /dev/null; export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH; source {0}"
@pbchekin, do you have a suggestion for how to fix it?
I decided to return it to how it was, with duplication, but at least it works.
Just an idea: we can add a new step at the very beginning (after installing Python and intel-pti) that does not use the default shell (so this code is not executed). In this step, create a file, for example `~/.env`, with:

PTI_LIBS_DIR=...
source /opt/intel/oneapi/setvars.sh > /dev/null
export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH

Then the default shell can be changed to:

shell: bash -noprofile --norc -eo pipefail -c "[[ -f ~/.env ]] && source ~/.env; source {0}"
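A minimal sketch of the env-file approach, assuming a hypothetical PTI install path (the real `PTI_LIBS_DIR` value is elided as `...` in the suggestion, so `/opt/intel/pti/lib` below is only a placeholder):

```shell
# One-time setup step (run without the default shell): write the env file.
# /opt/intel/pti/lib is a hypothetical placeholder for the real PTI_LIBS_DIR.
cat > "$HOME/.env" <<'EOF'
export PTI_LIBS_DIR=/opt/intel/pti/lib
export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH
EOF

# Later steps source the file only if it exists, so runners without the
# file (or without PTI at all) are unaffected.
[[ -f "$HOME/.env" ]] && echo "env file ready"
```

The `[[ -f ~/.env ]] && source ~/.env` guard in the default shell is what keeps the change backward-compatible: on runners where the setup step never ran, the shell command degrades to the original behavior.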
If adjusting LD_LIBRARY_PATH is safer in each step, then my suggestion is to create a file and source it in each step.
> If adjusting LD_LIBRARY_PATH is safer in each step, then my suggestion is to create a file and source it in each step.

NVM, I see in the last commit you have only one additional line per step; this looks good.
BTW, thanks for the idea!
…enchmark section Signed-off-by: Anatoly Myachev <[email protected]>
I haven't found a workaround with the current version of PTI that works for all benchmarks. For example, setting the variable `PTI_DEVICE_SYNC_DELTA=1` fixes all benchmarks except `prefix_sums`. If PTI is also updated to version 0.12.2, then all benchmarks work.

Note: this problem can also be related to the AGAMA version. The developer machine has version 1099 (the CI runner has version 1133); updating PTI to version 0.12.2 is enough and everything works.
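The partial workaround described above amounts to a one-line environment toggle before launching a benchmark. The invocation below is a hypothetical example, not a command from this PR:

```shell
# Enable PTI's device sync delta; per the comment above, with the current
# PTI version this fixed every benchmark except prefix_sums.
export PTI_DEVICE_SYNC_DELTA=1
echo "PTI_DEVICE_SYNC_DELTA=$PTI_DEVICE_SYNC_DELTA"

# Hypothetical benchmark invocation (path is an assumption):
# python benchmarks/triton_kernels_benchmark/prefix_sums.py
```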
BMG CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15121136044