Fix Kineto+PTI profiling on BMG #4244
Conversation
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit fd8b689.
Signed-off-by: Anatoly Myachev <[email protected]>
FYI @etiotto @whitneywhtsang: the gemm benchmark with tensor of pointers doesn't work correctly on BMG: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15122978565/job/42509360103
Signed-off-by: Anatoly Myachev <[email protected]>
This reverts commit a8f0d09.
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -141,7 +142,10 @@ jobs:
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-triton-report.csv --benchmark gemm-legacy --compiler triton --param_cols "B,M,K,N" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-xetla-report.csv --benchmark gemm-legacy --compiler xetla --param_cols "B,M,K,N" --tflops_col XeTLA-TFlops --hbm_col "XeTLA-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-onednn-report.csv --benchmark gemm-legacy --compiler onednn --param_cols "B,M,K,N" --tflops_col OneDNN-TFlops --hbm_col "OneDNN-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance.csv $REPORTS/gemm-cutlass-report.csv --benchmark gemm-legacy --compiler cutlass --param_cols "B,M,K,N" --tflops_col CUTLASS-TFlops --hbm_col "CUTLASS-GB/s" --tag $TAG
if [[ "${{ inputs.runner_label }}" = "max1550" ]]; then
cutlass on BMG currently doesn't work: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15121382503/job/42504328424
Tracked through #4254.
Thanks @sommerlukas!
This reverts commit 2a6ca23.
This reverts commit 17d2a5d.
Signed-off-by: Anatoly Myachev <[email protected]>
…-triton into amyachev/issue4172 Signed-off-by: Anatoly Myachev <[email protected]>
Force-pushed from 0356811 to e03e521.
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -141,7 +148,10 @@ jobs:
source ../../scripts/capture-hw-details.sh
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-triton-report.csv --benchmark gemm --compiler triton --param_cols "B,M,K,N" --tflops_col Triton-TFlops --hbm_col "Triton-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-onednn-report.csv --benchmark gemm --compiler onednn --param_cols "B,M,K,N" --tflops_col OneDNN-TFlops --hbm_col "OneDNN-GB/s" --tag $TAG
python build_report.py $REPORTS/matmul-performance-base.csv $REPORTS/gemm-newshapes-cutlass-report.csv --benchmark gemm --compiler cutlass --param_cols "B,M,K,N" --tflops_col CUTLASS-TFlops --hbm_col "CUTLASS-GB/s" --tag $TAG
if [[ "${{ inputs.runner_label }}" = "max1550" ]]; then
Note that `inputs.runner_label` is not set by default, so most likely the condition is not met on max1550 (please double-check the last run). You potentially need something like `${{ inputs.runner_label || 'max1550' }}`.
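The suggested fallback can be sketched in plain bash. This is only an illustration: GitHub Actions substitutes `${{ inputs.runner_label || 'max1550' }}` before the shell runs, and `RUNNER_LABEL` below is a hypothetical stand-in for that substitution, not a variable from this workflow.

```shell
# Emulate the GHA expression ${{ inputs.runner_label || 'max1550' }}
# with a shell default; RUNNER_LABEL is a hypothetical stand-in.
RUNNER_LABEL="${RUNNER_LABEL:-max1550}"
if [[ "$RUNNER_LABEL" = "max1550" ]]; then
  # The XeTLA/CUTLASS report steps would run only on max1550 runners.
  echo "label=$RUNNER_LABEL"
fi
```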
@@ -74,7 +76,7 @@ jobs:
timeout-minutes: 720
defaults:
run:
shell: bash -noprofile --norc -eo pipefail -c "source /opt/intel/oneapi/setvars.sh > /dev/null; source {0}"
shell: bash -noprofile --norc -eo pipefail -c "source /opt/intel/oneapi/setvars.sh > /dev/null; export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH; source {0}"
@pbchekin, do you have a suggestion for how to fix it?
I decided to return it to how it was, with duplication, but at least it works.
Just an idea: we can add a new step at the very beginning (after installing Python and intel-pti) that does not use the default shell (so this code is not executed). In this step, create a file, for example `~/.env`, with:

PTI_LIBS_DIR=...
source /opt/intel/oneapi/setvars.sh > /dev/null
export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH

Then the default shell can be changed to:

shell: bash -noprofile --norc -eo pipefail -c "[[ -f ~/.env ]] && source ~/.env; source {0}"
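A minimal sketch of the env-file approach, assuming a hypothetical PTI install path (the real `PTI_LIBS_DIR` value is elided as `...` in the suggestion, so `/opt/intel/pti/lib` below is only a placeholder):

```shell
# One-time setup step (run without the default shell): write the env file.
# /opt/intel/pti/lib is a hypothetical placeholder for the real PTI_LIBS_DIR.
cat > "$HOME/.env" <<'EOF'
export PTI_LIBS_DIR=/opt/intel/pti/lib
export LD_LIBRARY_PATH=$PTI_LIBS_DIR:$LD_LIBRARY_PATH
EOF

# Later steps source the file only if it exists, so runners without the
# file (or without PTI at all) are unaffected.
[[ -f "$HOME/.env" ]] && echo "env file ready"
```

The `[[ -f ~/.env ]] && source ~/.env` guard in the default shell is what keeps the change backward-compatible: on runners where the setup step never ran, the shell command degrades to the original behavior.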
If adjusting LD_LIBRARY_PATH is safer in each step, then my suggestion is to create a file and source it in each step.
> If adjusting LD_LIBRARY_PATH is safer in each step, then my suggestion is to create a file and source it in each step.

NVM, I see in the last commit you have only one additional line per step; this looks good.
BTW, thanks for the idea!
…enchmark section Signed-off-by: Anatoly Myachev <[email protected]>
I haven't found a workaround with the current version of PTI that works for all benchmarks. For example, setting the variable `PTI_DEVICE_SYNC_DELTA=1` fixes all benchmarks except `prefix_sums`. If PTI is also updated to version 0.12.2, then all benchmarks work.

Note: this problem can also be related to the AGAMA version. The developer machine has version 1099 (the CI runner has version 1133); updating PTI to version 0.12.2 is enough and everything works.
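The partial workaround described above amounts to a one-line environment toggle before launching a benchmark. The invocation below is a hypothetical example, not a command from this PR:

```shell
# Enable PTI's device sync delta; per the comment above, with the current
# PTI version this fixed every benchmark except prefix_sums.
export PTI_DEVICE_SYNC_DELTA=1
echo "PTI_DEVICE_SYNC_DELTA=$PTI_DEVICE_SYNC_DELTA"

# Hypothetical benchmark invocation (path is an assumption):
# python benchmarks/triton_kernels_benchmark/prefix_sums.py
```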
BMG CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15121136044