
[oneDNN] cm3leon_generate bf16 & amp_bf16 inference performance has dropped ~ 40% #2223

@mengfei25

Description


🐛 Describe the bug

Compared with oneDNN v3.9.1, performance has dropped (ratio = v3.10 / v3.9.1; higher is better).

| Category | Model | Eager Ratio | Inductor Ratio |
|---|---|---|---|
| torchbench_amp_bf16_inference | cm3leon_generate | 0.711793753 | 0.628823929 |
| torchbench_amp_fp16_inference | cm3leon_generate | 0.893158604 | 0.871240602 |
| torchbench_bfloat16_inference | cm3leon_generate | 0.711048632 | 0.602583587 |
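For context, a small sketch of the arithmetic behind the "~40%" figure in the title, using only the ratios from the table above (a ratio of 1.0 would mean parity between the two oneDNN versions):

```python
# Convert the reported v3.10 / v3.9.1 performance ratios into percentage drops.
# All values are copied verbatim from the table above.
ratios = {
    ("amp_bf16", "eager"): 0.711793753,
    ("amp_bf16", "inductor"): 0.628823929,
    ("amp_fp16", "eager"): 0.893158604,
    ("amp_fp16", "inductor"): 0.871240602,
    ("bfloat16", "eager"): 0.711048632,
    ("bfloat16", "inductor"): 0.602583587,
}

for (dtype, mode), ratio in ratios.items():
    drop_pct = (1.0 - ratio) * 100.0
    print(f"{dtype:>9} {mode:>8}: {drop_pct:.1f}% slower")
```

The worst case, bfloat16 inference with Inductor, works out to a ~39.7% regression, which matches the "~40%" in the issue title; the amp_fp16 configurations regress far less (~11-13%).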

Reproducer

```shell
pip install --pre torch==2.9.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu
# oneDNN v3.10 pytorch wheel: GH_TOKEN=xxx gh --repo pytorch/pytorch run download 18678105868 -n manywheel-py3_10-xpu
pip install pandas psutil scipy
git clone https://github.com/pytorch/benchmark
cd benchmark
pip install -r requirements.txt
python install.py cm3leon_generate

python benchmarks/dynamo/torchbench.py --performance --inference -d xpu -n10 --bfloat16 --only cm3leon_generate --cold-start-latency --backend=inductor --disable-cudagraphs
```

Versions

| Name | oneDNN v3.10 | oneDNN v3.9.1 |
|---|---|---|
| oneDNN | v3.10-rc | v3.9.1 |
| Device | PVC 1100 | PVC 1100 |
| OS | Ubuntu 22.04.2 LTS | Ubuntu 22.04.2 LTS |
| Driver | 25.18.33578.38-1146~22.04 | 25.18.33578.38-1146~22.04 |
| IGC | 2.11.27-1146~22.04 | 2.11.27-1146~22.04 |
| Level Zero | 1.21.9.0-1136~22.04 | 1.21.9.0-1136~22.04 |
| Torch | release/2.9 rc9 | release/2.9 rc9 |
| Torch-xpu-ops | pinned | pinned |
| Triton | 3.5.0+git1b0418a9 | 3.5.0+git1b0418a9 |
| Transformers | 4.56.0 | 4.56.0 |
| Torchvision | 0.24.0+xpu | 0.24.0+xpu |
| Torchaudio | 2.9.0+xpu | 2.9.0+xpu |
| Torchbench | 74a23feff57432129df84d8099e622773cf77925 | 74a23feff57432129df84d8099e622773cf77925 |
| Timms | 5d535d7a2d4b435b1b5c1177fd8f04a12b942b9a | 5d535d7a2d4b435b1b5c1177fd8f04a12b942b9a |
| Bundle | 2025.2.1 | 2025.2.1 |
