
[oneDNN] cm3leon_generate bf16 & amp_bf16 inference performance has dropped ~ 40% #2223

@mengfei25

Description


🐛 Describe the bug

Compared with oneDNN v3.9.1, performance has dropped (ratio = v3.10 / v3.9.1; higher is better).

| Category | Model | Eager Ratio | Inductor Ratio |
|---|---|---|---|
| torchbench_amp_bf16_inference | cm3leon_generate | 0.711793753 | 0.628823929 |
| torchbench_amp_fp16_inference | cm3leon_generate | 0.893158604 | 0.871240602 |
| torchbench_bfloat16_inference | cm3leon_generate | 0.711048632 | 0.602583587 |
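For context, a small sketch of the arithmetic behind the "~40%" figure in the title, using only the ratios from the table above (a ratio of 1.0 would mean parity between the two oneDNN versions):

```python
# Convert the reported v3.10 / v3.9.1 performance ratios into percentage drops.
# All values are copied verbatim from the table above.
ratios = {
    ("amp_bf16", "eager"): 0.711793753,
    ("amp_bf16", "inductor"): 0.628823929,
    ("amp_fp16", "eager"): 0.893158604,
    ("amp_fp16", "inductor"): 0.871240602,
    ("bfloat16", "eager"): 0.711048632,
    ("bfloat16", "inductor"): 0.602583587,
}

for (dtype, mode), ratio in ratios.items():
    drop_pct = (1.0 - ratio) * 100.0
    print(f"{dtype:>9} {mode:>8}: {drop_pct:.1f}% slower")
```

The worst case, bfloat16 inference with Inductor, works out to a ~39.7% regression, which matches the "~40%" in the issue title; the amp_fp16 configurations regress far less (~11-13%).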

Reproducer

```shell
pip install --pre torch==2.9.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu
# oneDNN v3.10 pytorch wheel: GH_TOKEN=xxx gh --repo pytorch/pytorch run download 18678105868 -n manywheel-py3_10-xpu
pip install pandas psutil scipy
git clone https://github.com/pytorch/benchmark
cd benchmark
pip install -r requirements.txt
python install.py cm3leon_generate

python benchmarks/dynamo/torchbench.py --performance --inference -d xpu -n10 --bfloat16 --only cm3leon_generate --cold-start-latency --backend=inductor --disable-cudagraphs
```

Versions

| Name | oneDNN v3.10 | oneDNN v3.9.1 |
|---|---|---|
| oneDNN | v3.10-rc | v3.9.1 |
| Device | PVC 1100 | PVC 1100 |
| OS | Ubuntu 22.04.2 LTS | Ubuntu 22.04.2 LTS |
| Driver | 25.18.33578.38-1146~22.04 | 25.18.33578.38-1146~22.04 |
| IGC | 2.11.27-1146~22.04 | 2.11.27-1146~22.04 |
| Level Zero | 1.21.9.0-1136~22.04 | 1.21.9.0-1136~22.04 |
| Torch | release/2.9 rc9 | release/2.9 rc9 |
| Torch-xpu-ops | pinned | pinned |
| Triton | 3.5.0+git1b0418a9 | 3.5.0+git1b0418a9 |
| Transformers | 4.56.0 | 4.56.0 |
| Torchvision | 0.24.0+xpu | 0.24.0+xpu |
| Torchaudio | 2.9.0+xpu | 2.9.0+xpu |
| Torchbench | 74a23feff57432129df84d8099e622773cf77925 | 74a23feff57432129df84d8099e622773cf77925 |
| Timms | 5d535d7a2d4b435b1b5c1177fd8f04a12b942b9a | 5d535d7a2d4b435b1b5c1177fd8f04a12b942b9a |
| Bundle | 2025.2.1 | 2025.2.1 |
