🐛 Describe the bug
Setting `export TORCHINDUCTOR_ONLINE_SOFTMAX=0` improves the performance of some models on BMG (Linux). All timings below are in ms.

| Scenario | Model | Batch Size | 2.11.0.dev20251222+xpu (ms) | 2.11.0.dev20251222+xpu + TORCHINDUCTOR_ONLINE_SOFTMAX=0 (ms) |
| --- | --- | --- | --- | --- |
| Inference | hf_T5 | 16 | 467.9 | 409.29 |
| Inference | hf_T5_base | 1 | 82.45 | 73.765 |
| Inference | hf_T5_generate | 16 | 144.32 | 129.137 |
| Inference | OPTForCausalLM | 16 | 387.27 | 336.864 |

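
To put the gap in relative terms, here is a small sketch that computes the percentage of time saved per model from the numbers reported above. The names `timings_ms` and `time_reduction_pct` are illustrative only and are not part of PyTorch.

```python
# Timings in ms, copied from the results above:
# (default run, run with TORCHINDUCTOR_ONLINE_SOFTMAX=0).
timings_ms = {
    "hf_T5": (467.9, 409.29),
    "hf_T5_base": (82.45, 73.765),
    "hf_T5_generate": (144.32, 129.137),
    "OPTForCausalLM": (387.27, 336.864),
}


def time_reduction_pct(default_ms: float, disabled_ms: float) -> float:
    """Percent of time saved by the second run, relative to the default run."""
    return (default_ms - disabled_ms) / default_ms * 100.0


reductions = {
    model: time_reduction_pct(default_ms, disabled_ms)
    for model, (default_ms, disabled_ms) in timings_ms.items()
}

for model, pct in reductions.items():
    print(f"{model}: {pct:.1f}% less time with TORCHINDUCTOR_ONLINE_SOFTMAX=0")
```

On these four models the reduction comes out at roughly 10% to 13%.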
Versions
PyTorch: 2.11.0.dev20251222+xpu
HW: BMG linux