
The benchmarks does not work on CPU because of AMP #9119


Open
haifeng-jin opened this issue May 8, 2025 · 5 comments · May be fixed by #9218


haifeng-jin commented May 8, 2025

🐛 Bug

The benchmarks in pytorch/xla do not work on CPU because they are set to use AMP by default.

To Reproduce

Steps to reproduce the behavior:

  1. Follow the instructions in the README.md to run the benchmarks.
  2. Or run this command directly:
python xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=eval --test=train --suite-name=torchbench --accelerator=cpu --output-dirname=/tmp/output --repeat=1 --print-subprocess --no-resume --dump-pytorch-xla-metrics

Expected behavior

Add a new --amp argument to the CLI so users can configure it (i.e., disable AMP) when running on CPU.
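
A minimal sketch of what such a flag could look like, assuming experiment_runner.py builds its CLI with argparse; the flag name, default, and wiring below are illustrative, not the actual patch:

import argparse

# Hypothetical sketch of an --amp / --no-amp switch; the real runner's
# parser and option names may differ.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--amp",
    action=argparse.BooleanOptionalAction,  # generates both --amp and --no-amp
    default=True,  # keep AMP on by default so GPU/TPU runs stay comparable
    help="Run the models under AMP autocast; pass --no-amp on CPU.",
)

# Example: a CPU run would disable AMP explicitly.
args = parser.parse_args(["--no-amp"])
assert args.amp is False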

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: CPU
  • torch_xla version: master or 2.7
@haifeng-jin self-assigned this May 8, 2025
@haifeng-jin added the bug (Something isn't working) and benchmarking labels May 8, 2025
@haifeng-jin (Collaborator, Author) commented:

Need input from @zpcore @ysiraichi before creating a pull request.

@ysiraichi (Collaborator) commented:

The idea of hard-coding AMP was to be directly comparable with the PyTorch HUD. The question is: why doesn't it work? Could you post the error you are getting?

@haifeng-jin (Collaborator, Author) commented:

Just got back from my OOO.
Let me run this again and paste the results.

@haifeng-jin (Collaborator, Author) commented:

Here is the stack trace of the error:

Traceback (most recent call last):
  File "/workspaces/torch/pytorch/xla/benchmarks/experiment_runner.py", line 1060, in <module>
    main()
  File "/workspaces/torch/pytorch/xla/benchmarks/experiment_runner.py", line 1056, in main
    runner.run()
  File "/workspaces/torch/pytorch/xla/benchmarks/experiment_runner.py", line 67, in run
    self.run_single_config()
  File "/workspaces/torch/pytorch/xla/benchmarks/experiment_runner.py", line 293, in run_single_config
    model = self.model_loader.load_model(model_config, experiment)
  File "/workspaces/torch/pytorch/xla/benchmarks/benchmark_model.py", line 263, in load_model
    benchmark_model.set_up()
  File "/workspaces/torch/pytorch/xla/benchmarks/torchbench_model.py", line 263, in set_up
    self.autocast, self.autocast_kwargs = self._get_autocast_with_kwargs()
  File "/workspaces/torch/pytorch/xla/benchmarks/torchbench_model.py", line 435, in _get_autocast_with_kwargs
    raise RuntimeError(f"Tried to run {name} with AMP on {accelerator}. "
RuntimeError: Tried to run BERT_pytorch with AMP on cpu. However, AMP is only supported on cuda and tpu.
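
For what it's worth, stock PyTorch does support autocast on CPU, just with bfloat16 rather than float16, so an alternative to hard-failing would be to map CPU to a bfloat16 autocast context. Here is a hedged sketch of that idea; the function name is taken from the stack trace, but the body is invented for illustration and is not the real torchbench_model.py code:

import torch

def _get_autocast_with_kwargs(accelerator):
    # Sketch only: the real implementation in torchbench_model.py differs,
    # and the tpu branch is omitted here.
    if accelerator == "cuda":
        return torch.autocast, {"device_type": "cuda", "dtype": torch.float16}
    if accelerator == "cpu":
        # CPU autocast exists in stock PyTorch, but only with bfloat16.
        return torch.autocast, {"device_type": "cpu", "dtype": torch.bfloat16}
    raise RuntimeError(f"AMP is not supported on {accelerator}.")

# Usage: run the model's forward pass inside the returned context.
autocast, kwargs = _get_autocast_with_kwargs("cpu")
with autocast(**kwargs):
    out = torch.nn.Linear(4, 4)(torch.randn(2, 4))
    # Linear under CPU bf16 autocast produces bfloat16 activations.
    assert out.dtype == torch.bfloat16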

@haifeng-jin linked a pull request (#9218) May 20, 2025 that will close this issue
@ysiraichi (Collaborator) commented:

Thank you for posting the error. One question, though: why do you want to run it on CPU?
