Description
🚀 The feature, motivation and pitch
torch.xpu does not implement _sleep, so tests that call it fail with: AttributeError: module 'torch.xpu' has no attribute '_sleep'. Please add _sleep support for the XPU device.
Steps to reproduce:
pytest -vs _composable/fsdp/test_fully_shard_training.py -k test_non_root_forward_backward
Error message:
Traceback (most recent call last):
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 643, in wrapper
    self._join_processes(fn)
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 907, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 947, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 791, in run_test
    getattr(self, test_name)()
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 645, in wrapper
    fn()
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_utils.py", line 3148, in wrapper
    method(*args, **kwargs)
  File "/home/sdp/penghuic/pytorch/torch/testing/_internal/common_distributed.py", line 205, in wrapper
    return func(*args, **kwargs)
  File "/home/sdp/penghuic/pytorch/test/distributed/_composable/fsdp/test_fully_shard_training.py", line 521, in test_non_root_forward_backward
    torch.get_device_module(device_type)._sleep(int(100 * get_cycles_per_ms()))
AttributeError: module 'torch.xpu' has no attribute '_sleep'
To execute this test, run the following from the base repo dir:
python test/distributed/_composable/fsdp/test_fully_shard_training.py TestFullyShard1DTrainingCore.test_non_root_forward_backward
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
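For context, the failing call at line 521 assumes that every device module exposes _sleep. Until torch.xpu provides it, one possible stopgap is to guard the call. The sketch below is only an illustration: sleep_on_device is a hypothetical helper that does not exist in PyTorch, and the two stand-in objects imitate a backend with _sleep and one without so that the snippet runs without torch installed. The host-side fallback is not equivalent to a device-side sleep, which is why native support is still needed.

```python
import time
from types import SimpleNamespace


def sleep_on_device(device_module, cycles, cycles_per_ms):
    """Hypothetical helper (not part of PyTorch).

    Calls device_module._sleep(cycles) when the backend provides it.
    Otherwise blocks the host thread for roughly the same duration.
    The fallback does NOT stall the device stream, so it is only a stopgap.
    Returns "device" or "host" to show which path was taken.
    """
    device_sleep = getattr(device_module, "_sleep", None)
    if device_sleep is not None:
        device_sleep(cycles)
        return "device"
    # Convert cycles to seconds: cycles / (cycles per ms) gives milliseconds.
    time.sleep(cycles / cycles_per_ms / 1000.0)
    return "host"


# Stand-ins so the sketch runs without torch: one backend that has _sleep
# (recording the requested cycle count) and one that lacks it.
calls = []
fake_backend_with_sleep = SimpleNamespace(_sleep=calls.append)
fake_backend_without_sleep = SimpleNamespace()

path_with = sleep_on_device(fake_backend_with_sleep, 100, 1.0)
path_without = sleep_on_device(fake_backend_without_sleep, 10, 10.0)
```

In the test itself, the call would presumably become sleep_on_device(torch.get_device_module(device_type), int(100 * get_cycles_per_ms()), get_cycles_per_ms()). This assumes get_cycles_per_ms() itself works on XPU, which this issue does not confirm.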
Version:
Alternatives
No response
Additional context
No response