  # Perform the optimization step and trigger the execution of the
  # accumulated XLA operations on the device for this process.
  xm.optimizer_step(optimizer)

if __name__ == '__main__':
  # Launch the multi-device training.
  torch_xla.launch(_mp_fn, args=())
```
There are a few new constructs in this snippet compared to the single device snippet. Let's go over them one by one.
- `torch_xla.launch()`
  - Creates the processes that each run an XLA device.
  - This function is a wrapper of multiprocessing spawn that also allows the user to run the script with the `torchrun` command line. Each process will only be able to access the device assigned to the current process. For example, on a TPU v4-8 there will be 4 processes spawned, and each process will own one TPU device.
  - Note that if you print `xm.xla_device()` on each process you will see `xla:0` on all devices. This is because each process can only see one device. This does not mean multi-process is not functioning. The only exception is the PJRT runtime on TPU v2 and TPU v3, where there are `#devices/2` processes and each process has 2 threads (check this [doc](https://github.com/pytorch/xla/blob/master/docs/pjrt.md#tpus-v2v3-vs-v4) for more details). See the first sketch after this list.
- `MpDeviceLoader`
  - Loads the training data onto each device.
  - `MpDeviceLoader` can wrap a torch dataloader. It preloads the data to the device and overlaps the data loading with device execution to improve performance. See the second sketch after this list.
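To make the per-process device visibility concrete, here is a minimal sketch, assuming a TPU host with the PJRT runtime (the function name `_print_device_fn` is ours, for illustration):

```python
import torch_xla
import torch_xla.core.xla_model as xm


def _print_device_fn(index):
  # Each process is assigned exactly one device, so every process
  # reports `xla:0` even though they hold different physical devices.
  print(f'process {index} sees {xm.xla_device()}')


if __name__ == '__main__':
  # On a TPU v4-8 this spawns 4 processes, one per TPU device.
  torch_xla.launch(_print_device_fn, args=())
```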
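And here is a minimal sketch of `MpDeviceLoader` wrapping a regular torch dataloader (the toy random dataset is an assumption for illustration):

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl


def _mp_fn(index):
  device = xm.xla_device()
  dataset = torch.utils.data.TensorDataset(torch.randn(64, 8))
  loader = torch.utils.data.DataLoader(dataset, batch_size=16)
  # MpDeviceLoader preloads batches onto the XLA device and overlaps
  # the data loading with device execution.
  mp_loader = pl.MpDeviceLoader(loader, device)
  for (batch,) in mp_loader:
    # `batch` already lives on the XLA device here.
    print(batch.device, batch.shape)


if __name__ == '__main__':
  torch_xla.launch(_mp_fn, args=())
```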