Why ddp is better than ddp_spawn #12253
Unanswered
binzhougithub asked this question in DDP / multi-GPU / multi-node
I am using the ddp_spawn strategy now, but it seems that ddp is better, because I found the following on this page:
========
We use DDP this way because ddp_spawn has a few limitations (due to Python and PyTorch):
Since .spawn() trains the model in subprocesses, the model on the main process does not get updated.
Dataloader(num_workers=N), where N is large, bottlenecks training with DDP… ie: it will be VERY slow or won’t work at all. This is a PyTorch limitation.
========
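To make the first point concrete, here is a minimal sketch using plain torch.multiprocessing, without Lightning (the model, fill value, and process count are arbitrary illustration choices):

```python
import torch
import torch.multiprocessing as mp


def train(rank, model):
    # Each spawned subprocess receives its own pickled copy of the model;
    # anything changed here never reaches the parent's copy.
    with torch.no_grad():
        model.weight.fill_(42.0)


if __name__ == "__main__":
    model = torch.nn.Linear(2, 2)
    mp.spawn(train, args=(model,), nprocs=2, join=True)
    # Back in the parent process: the weights are still the original
    # random init, not 42.0 -- the "trained" state died with the subprocesses.
    print(model.weight)
```

As far as I understand, this is why ddp_spawn has to ship the trained weights back to the main process afterwards, whereas ddp launches the full script once per device, so every process holds a fully updated replica the whole time.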
I don't understand these two points. On the first one: my training/validation steps run in the subprocesses, so why does it matter that the model on the main process doesn't get updated? On the second one, it says "This is a PyTorch limitation", so why doesn't the ddp strategy have the same issue?
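For reference, switching between the two is just a Trainer flag; a minimal sketch, assuming Lightning 1.6-style arguments (the accelerator/devices naming may differ on older versions):

```python
from pytorch_lightning import Trainer

# "ddp" re-launches the training script once per device as independent
# processes; "ddp_spawn" instead forks workers via torch.multiprocessing.spawn.
trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp")
# trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp_spawn")
```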
Thanks
Replies: 1 comment

- Same question. Does anyone have any ideas about this?