
trainer.fit(strategy='ddp') executes code repeatedly #11938


hey @earendil25!

This is exactly how DDP works. To distribute data across devices, Lightning adds a DistributedSampler so each device sees a distinct shard of the data rather than a duplicate, and the model is wrapped in DistributedDataParallel to synchronize gradients. The training command is launched once per device, which is why your script runs repeatedly. Alternatively, you can try ddp_spawn, which spawns worker processes from the main process instead of re-launching the script, so the whole script won't be executed on each device.
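A minimal sketch of how to keep one-time code from running on every process, assuming a recent PyTorch Lightning release that provides the `rank_zero_only` utility and the `strategy`/`devices` Trainer arguments (the model and datamodule are placeholders, not part of the original question):

```python
import pytorch_lightning as pl
from pytorch_lightning.utilities import rank_zero_only


@rank_zero_only
def log_once(message: str) -> None:
    # Runs only on the rank-0 process, even though the whole script
    # is re-executed once per device under strategy="ddp".
    print(message)


if __name__ == "__main__":
    # With strategy="ddp", this script is launched once per GPU,
    # so any top-level code here runs on every process.
    log_once("This prints exactly once, from rank 0.")

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,
        strategy="ddp",          # re-executes the script per device
        # strategy="ddp_spawn",  # spawns workers instead of re-running the script
        max_epochs=1,
    )
    # trainer.fit(model, datamodule=dm)  # model/dm defined elsewhere (placeholders)
```

Guarding setup code this way (or moving it into LightningModule/DataModule hooks) is usually simpler than switching strategies, since ddp_spawn has its own constraints around picklability.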

Answer selected by earendil25