
Distributed training with multiple optimizers #10241

My question is: for distributed training with multiple optimizers, will the above code work in the intended way?

Your code looks good to me.
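
The snippet discussed above isn't reproduced in this excerpt, but as a rough point of reference, a minimal sketch of a LightningModule driving two optimizers with manual optimization (the encoder/decoder layers and learning rates are hypothetical) could look like this:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class TwoOptimizerModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # step the optimizers by hand
        self.encoder = torch.nn.Linear(32, 16)
        self.decoder = torch.nn.Linear(16, 32)

    def training_step(self, batch, batch_idx):
        opt_enc, opt_dec = self.optimizers()

        loss = F.mse_loss(self.decoder(self.encoder(batch)), batch)

        opt_enc.zero_grad()
        opt_dec.zero_grad()
        # Under DDP, the gradient all-reduce happens during this backward call.
        self.manual_backward(loss)
        opt_enc.step()
        opt_dec.step()

        self.log("train_loss", loss)

    def configure_optimizers(self):
        return (
            torch.optim.Adam(self.encoder.parameters(), lr=1e-3),
            torch.optim.SGD(self.decoder.parameters(), lr=1e-2),
        )
```

With `Trainer(strategy="ddp", ...)` nothing else changes: gradients are synced inside `manual_backward`, and each optimizer step is purely local.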

What should the training_step_end function contain, then?

You don't need to implement it unless you have something to run at the end of each training_step.
https://pytorch-lightning.readthedocs.io/en/1.6.5/common/lightning_module.html#training-step-end
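
For context, in the 1.x API linked above, training_step_end is mostly useful with the dp strategy, where training_step runs once per GPU split and the hook receives the gathered outputs to reduce. A minimal sketch, assuming a hypothetical compute_loss helper:

```python
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    return {"loss": loss}

def training_step_end(self, step_output):
    # With strategy="dp", step_output["loss"] holds one value per GPU split;
    # reduce them to a single scalar before the backward pass.
    return step_output["loss"].mean()
```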

How do multiple optimizers update weights across different devices?

DDP synchronizes gradients across devices, overlapping the communication with backpropagation, so every device ends up with the same gradients and each device applies the same weight updates locally. See the PyTorch documentation for details: https://pytorch.org/docs/1.12/notes/ddp.html
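
As a rough illustration of that mechanism in plain PyTorch (process-group setup omitted; `rank`, layer names, and learning rates are placeholders), each rank computes gradients, DDP all-reduces them during the backward pass, and every rank then applies the same local optimizer steps:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed is already initialized (e.g. via torchrun) and
# `rank` is this process's GPU index.
model = DDP(
    torch.nn.Sequential(
        torch.nn.Linear(32, 16),  # "encoder"
        torch.nn.Linear(16, 32),  # "decoder"
    ).to(rank),
    device_ids=[rank],
)

opt_enc = torch.optim.Adam(model.module[0].parameters(), lr=1e-3)
opt_dec = torch.optim.SGD(model.module[1].parameters(), lr=1e-2)

batch = torch.randn(8, 32, device=rank)
loss = torch.nn.functional.mse_loss(model(batch), batch)
loss.backward()  # DDP all-reduces gradients, overlapped with backprop

# Every rank now holds identical gradients, so these purely local steps
# keep the model replicas in sync without extra communication.
opt_enc.step()
opt_dec.step()
opt_enc.zero_grad()
opt_dec.zero_grad()
```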

Answer selected by akihironitta
Labels: distributed, optimization