How to correctly set the T_max variable (maximum number of iterations) for the CosineAnnealingLR scheduler in DDP training #17307
Unanswered · liutianlin0121 asked this question in DDP / multi-GPU / multi-node
Hi there! I have a question regarding how to correctly set the `T_max` variable (maximum number of iterations) for the `CosineAnnealingLR` scheduler in DDP training.

Suppose I am using only 1 GPU and wish to anneal the learning rate after each batch. In that case, I would simply set `T_max` to `max_epochs * len(train_loader)`, where `len(train_loader)` is the number of batches in my dataset.

Now, let's consider the scenario where I am using DDP training with 2 GPUs. Since each GPU sees only half of the dataset, the number of batches on each GPU is effectively halved. In that case, to achieve consistent behavior, should I set `T_max` to `max_epochs * len(train_loader) / 2` in the `CosineAnnealingLR` scheduler?

Thank you!