How to correctly set the T_max variable (maximum number of iterations) for the CosineAnnealingLR scheduler in DDP training #17307
Unanswered · liutianlin0121 asked this question in DDP / multi-GPU / multi-node
Hi there! I have a question regarding how to correctly set the `T_max` variable (maximum number of iterations) for the `CosineAnnealingLR` scheduler in DDP training.

Suppose I am using only 1 GPU and wish to anneal the learning rate after each batch. In that case, I would simply set `T_max` to `max_epochs * len(train_loader)`, where `len(train_loader)` is the number of batches in my dataset.

Now, let's consider the scenario where I am using DDP training with 2 GPUs. Since each GPU sees only half of the dataset, the number of batches on each GPU is effectively halved. In that case, to achieve consistent behavior, should I set `T_max` to `max_epochs * len(train_loader) / 2` in the `CosineAnnealingLR` scheduler?

Thank you!