Hi,
This piece of code in both the ppo and grpo codebases seems incorrect (for args.number_samples_per_prompt > 1):
When args.number_samples_per_prompt > 1, dividing by args.number_samples_per_prompt here makes the learning rate decay to 0 too quickly.

With args.number_samples_per_prompt > 1, multiple optimizer updates take place within a single training step: every model.step() inside a training step advances the scheduler, so the learning rate decreases and reaches 0 earlier than it should.

The num_training_steps above is only correct when args.number_samples_per_prompt is 1.
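To make the mismatch concrete, here is a small self-contained sketch (the numbers and names are hypothetical, mirroring the issue rather than the actual codebase). A linear scheduler sized with the divided-down horizon hits 0 partway through training, while one sized by the true number of model.step() calls decays as intended:

```python
def linear_lr(opt_step, scheduler_horizon, base_lr=1e-5):
    """Linear decay to 0 over `scheduler_horizon` optimizer steps."""
    return base_lr * max(0.0, 1.0 - opt_step / scheduler_horizon)

number_samples_per_prompt = 4
outer_training_steps = 100  # iterations of the training loop
# Each outer step performs number_samples_per_prompt model.step() calls,
# so the scheduler is actually stepped this many times:
actual_optimizer_steps = outer_training_steps * number_samples_per_prompt  # 400

# Buggy horizon: the step count divided by number_samples_per_prompt.
buggy_horizon = actual_optimizer_steps // number_samples_per_prompt  # 100

# Halfway through training (optimizer step 200) the LR is already 0 ...
assert linear_lr(200, buggy_horizon) == 0.0
# ... while a horizon matching the real optimizer-step count still has half the LR left.
assert linear_lr(200, actual_optimizer_steps) == 0.5e-5
```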
Also, args.num_train_epochs appears to be unused (redundant) in the code.
To fix this, we can edit the code as:
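(A hedged sketch of one possible fix; the helper and argument names below are assumptions for illustration, not the codebase's actual API. The idea is to size the scheduler by the total number of optimizer steps, i.e. multiply by number_samples_per_prompt instead of dividing by it.)

```python
# Hypothetical helper: compute the scheduler horizon from the total number of
# model.step() calls, since each outer training step performs
# number_samples_per_prompt optimizer updates.
def scheduler_num_training_steps(total_episodes, rollout_batch_size,
                                 number_samples_per_prompt):
    outer_steps = total_episodes // rollout_batch_size
    return outer_steps * number_samples_per_prompt

print(scheduler_num_training_steps(10_000, 100, 4))  # 400
```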