Gradient Accumulation & Optax MultiSteps #2008
-
Hello, I am testing `optax.MultiSteps` for gradient accumulation on a Colab TPU v2, but every time I set the number of accumulation steps to anything above 1, I get an OOM. It seems to increase the memory requirements as much as increasing the batch size would, which should not be the case. My understanding is that all I need to do is wrap the optimizer with it and increase the training batch size by multiplying it by the number of gradient-accumulation steps.
The rest of the code should stay the same. Is my understanding correct, or is there something else we should take care of when using the MultiSteps function? I have posted this question on the Optax repo, but it would be great to hear your feedback.
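For reference, here is a minimal sketch of how I understand the intended usage (the model, shapes, learning rate, and `k` below are made-up placeholders, not my actual training code):

```python
import jax
import jax.numpy as jnp
import optax

# Hypothetical toy setup: a single weight matrix and a dummy quadratic loss.
params = {"w": jnp.zeros((128, 128))}

def loss_fn(p, batch):
    return jnp.mean((batch @ p["w"]) ** 2)

# Wrap the base optimizer with MultiSteps: gradients are accumulated across
# k calls to update(), and the inner optimizer only applies an update on
# every k-th call (gradients are averaged by default).
k = 4  # assumed number of gradient-accumulation steps
base_opt = optax.adam(1e-3)
opt = optax.MultiSteps(base_opt, every_k_schedule=k)
opt_state = opt.init(params)

@jax.jit
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state

# Each call uses a small micro-batch; after k calls the parameters have been
# updated once with the averaged gradient (effective batch = k * micro-batch).
for _ in range(k):
    batch = jnp.ones((8, 128))
    params, opt_state = train_step(params, opt_state, batch)
```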
-
Thanks. The optax team found the problem.