Behaviour of accumulate_gradients and multi-gpu #5796
Unanswered
RaivoKoot asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
According to https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/common/gradient_accumulation.rst the answer is possibility 2. If you want to maintain a certain effective batch size, you have to account for both the number of GPUs and the accumulation factor: effective batch size = per-GPU batch size × number of GPUs × accumulate_grad_batches.
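As a quick sanity check, here is that arithmetic in plain Python (the variable names are mine, for illustration only):

```python
per_gpu_batch_size = 16
num_gpus = 2
accumulate_grad_batches = 2

# Under DDP, each GPU accumulates its own `accumulate_grad_batches` batches
# before the synchronized optimizer step, so all three factors multiply.
effective_batch_size = per_gpu_batch_size * num_gpus * accumulate_grad_batches
print(effective_batch_size)  # 64

# To keep a target effective batch size of 32 with 2 GPUs and accumulation of 2,
# the per-GPU batch size would have to shrink accordingly:
target = 32
per_gpu_batch_size = target // (num_gpus * accumulate_grad_batches)
print(per_gpu_batch_size)  # 8
```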
-
Training setup: 2 GPUs on a single machine running in DDP mode. If I use a batch size of 16 and accumulate_gradients=2, how does Lightning handle this?

Possibility 1: the accumulation counts batches across the two GPUs, so the optimizer steps after one batch of 16 on each GPU and the effective batch size is 16 × 2 = 32.

Possibility 2: each GPU accumulates gradients over 2 of its own batches of 16 before the synchronized optimizer step, so the effective batch size is 16 × 2 GPUs × 2 accumulated batches = 64.

Which of the two ways does Lightning handle this under DDP? I am asking because in the first scenario the effective batch size is 32 and in the second scenario the effective batch size is 64.
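For reference, a minimal sketch of the setup being described, assuming a recent PyTorch Lightning version; `LitModel` and `train_dataset` are hypothetical placeholders, not part of the original question:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

model = LitModel()                                       # hypothetical LightningModule
train_loader = DataLoader(train_dataset, batch_size=16)  # batch size of 16 per GPU under DDP

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,                  # 2 GPUs on a single machine
    strategy="ddp",             # DistributedDataParallel
    accumulate_grad_batches=2,  # step the optimizer every 2 batches
)
trainer.fit(model, train_loader)
```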