Behaviour of accumulate_gradients and multi-gpu #5796
Unanswered
RaivoKoot asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
According to https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/common/gradient_accumulation.rst the answer is possibility 2. If you want to maintain a certain effective batch size, you have to account for both the number of GPUs and the accumulation factor: effective batch size = per-GPU batch size × number of GPUs × accumulate_grad_batches.
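As a quick sanity check, here is that arithmetic in plain Python (the variable names are mine, for illustration only):

```python
per_gpu_batch_size = 16
num_gpus = 2
accumulate_grad_batches = 2

# Under DDP, each GPU accumulates its own `accumulate_grad_batches` batches
# before the synchronized optimizer step, so all three factors multiply.
effective_batch_size = per_gpu_batch_size * num_gpus * accumulate_grad_batches
print(effective_batch_size)  # 64

# To keep a target effective batch size of 32 with 2 GPUs and accumulation of 2,
# the per-GPU batch size would have to shrink accordingly:
target = 32
per_gpu_batch_size = target // (num_gpus * accumulate_grad_batches)
print(per_gpu_batch_size)  # 8
```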
-
Training setup: 2 GPUs on a single machine running in DDP mode. If I use a batch size of 16 and accumulate_gradients=2, how does Lightning handle this?

Possibility 1: the accumulation counts batches across the two GPUs, so the optimizer steps after one batch of 16 on each GPU and the effective batch size is 16 × 2 = 32.

Possibility 2: each GPU accumulates gradients over 2 of its own batches of 16 before the synchronized optimizer step, so the effective batch size is 16 × 2 GPUs × 2 accumulated batches = 64.

Which of the two ways does Lightning handle this under DDP? I am asking because in the first scenario the effective batch size is 32 and in the second scenario the effective batch size is 64.
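For reference, a minimal sketch of the setup being described, assuming a recent PyTorch Lightning version; `LitModel` and `train_dataset` are hypothetical placeholders, not part of the original question:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

model = LitModel()                                       # hypothetical LightningModule
train_loader = DataLoader(train_dataset, batch_size=16)  # batch size of 16 per GPU under DDP

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,                  # 2 GPUs on a single machine
    strategy="ddp",             # DistributedDataParallel
    accumulate_grad_batches=2,  # step the optimizer every 2 batches
)
trainer.fit(model, train_loader)
```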