DDP, Out of memory, How should I set batch size? #12693
Unanswered
di0002ya
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment 2 replies
-
The information is clearer in this format: the exception is thrown when trying to allocate more memory on a single GPU (GPU 0 in the traceback), which has 11.17 GiB of capacity. DDP runs one process per GPU, and each process sees only its own device's memory, so the limit that matters is per-GPU, not the ~96 GiB total across all eight cards. The DataLoader batch size is also per process: batch size 64 with 8 GPUs means 64 samples on every GPU (a global batch of 512), which is why a setting that fits on one GPU at 16 can OOM here.
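A tiny arithmetic check (plain Python, with the numbers copied from the error message in the question) shows why the allocator fails on GPU 0 even though other GPUs on the node still have room:

```python
# Figures from the traceback, for GPU 0 only:
total_capacity_gib = 11.17   # the card's total capacity
already_allocated_gib = 7.57 # live tensors
reserved_gib = 7.74          # held by PyTorch's caching allocator
free_mib = 18.25             # what's left on *this* GPU
requested_mib = 20.00        # the allocation that failed

# The request exceeds the free memory on GPU 0, so the allocator
# raises "CUDA out of memory" regardless of the other GPUs:
print(requested_mib > free_mib)  # → True
```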
2 replies
-
On a single GPU with batch size 16 the model trains fine. However, when I train with 8 GPUs, batch size 64, and the DDP strategy, the process runs out of memory. How should I choose a good batch size?
Also, the 8 GPUs have about 96 GiB of memory in total. Why does the error report only 11.17 GiB?
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 7.57 GiB already allocated; 18.25 MiB free; 7.74 GiB reserved in total by PyTorch)
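Since the DataLoader batch size in DDP is per process, one way to reason about it is to fix the desired global batch and divide by the number of GPUs. A minimal sketch in plain Python (the helper name `per_gpu_batch_size` is hypothetical, not part of any library):

```python
def per_gpu_batch_size(global_batch: int, world_size: int) -> int:
    """Split a desired global batch evenly across DDP processes.

    Each DDP process loads its own batches, so the DataLoader's
    batch_size argument is *per GPU*; the effective global batch
    is batch_size * world_size.
    """
    if global_batch % world_size:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // world_size

# Batch 16 fits on one 11 GiB GPU, so keeping 16 per GPU on 8 GPUs
# gives a global batch of 128. To reproduce a *global* batch of 64,
# each GPU should load only 8 samples:
print(per_gpu_batch_size(64, 8))   # → 8
print(per_gpu_batch_size(128, 8))  # → 16
```

In other words, passing batch size 64 to the DataLoader on 8 GPUs is not a global batch of 64; it is 64 per GPU, roughly 4× the memory footprint that worked on a single GPU.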