DDP, Out of memory, How should I set batch size? #12693
Unanswered
di0002ya
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment 2 replies
-
The information is clearer in this format: the exception is thrown when trying to allocate more memory on a single GPU (GPU 0 in the traceback), which has 11.17 GiB of capacity. DDP runs one process per GPU, and each process sees only its own device's memory, so the limit that matters is per-GPU, not the ~96 GiB total across all eight cards. The DataLoader batch size is also per process: batch size 64 with 8 GPUs means 64 samples on every GPU (a global batch of 512), which is why a setting that fits on one GPU at 16 can OOM here.
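A tiny arithmetic check (plain Python, with the numbers copied from the error message in the question) shows why the allocator fails on GPU 0 even though other GPUs on the node still have room:

```python
# Figures from the traceback, for GPU 0 only:
total_capacity_gib = 11.17   # the card's total capacity
already_allocated_gib = 7.57 # live tensors
reserved_gib = 7.74          # held by PyTorch's caching allocator
free_mib = 18.25             # what's left on *this* GPU
requested_mib = 20.00        # the allocation that failed

# The request exceeds the free memory on GPU 0, so the allocator
# raises "CUDA out of memory" regardless of the other GPUs:
print(requested_mib > free_mib)  # → True
```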
2 replies
-
On a single GPU with batch size 16 the model trains fine. However, when I train with 8 GPUs, batch size 64, and the DDP strategy, the process runs out of memory. How should I choose a good batch size?
Also, the 8 GPUs have about 96 GiB of memory in total. Why does the error report only 11.17 GiB?
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 7.57 GiB already allocated; 18.25 MiB free; 7.74 GiB reserved in total by PyTorch)
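Since the DataLoader batch size in DDP is per process, one way to reason about it is to fix the desired global batch and divide by the number of GPUs. A minimal sketch in plain Python (the helper name `per_gpu_batch_size` is hypothetical, not part of any library):

```python
def per_gpu_batch_size(global_batch: int, world_size: int) -> int:
    """Split a desired global batch evenly across DDP processes.

    Each DDP process loads its own batches, so the DataLoader's
    batch_size argument is *per GPU*; the effective global batch
    is batch_size * world_size.
    """
    if global_batch % world_size:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // world_size

# Batch 16 fits on one 11 GiB GPU, so keeping 16 per GPU on 8 GPUs
# gives a global batch of 128. To reproduce a *global* batch of 64,
# each GPU should load only 8 samples:
print(per_gpu_batch_size(64, 8))   # → 8
print(per_gpu_batch_size(128, 8))  # → 16
```

In other words, passing batch size 64 to the DataLoader on 8 GPUs is not a global batch of 64; it is 64 per GPU, roughly 4× the memory footprint that worked on a single GPU.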