Memory use differs between GPUs when using DDP #11692
Unanswered
surya-narayanan asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
[Screenshots of per-GPU memory allocated and memory in use did not load.]
I have noticed that there is often a discrepancy between the memory allocated and the memory actually in use during multi-GPU training (allocated shown in the first screenshot, usage in the second). Why is that? When the gradients are synced across GPUs in step four of this checklist (https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html#distributed-data-parallel), does that step use one GPU more than the others?
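For reference, here is a rough sketch of how I compare the two numbers on each rank (the `log_cuda_memory` helper below is just something I put together for illustration, not a Lightning utility): `torch.cuda.memory_allocated` reports what live tensors occupy, while `torch.cuda.memory_reserved` is what the caching allocator has claimed from the driver, which is closer to what nvidia-smi shows.

```python
import torch
import torch.distributed as dist


def log_cuda_memory(tag: str = "") -> None:
    """Print allocated vs. reserved CUDA memory for the current rank.

    `memory_allocated` is the memory occupied by live tensors;
    `memory_reserved` is what the caching allocator has claimed from
    the driver, which is closer to the number nvidia-smi reports.
    """
    rank = dist.get_rank() if dist.is_initialized() else 0
    device = torch.cuda.current_device()
    allocated_mib = torch.cuda.memory_allocated(device) / 2**20
    reserved_mib = torch.cuda.memory_reserved(device) / 2**20
    print(f"[rank {rank}] {tag} "
          f"allocated={allocated_mib:.0f} MiB reserved={reserved_mib:.0f} MiB")
```

Calling this from each rank (e.g. in a callback's `on_train_batch_end`) makes it easier to see whether rank 0 really holds more memory than the others after the gradient sync, or whether the gap is just cached allocator memory.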
Thanks for your time.