ddp_sharded crash during model save #13951
-
I am trying to train a big StyleGAN model on 4 V100 GPUs. I used the
-
This seems to be consolidating the state dicts from all ranks onto rank 0, which causes the memory issue. If you only need the model's weights in the checkpoint, you can set `save_weights_only=True` on the `ModelCheckpoint` callback. Note that if you do that, you will not be able to resume training from such a checkpoint.
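For reference, a minimal sketch of that setup; the `MyStyleGAN` LightningModule name is hypothetical and only stands in for your own model:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# save_weights_only=True skips optimizer/loop state, so rank 0 does not have to
# hold the full consolidated training state. The trade-off: such a checkpoint
# cannot be used to resume training.
checkpoint_cb = ModelCheckpoint(save_weights_only=True)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy="ddp_sharded",
    callbacks=[checkpoint_cb],
)
# trainer.fit(MyStyleGAN(), datamodule=...)  # MyStyleGAN is a placeholder
```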
-
Hi,
-
OK, DeepSpeed with a single optimizer works quite well with only minor changes to the code.
I still have one issue: training on 1 node with 4 GPUs works fine, but switching to 2 nodes with 4 GPUs per node gives CUDA out of memory, which is surprising.
If anyone runs into the same problem with two optimizers, my simple workaround was to use a single optimizer and toggle `requires_grad` to `True` or `False` on either the Generator or the Discriminator. The other limitation of DeepSpeed is that, I think, you can only call `.step()` once on the optimizer per training step.