
ddp_sharded crash during model save #13951

Ok, DeepSpeed with one optimizer is working quite well with only minor changes to the code.

I still have one issue: training on 1 node with 4 GPUs works fine, but when switching to 2 nodes with 4 GPUs per node, I get a CUDA out-of-memory error, which is curious.

If anyone has the same problem with 2 optimizers: my simple workaround was to use a single optimizer and toggle requires_grad on either the Generator or the Discriminator.
The other limitation of DeepSpeed is that, I think, you can only call .step() on the optimizer once per training step.

	def configure_optimizers(self):
		optimizer = torch.optim.Adam(
			[
				{'params': self.generators.parameters(), 'lr': self.lr…

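For anyone hitting the same two-optimizer limitation, here is a rough sketch of that single-optimizer workaround. The module names (self.generators, self.discriminators), the loss helpers, and the alternation on batch_idx are illustrative assumptions, not the exact code from this project:

import torch
import pytorch_lightning as pl

class GAN(pl.LightningModule):
	# self.generators, self.discriminators, self.lr and the loss helpers
	# are assumed to be defined elsewhere in the module.

	def configure_optimizers(self):
		# A single optimizer over both sub-networks, so DeepSpeed only
		# ever sees one .step() per training step.
		return torch.optim.Adam([
			{'params': self.generators.parameters(), 'lr': self.lr},
			{'params': self.discriminators.parameters(), 'lr': self.lr},
		])

	@staticmethod
	def _set_requires_grad(module, flag):
		for p in module.parameters():
			p.requires_grad = flag

	def training_step(self, batch, batch_idx):
		# Alternate which half of the GAN receives gradients by toggling
		# requires_grad; the single optimizer then only updates the
		# "active" sub-network on this step.
		train_generator = batch_idx % 2 == 0
		self._set_requires_grad(self.generators, train_generator)
		self._set_requires_grad(self.discriminators, not train_generator)

		if train_generator:
			loss = self.generator_loss(batch)       # hypothetical helper
		else:
			loss = self.discriminator_loss(batch)   # hypothetical helper
		return loss

With automatic optimization left on, Lightning calls optimizer.step() exactly once per training step, which stays within the single-step limitation mentioned above.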