How can I run training_step on GPU (DDP) and validation_step on CPU? #15742
YooSungHyun asked this question in DDP / multi-GPU / multi-node
My data is so large that I can only train with a batch size of 1 on my GPUs (DDP), and even then `validation_step` runs out of CUDA memory (OOM). So I want to run `logits = self(input_data)` on CPU inside `validation_step`. I use the torchmetrics kwarg `compute_on_cpu=True` and `move_metrics_to_cpu=True`.
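A minimal plain-PyTorch sketch of running the validation forward pass on CPU, assuming the network can be rebuilt on CPU by a zero-argument factory and that the trained module's `state_dict()` keys carry no DDP `module.` prefix (an assumption). `make_cpu_copy` and `cpu_forward` are hypothetical helpers, not Lightning APIs.

```python
import torch
from torch import nn


def make_cpu_copy(model_factory, trained_model: nn.Module) -> nn.Module:
    """Build a fresh CPU instance and load the trained weights into it.

    The state dict is copied to CPU tensor by tensor, so no second GPU copy of
    the model is allocated (unlike copy.deepcopy on a CUDA module).
    """
    cpu_model = model_factory()  # assumed to construct the network on CPU
    cpu_state = {k: v.detach().cpu() for k, v in trained_model.state_dict().items()}
    cpu_model.load_state_dict(cpu_state)
    return cpu_model.eval()  # eval() returns the module itself


@torch.no_grad()
def cpu_forward(cpu_model: nn.Module, input_data: torch.Tensor) -> torch.Tensor:
    """Run the validation forward pass entirely on CPU; no autograd graph is kept."""
    return cpu_model(input_data.cpu())
```

In a LightningModule one could keep such a copy in a hypothetical attribute (e.g. `self._cpu_model`), refresh it once per validation epoch in `on_validation_epoch_start`, and call `cpu_forward(self._cpu_model, input_data)` in `validation_step`. The trade-offs: CPU inference is much slower, every DDP rank holds its own copy in host RAM, and the resulting logits live on CPU, so any value derived from them that is later synced across ranks may need to be moved back to the GPU.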
With those settings, however, `self.log("train_loss", loss, sync_dist=True)` in my `training_step` raises an error, something like "Tensor must be CUDA …".
How can I solve this?
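One plausible source of the error (an assumption, since the exact message isn't shown): `sync_dist=True` reduces the logged value across ranks with a collective such as `torch.distributed.all_reduce`, and the NCCL backend commonly used for GPU DDP only accepts CUDA tensors, so a value that ended up on CPU cannot be reduced there. The sketch below shows that constraint with plain `torch.distributed`; `all_reduce_mean` and `demo_single_process` are hypothetical names, and the demo uses a one-rank gloo group so it runs without GPUs.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def all_reduce_mean(value: torch.Tensor) -> torch.Tensor:
    """Average `value` across all ranks of the default process group."""
    # NCCL collectives only accept CUDA tensors; gloo also accepts CPU tensors.
    if dist.get_backend() == "nccl":
        device = torch.device("cuda", torch.cuda.current_device())
    else:
        device = torch.device("cpu")
    # Clone so the in-place all_reduce never modifies the caller's tensor.
    buf = value.detach().clone().to(device)
    dist.all_reduce(buf, op=dist.ReduceOp.SUM)
    return buf / dist.get_world_size()


def demo_single_process() -> torch.Tensor:
    """Run all_reduce_mean in a one-rank gloo group so it can be tried without GPUs."""
    store_path = os.path.join(tempfile.mkdtemp(), "pg_store")
    dist.init_process_group(
        "gloo", init_method="file://" + store_path, rank=0, world_size=1
    )
    try:
        return all_reduce_mean(torch.tensor(3.0))
    finally:
        dist.destroy_process_group()
```

Under NCCL the same function moves the value to the current CUDA device before reducing, which is where a value synced by a GPU DDP run has to live.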