sync distributed metric error #12347
Unanswered
rogertrullo
asked this question in
DDP / multi-GPU / multi-node
Replies: 0
Hi all,

I am using Lightning with multiple GPUs (DDP). I am calling
`self.log(..., sync_dist=True, on_epoch=True, prog_bar=True, logger=True)`
in the `validation_step` function. I was trying to replicate the logged value myself, and I have found some issues.
To replicate the values, I am doing this in `validation_epoch_end`:
It turns out this only works when all the batches in the dataloader have the same size; essentially, it works fine if I set `drop_last=True` on the dataloader. Is this expected behavior, or is it a bug?
For completeness, here is the BoringModel, modified a little bit:
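(A minimal sketch in the spirit of the Lightning bug-report BoringModel, assuming a 1.x Lightning where `validation_epoch_end` still exists; the validation set length of 65 is chosen so that the last batch is smaller than the rest, and the two-GPU DDP settings are illustrative.)

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def validation_step(self, batch, batch_idx):
        loss = self(batch).mean()
        self.log("val_loss", loss, sync_dist=True, on_epoch=True,
                 prog_bar=True, logger=True)
        return loss

    def validation_epoch_end(self, outputs):
        # Manual replication of the synced epoch metric:
        # gather every rank's per-batch losses and average them.
        gathered = self.all_gather(torch.stack(outputs))
        if self.global_rank == 0:
            print(f"manual val_loss: {gathered.mean().item()}")

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


if __name__ == "__main__":
    # 65 is not divisible by the batch size of 8, so with
    # drop_last=False the last validation batch has fewer samples.
    train_loader = DataLoader(RandomDataset(32, 64), batch_size=8)
    val_loader = DataLoader(RandomDataset(32, 65), batch_size=8, drop_last=False)
    trainer = pl.Trainer(max_epochs=1, accelerator="gpu", devices=2, strategy="ddp")
    trainer.fit(BoringModel(), train_dataloaders=train_loader,
                val_dataloaders=val_loader)
```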