Wandb-dependent Model Checkpoint #13504
Unanswered
kelvins64 asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I'm trying to set my model checkpoint path based on the Wandb experiment while using DDP. However, the actual Wandb experiment is only accessible on rank 0 (see this discussion). Therefore, basing the path on logger.experiment fails on the other ranks, where it is a DummyExperiment.

On the other hand, DDP hangs if I try to detect whether the process is rank zero and, based on that, add the ModelCheckpoint to callbacks (or simply change its dirpath).
What is the correct way to build the ModelCheckpoint directory path based on the Wandb experiment?
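To make the constraint concrete, here is a minimal, framework-free sketch of one possible direction, not a confirmed answer: rank 0 reads the run id, every rank receives the same value through a collective broadcast, and every rank then builds the identical directory path, so no rank ends up with a different set of callbacks. The names shared_checkpoint_dir and fake_broadcast are hypothetical, and the broadcast callable is an assumption standing in for whatever collective the training framework provides.

```python
import os
from typing import Callable, Optional


def shared_checkpoint_dir(
    root: str,
    local_run_id: Optional[str],
    broadcast: Callable[[Optional[str]], Optional[str]],
) -> str:
    """Return the same checkpoint directory on every rank.

    local_run_id: the Wandb run id on rank 0, None on all other ranks.
    broadcast: hypothetical callable that returns rank 0's value on every rank.
    """
    run_id = broadcast(local_run_id)
    if run_id is None:
        raise RuntimeError("rank 0 did not provide a run id")
    return os.path.join(root, run_id)


# Single-process stand-in for a real collective broadcast:
# it pretends rank 0 holds the (made-up) run id "abc123".
def fake_broadcast(_value: Optional[str]) -> Optional[str]:
    return "abc123"


# Rank 0 knows the id; rank 1 only has None before the broadcast.
rank0_dir = shared_checkpoint_dir("checkpoints", "abc123", fake_broadcast)
rank1_dir = shared_checkpoint_dir("checkpoints", None, fake_broadcast)
```

In a Lightning setup, the broadcast would presumably come from the trainer's strategy and be called from a hook that runs on every rank, so that all ranks take part in the collective and configure the same ModelCheckpoint. That wiring is an assumption and has not been checked against the Lightning version used in this question.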
Full example with BoringModel