Run Trainer.fit multiple times under DDP mode #12401
Unanswered
xmlyqing00
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment 3 replies
-
can you try it with |
-
Hi,
I have a machine learning architecture project that requires modifying the network structure multiple times. I implemented it with PyTorch Lightning. The overall structure is as follows.
In the model definition, I omit `training_step` and `validation_step` for clarity.
The following main script shows that I want to update the network structure and retrain the model over 10 iterations.
When `iter == 1`, the model has already been copied to the different GPUs, so calling model.add() produces a different model in each process. So I added a flag to make sure the modification happens only in the main process.
But this time, the program gets stuck when `iter == 1`.
My questions are:
Thanks for your time. Any comments or suggestions are welcome.
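To make the mismatch concrete, here is a minimal sketch in plain PyTorch that simulates two ranks inside one process. `GrowingNet`, its `add()` body, and `param_shapes` are hypothetical stand-ins, not the code from the project, and no DDP or Lightning calls are made. It only shows that growing the model on one rank leaves the ranks with different parameter lists, while growing it on every rank keeps them aligned. DDP expects each process to hold the same set of parameters, so a mismatch like this is one plausible cause of the hang, though that is an assumption and not something verified on the original setup.

```python
import copy

import torch
from torch import nn


class GrowingNet(nn.Module):
    """Hypothetical stand-in for the model described in the question."""

    def __init__(self, width: int = 8):
        super().__init__()
        self.width = width
        self.layers = nn.ModuleList([nn.Linear(width, width)])

    def add(self) -> None:
        # Hypothetical structural change: append one more linear layer.
        self.layers.append(nn.Linear(self.width, self.width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x


def param_shapes(model: nn.Module):
    """Return (name, shape) for every parameter, in registration order."""
    return [(name, tuple(p.shape)) for name, p in model.named_parameters()]


# Simulate two ranks: each process starts from an identical copy of the model.
base = GrowingNet()
rank0, rank1 = copy.deepcopy(base), copy.deepcopy(base)

# Case 1: the change is guarded by a "main process only" flag.
rank0.add()
guarded_match = param_shapes(rank0) == param_shapes(rank1)

# Case 2: every rank applies the same structural change.
rank1.add()
unguarded_match = param_shapes(rank0) == param_shapes(rank1)

print(guarded_match, unguarded_match)  # False True
```

In case 2 the new layers have different random weights on each simulated rank, but the names and shapes agree. That is usually the part that matters, because DDP broadcasts the module state from rank 0 when it wraps the model.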