Training stuck at the beginning #8321
Unanswered
MendelXu asked this question in DDP / multi-GPU / multi-node

When I use 2 GPUs, my training process gets stuck at the beginning of the first epoch, and I am not even able to kill it with Ctrl+C. However, with 1 GPU or 4 GPUs it works fine. As there is no error information, how can I debug it and find the problem?
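For context, when a multi-GPU job hangs silently, a common first step is to turn on the verbose logging that PyTorch and NCCL already provide and to make the stuck processes dump their Python stacks on demand. A minimal sketch, assuming a Unix machine and a reasonably recent PyTorch; the environment variables are standard PyTorch/NCCL knobs, everything else is illustrative:

```python
import faulthandler
import os
import signal

# Assumption: this runs at the top of the training script, so every DDP worker
# (the script is re-executed per rank with strategy="ddp") inherits the settings.
os.environ.setdefault("NCCL_DEBUG", "INFO")                 # NCCL logs its setup and collective activity
os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")  # extra torch.distributed checks/logging (PyTorch >= 1.9)
# os.environ.setdefault("NCCL_P2P_DISABLE", "1")            # a common experiment if GPU peer-to-peer traffic is suspected

# Make a stuck process dump the Python stacks of all its threads to stderr
# when it receives SIGUSR1, e.g. `kill -USR1 <pid>` from another shell.
faulthandler.register(signal.SIGUSR1)

# ... then build the LightningModule / Trainer and call trainer.fit(...) as usual.
```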
Replies: 2 comments · 7 replies
- Could you please provide some sample code to reproduce it?
6 replies
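In case it helps anyone hitting the same thing, a stripped-down reproducer of the kind the reply asks for could look like the sketch below. The `RandomDataset` / `BoringModel` classes, the tensor shapes, and the Trainer arguments are illustrative assumptions (PyTorch Lightning 1.x API), not the original poster's code.

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    """Random tensors so the script needs no external data."""
    def __init__(self, size: int = 64, length: int = 256):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class BoringModel(pl.LightningModule):
    """A single linear layer, just enough to exercise the DDP setup."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(64, 2)

    def training_step(self, batch, batch_idx):
        # Sum of the outputs stands in for a real loss.
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":  # required: "ddp" re-launches the script per rank
    model = BoringModel()
    train_loader = DataLoader(RandomDataset(), batch_size=8, num_workers=2)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,              # the configuration that reportedly hangs
        strategy="ddp",
        max_epochs=1,
        limit_train_batches=10,
    )
    trainer.fit(model, train_loader)
```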
- @MendelXu @stonelazy I am not sure how the issue arises in your specific cases, but FYI, here is the general guide for debugging: https://pytorch-lightning.readthedocs.io/en/1.7.7/debug/debugging.html
1 reply
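Guides of that kind generally revolve around shrinking the run with Trainer switches so that problems surface in seconds rather than hours. A hedged sketch of that style of setup; the specific flags and values chosen here are illustrative, not quoted from the linked page:

```python
import pytorch_lightning as pl

# Shrink the run so a hang or error reproduces quickly.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
    fast_dev_run=True,           # run a single batch of train/val/test as a smoke test
    # or, for a slightly longer run:
    # limit_train_batches=10,
    # num_sanity_val_steps=0,    # rule out the pre-training validation sanity check
    # profiler="simple",         # report where time is spent once it does run
)
# trainer.fit(model, train_loader)  # e.g. with the reproducer model/data above
```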