Multiple 400MB processes on single GPU #20114
Unanswered
changspencer asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hello everyone!
I have a question about DP/DDP behavior that I've been wondering about. Occasionally, in my runs on a SLURM cluster, I see multiple small processes of about 400 MB each placed on a single GPU. I assume this GPU hosts something like the "master" process, but I have no idea why the small processes are necessary, or whether they simply aren't cleaned up after a method finishes running.
What could be the reason these processes show up on a single GPU during training (even though all training processes have been started)? Could this be a SLURM resource management problem or a (personal) programming problem?
Some quick notes:
I can try to provide more details, but I wanted to see if anyone else has experienced the same situation, possibly for different use cases.
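One way to gather those details is to record which process IDs hold memory on each GPU while training runs. The following is only a minimal sketch, assuming `nvidia-smi` is available on the compute node; the helper names (`parse_compute_apps`, `gpus_with_multiple_processes`, `query_nvidia_smi`) are hypothetical and not part of any library. It does not diagnose the cause; it only shows which GPUs host more than one compute process and how much memory each one holds.

```python
import subprocess
from collections import defaultdict

# nvidia-smi query listing every compute process with its GPU and memory use (MiB).
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Group (pid, used_mib) pairs by GPU UUID from nvidia-smi CSV output."""
    per_gpu = defaultdict(list)
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        gpu_uuid, pid, used_mib = fields
        try:
            per_gpu[gpu_uuid].append((int(pid), int(used_mib)))
        except ValueError:
            continue  # skip rows where a value is reported as not available
    return dict(per_gpu)


def gpus_with_multiple_processes(per_gpu):
    """Keep only the GPUs that host more than one compute process."""
    return {gpu: procs for gpu, procs in per_gpu.items() if len(procs) > 1}


def query_nvidia_smi():
    """Run nvidia-smi on the current node and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `gpus_with_multiple_processes(query_nvidia_smi())` on a node during training returns a mapping from GPU UUID to its list of `(pid, used_mib)` pairs. Comparing those PIDs against the training ranks (for example with `ps`) would show whether the small processes belong to the other ranks or to something else.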