Lightning in Jupyter notebook server on SLURM #16846

I figured it out. In case anyone else is attempting to follow me into this soul-sucking morass of frustration, here's how you solve it.

If you're attempting to run a Jupyter notebook server on a SLURM-provisioned instance and use Lightning with strategy ddp_notebook:

  • in sbatch, salloc, or srun
    • nodes=1. I'm not sure it's possible to run a Jupyter notebook server distributed over multiple nodes, so keep this at 1.
    • cpus-per-task=<total number of CPUs desired>. The PyTorch dataloader will only be able to see this many processors, not this value times the number of tasks. So if you have 4 tasks and 5 cpus-per-task, PyTorch will only utilize 5 CPUs.
    • ntasks-per-node=1, setting this to 1 doesn't seem to a…
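To check the cpus-per-task point from inside the notebook, a minimal sketch like the one below may help. It is not part of the original answer. It assumes SLURM exports SLURM_CPUS_PER_TASK when cpus-per-task is set, and that the notebook kernel runs on Linux, where os.sched_getaffinity is available. The helper name visible_cpu_count is hypothetical.

```python
import os


def visible_cpu_count(env=None):
    """Best-effort count of the CPUs this process is allowed to use.

    Hypothetical helper, not from the original post. Prefers the value
    SLURM exports for --cpus-per-task, then the process CPU affinity
    mask, then the machine-wide CPU count.
    """
    env = os.environ if env is None else env

    # Assumption: SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given.
    slurm_value = env.get("SLURM_CPUS_PER_TASK", "")
    if slurm_value.isdigit() and int(slurm_value) > 0:
        return int(slurm_value)

    # Linux only: the set of CPUs the scheduler pinned this process to.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))

    # Last resort: every CPU on the node, which may overstate the allocation.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs visible to this process:", visible_cpu_count())
```

With the 4 tasks and 5 cpus-per-task example above, this would be expected to report 5 rather than 20. The result is one reasonable upper bound for the num_workers argument of a PyTorch DataLoader.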

Answer selected by devtronslab