Configuring NUMA aware DDP training without Slurm #13280
Replies: 1 comment
- Hi @laserkelvin, I've looked a bit into this issue. One way to get there is to pass an MPI-aware cluster environment to the trainer, e.g. `pl.Trainer(..., plugins=[..., MPIEnvironment()])`. The relevant lines of code then capture the rank, local rank, and world size that the MPI launcher provides, so Lightning treats the worker processes as externally created rather than spawning its own. If you then call your script with `mpirun`, each MPI process becomes one DDP worker. There are a few snags before we can fully switch to this, but I hope this points you in the right direction :)
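For concreteness, a minimal sketch of that setup. The import path for `MPIEnvironment` and the `NUM_NODES`/`PPN` values below are assumptions/placeholders; newer Lightning releases ship an `MPIEnvironment` under the `plugins.environments` namespace which, as far as I know, relies on `mpi4py`:

```python
# Sketch: let an external MPI launcher create the processes and have the
# Trainer pick up rank/world-size from it instead of spawning workers itself.
import pytorch_lightning as pl
from pytorch_lightning.plugins.environments import MPIEnvironment  # assumed import path

NUM_NODES = 2  # placeholder: number of nodes in the job
PPN = 2        # placeholder: processes per node (e.g. one per CPU socket)

trainer = pl.Trainer(
    accelerator="cpu",
    strategy="ddp",
    devices=PPN,          # should match the processes-per-node given to mpirun
    num_nodes=NUM_NODES,  # should match the number of nodes given to mpirun
    plugins=[MPIEnvironment()],
)
# trainer.fit(model) as usual
```

The script is then started entirely by the launcher, e.g. something like `mpirun -np $((NUM_NODES * PPN)) python train.py`, so any NUMA/CPU binding (`numactl`, the launcher's binding flags, ...) can be applied at that level without Lightning needing to know about it.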
As the title suggests, I'd like to see if there is a recommended way to run multi-node PyTorch Lightning DDP training with NUMA binding for CPU processes. Essentially, I'd like to launch training on `$NUM_NODES` nodes with `$PPN` processes per node, with each process bound to one NUMA domain. The motivation is to have optimized memory access on nodes with multiple CPU sockets, in a way that takes advantage of PyTorch Lightning's excellent distributed/scalable abstractions.[^1]
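To make concrete what I mean by binding, here is a rough illustration (my own sketch, not an existing Lightning feature) that pins a worker to the CPUs of one NUMA node from inside the process, based on its local rank. It only handles CPU affinity (memory binding would still need `numactl` or libnuma), the `LOCAL_RANK` variable is launcher-dependent, and in practice I'd much rather have the launcher or a Lightning plugin take care of this:

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-15,32-47' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def bind_to_numa_node(local_rank: int) -> None:
    """Pin the calling process to the CPUs of one NUMA node (Linux only)."""
    nodes = sorted(
        Path("/sys/devices/system/node").glob("node[0-9]*"),
        key=lambda p: int(p.name[len("node"):]),
    )
    node = nodes[local_rank % len(nodes)]
    os.sched_setaffinity(0, parse_cpulist((node / "cpulist").read_text()))


if __name__ == "__main__":
    # LOCAL_RANK is what torchrun exports; other launchers use other variables.
    bind_to_numa_node(int(os.environ.get("LOCAL_RANK", "0")))
    # ... then build the LightningModule / Trainer and call trainer.fit(...)
```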
As far as I understand, there are several ways to launch this, but none of them feel like a perfect fit. I'd be happy to develop my own strategy/plugin, but thought it best to see if others (particularly those with a better understanding) have opinions on the matter.

Right now, my understanding is that this is a corner case that neither `SLURMEnvironment` nor `TorchElasticEnvironment` captures: the former allows process binding to be done by Slurm, but Slurm then becomes a requirement and is not always used (for example, a free cluster); the latter can be used in a wide variety of environments, but doesn't provide an option for NUMA-aware process binding.

Is there a simpler solution that I've overlooked, or does this use case require developing a separate environment class somewhere in between `LightningEnvironment`, `SLURMEnvironment`, and `TorchElasticEnvironment`?
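If it does come down to a custom class, I imagine something along these lines: a rough, untested sketch against the `ClusterEnvironment` interface that reads Open MPI's environment variables (the variable names, the default port, and the node-rank arithmetic are my assumptions, and other MPI implementations export different variables):

```python
import os

from pytorch_lightning.plugins.environments import ClusterEnvironment


class OpenMPIEnvironment(ClusterEnvironment):
    """Hypothetical environment for jobs launched with Open MPI's mpirun."""

    @property
    def creates_processes_externally(self) -> bool:
        return True  # mpirun starts the workers; Lightning must not spawn any

    @property
    def main_address(self) -> str:
        return os.environ.get("MASTER_ADDR", "127.0.0.1")

    @property
    def main_port(self) -> int:
        return int(os.environ.get("MASTER_PORT", "29500"))

    @staticmethod
    def detect() -> bool:
        return "OMPI_COMM_WORLD_SIZE" in os.environ

    def world_size(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_SIZE"])

    def set_world_size(self, size: int) -> None:
        pass  # fixed by the launcher

    def global_rank(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_RANK"])

    def set_global_rank(self, rank: int) -> None:
        pass  # fixed by the launcher

    def local_rank(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

    def node_rank(self) -> int:
        # Assumes ranks are assigned contiguously per node (round-robin mapping would break this).
        return self.global_rank() // int(os.environ["OMPI_COMM_WORLD_LOCAL_SIZE"])
```

Usage would then be `pl.Trainer(..., plugins=[OpenMPIEnvironment()])`, with the NUMA binding itself applied by the `mpirun` command (e.g. via `numactl` or the launcher's own binding options).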
[^1]: You can configure your script so that you can use `mpirun`, but you miss out on `pl.Trainer`.