Configuring NUMA aware DDP training without Slurm #13280
Replies: 1 comment
- Hi @laserkelvin, I've looked a bit into this issue. One way to get there is to pass an MPI-aware cluster environment to the trainer, e.g. `pl.Trainer(..., plugins=[..., MPIEnvironment()])`. The relevant lines of code then capture the rank, local rank, and world size that the MPI launcher provides, so Lightning treats the worker processes as externally created rather than spawning its own. If you then call your script with `mpirun`, each MPI process becomes one DDP worker. There are a few snags before we can fully switch to this, but I hope this points you in the right direction :)
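For concreteness, a minimal sketch of that setup. The import path for `MPIEnvironment` and the `NUM_NODES`/`PPN` values below are assumptions/placeholders; newer Lightning releases ship an `MPIEnvironment` under the `plugins.environments` namespace which, as far as I know, relies on `mpi4py`:

```python
# Sketch: let an external MPI launcher create the processes and have the
# Trainer pick up rank/world-size from it instead of spawning workers itself.
import pytorch_lightning as pl
from pytorch_lightning.plugins.environments import MPIEnvironment  # assumed import path

NUM_NODES = 2  # placeholder: number of nodes in the job
PPN = 2        # placeholder: processes per node (e.g. one per CPU socket)

trainer = pl.Trainer(
    accelerator="cpu",
    strategy="ddp",
    devices=PPN,          # should match the processes-per-node given to mpirun
    num_nodes=NUM_NODES,  # should match the number of nodes given to mpirun
    plugins=[MPIEnvironment()],
)
# trainer.fit(model) as usual
```

The script is then started entirely by the launcher, e.g. something like `mpirun -np $((NUM_NODES * PPN)) python train.py`, so any NUMA/CPU binding (`numactl`, the launcher's binding flags, ...) can be applied at that level without Lightning needing to know about it.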
As the title suggests, I'd like to see if there is a recommended way to run multi-node PyTorch Lightning DDP training with NUMA binding for CPU processes. Essentially, I'd like to launch training on `$NUM_NODES` nodes with `$PPN` processes per node, with each process bound to one NUMA domain. The motivation is to have optimized memory access on nodes with multiple CPU sockets, in a way that takes advantage of PyTorch Lightning's excellent distributed/scalable abstractions.[^1]
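To make concrete what I mean by binding, here is a rough illustration (my own sketch, not an existing Lightning feature) that pins a worker to the CPUs of one NUMA node from inside the process, based on its local rank. It only handles CPU affinity (memory binding would still need `numactl` or libnuma), the `LOCAL_RANK` variable is launcher-dependent, and in practice I'd much rather have the launcher or a Lightning plugin take care of this:

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-15,32-47' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def bind_to_numa_node(local_rank: int) -> None:
    """Pin the calling process to the CPUs of one NUMA node (Linux only)."""
    nodes = sorted(
        Path("/sys/devices/system/node").glob("node[0-9]*"),
        key=lambda p: int(p.name[len("node"):]),
    )
    node = nodes[local_rank % len(nodes)]
    os.sched_setaffinity(0, parse_cpulist((node / "cpulist").read_text()))


if __name__ == "__main__":
    # LOCAL_RANK is what torchrun exports; other launchers use other variables.
    bind_to_numa_node(int(os.environ.get("LOCAL_RANK", "0")))
    # ... then build the LightningModule / Trainer and call trainer.fit(...)
```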
As far as I understand, there are several ways to launch this, but none of them feel like a perfect fit. I'd be happy to develop my own strategy/plugin, but thought it best to see if others (particularly those with a better understanding) have opinions on the matter.

Right now, my understanding is that this is a corner case that neither `SLURMEnvironment` nor `TorchElasticEnvironment` captures: the former allows process binding to be done by Slurm, but Slurm then becomes a requirement and is not always used (for example, a free cluster); the latter can be used in a wide variety of environments, but doesn't provide an option for NUMA-aware process binding.

Is there a simpler solution that I've overlooked, or does this use case require developing a separate environment class somewhere in between `LightningEnvironment`, `SLURMEnvironment`, and `TorchElasticEnvironment`?
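If it does come down to a custom class, I imagine something along these lines: a rough, untested sketch against the `ClusterEnvironment` interface that reads Open MPI's environment variables (the variable names, the default port, and the node-rank arithmetic are my assumptions, and other MPI implementations export different variables):

```python
import os

from pytorch_lightning.plugins.environments import ClusterEnvironment


class OpenMPIEnvironment(ClusterEnvironment):
    """Hypothetical environment for jobs launched with Open MPI's mpirun."""

    @property
    def creates_processes_externally(self) -> bool:
        return True  # mpirun starts the workers; Lightning must not spawn any

    @property
    def main_address(self) -> str:
        return os.environ.get("MASTER_ADDR", "127.0.0.1")

    @property
    def main_port(self) -> int:
        return int(os.environ.get("MASTER_PORT", "29500"))

    @staticmethod
    def detect() -> bool:
        return "OMPI_COMM_WORLD_SIZE" in os.environ

    def world_size(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_SIZE"])

    def set_world_size(self, size: int) -> None:
        pass  # fixed by the launcher

    def global_rank(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_RANK"])

    def set_global_rank(self, rank: int) -> None:
        pass  # fixed by the launcher

    def local_rank(self) -> int:
        return int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

    def node_rank(self) -> int:
        # Assumes ranks are assigned contiguously per node (round-robin mapping would break this).
        return self.global_rank() // int(os.environ["OMPI_COMM_WORLD_LOCAL_SIZE"])
```

Usage would then be `pl.Trainer(..., plugins=[OpenMPIEnvironment()])`, with the NUMA binding itself applied by the `mpirun` command (e.g. via `numactl` or the launcher's own binding options).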
[^1]: You can configure your script so that you can use `mpirun`, but you miss out on `pl.Trainer`.