RuntimeError: unable to open shared memory object </torch_91130_1372465664> in read-write mode #8524
Answered by carmocca
EvanZ asked this question in DDP / multi-GPU / multi-node
I'm getting the following error after setting up an EC2 p3.8xlarge instance (so 4 GPUs) and setting gpus=4:

```
/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:524: UserWarning: You requested multiple GPUs but did not specify a backend, e.g. `Trainer(accelerator="dp"|"ddp"|"ddp2")`. Setting `accelerator="ddp_spawn"` for you.
  'You requested multiple GPUs but did not specify a backend, e.g.'
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Traceback (most recent call last):
File "train.py", line 79, in <module>
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py", line 197, in lr_find
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 688, in tune
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py", line 54, in _tune
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py", line 250, in lr_find
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py", line 64, in _run
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 122, in start_training
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 328, in reduce_storage
RuntimeError: unable to open shared memory object </torch_91130_1372465664> in read-write mode
```

My code runs fine on a single-GPU instance. Any idea what I need to look at here?
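For reference, the warning at the top points at the `accelerator` argument of the `Trainer`; below is a minimal sketch of passing a backend explicitly instead of letting Lightning fall back to `ddp_spawn` (assuming the Lightning 1.3-era API visible in the traceback paths):

```python
import pytorch_lightning as pl

# Minimal sketch, assuming the Lightning 1.3.x Trainer API from the traceback paths.
# An explicit backend avoids the 'Setting `accelerator="ddp_spawn"` for you' fallback;
# "ddp" launches one subprocess per GPU instead of using torch.multiprocessing.spawn.
trainer = pl.Trainer(
    gpus=4,             # the four GPUs on the p3.8xlarge
    accelerator="ddp",  # explicit backend, as the warning suggests
)
```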
Answered by carmocca on Jul 23, 2021
Some quick googling 🔍: facebookresearch/maskrcnn-benchmark#103

This issue is not Lightning related, so if the fixes mentioned there do not help, then you should try asking on PyTorch discussions.
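As a rough pointer, workarounds commonly suggested for this error revolve around the process hitting its open-file limit when tensors are shared between spawned workers; the sketch below lists the usual candidates (these are general suggestions for this class of error, not steps verified on the asker's instance):

```python
import torch.multiprocessing as mp

# Sketch of commonly suggested workarounds for
# "unable to open shared memory object ... in read-write mode".
# Assumes the root cause is the per-process open-file limit (check it with `ulimit -n`).

# Option 1: switch to the file_system sharing strategy, which keeps far fewer
# file descriptors open than the default file_descriptor strategy.
mp.set_sharing_strategy("file_system")

# Option 2 (shell, not Python): raise the limit before launching training,
# e.g. `ulimit -n 65535`, and/or reduce DataLoader num_workers so fewer
# shared-memory handles are needed at once.
```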