What is the best practice to share a massive CPU tensor over multiple processes in pytorch-lightning DDP mode (read-only + single machine)? #8611
-
Hi everyone, I wonder what the best practice is to share a massive CPU tensor across multiple processes in pytorch-lightning DDP mode (read-only, single machine). I think `torch.Storage.from_file` might be a solution. I also tried copying the training data to /dev/shm (reference) and running DDP with 8 GPUs, but nothing changed: the memory usage with 8 GPUs is the same as before, even though (tested with a single process) loading the dataset occupies more than 1 GB of memory. Am I missing something here? Thank you.
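For context, a minimal sketch of the memory-mapped alternative I'm considering (the dataset class, file path, and sizes here are hypothetical, for illustration only): instead of loading the tensor in `__init__`, which duplicates it in every DDP process, the dataset maps a preprocessed float32 file with `torch.from_file(..., shared=True)`, so the OS page cache holds a single physical copy shared by all processes on the machine.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset


class MappedDataset(Dataset):
    """Hypothetical sketch: memory-map a preprocessed float32 file
    instead of loading it, so every DDP process on the machine shares
    one physical copy via the OS page cache."""

    def __init__(self, path, num_samples, sample_dim):
        # torch.from_file maps the file; with shared=True no private
        # per-process copy of the data is materialized.
        flat = torch.from_file(path, shared=True,
                               size=num_samples * sample_dim,
                               dtype=torch.float32)
        self.data = flat.view(num_samples, sample_dim)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        return self.data[idx]


# One-time preprocessing step: write the data to a file.
# On Linux, a path under /dev/shm keeps the file purely in RAM.
path = os.path.join(tempfile.mkdtemp(), "features.bin")
writer = torch.from_file(path, shared=True, size=6, dtype=torch.float32)
writer.copy_(torch.arange(6, dtype=torch.float32))

ds = MappedDataset(path, num_samples=2, sample_dim=3)
print(len(ds), ds[1].tolist())  # → 2 [3.0, 4.0, 5.0]
```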
Replies: 2 comments 1 reply
-
Dear @siahuat0727, Lightning doesn't support shared tensors yet, but there is some work being done around it. Best,
-
I found that `torch.Storage.from_file` suits my needs and it can reduce the memory usage in my Lightning DDP program. For the way to create a storage file, see here.
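A minimal sketch of this pattern (the file path and element count are illustrative; `torch.UntypedStorage.from_file` is the current spelling of the storage-level API, and `torch.from_file` is the tensor-level wrapper): one process creates a file-backed storage and writes into it, and any other process on the same machine can map the same file without making a private copy.

```python
import os
import tempfile

import torch

path = os.path.join(tempfile.mkdtemp(), "storage.bin")
n = 4  # number of float32 elements; tiny here for illustration

# Create a file-backed storage. With shared=True the file is created
# if needed and all changes are written through to it.
storage = torch.UntypedStorage.from_file(path, shared=True, nbytes=n * 4)
t = torch.empty(0, dtype=torch.float32).set_(storage, 0, (n,))
t.copy_(torch.arange(n, dtype=torch.float32))

# A second mapping of the same file (as each DDP process would make)
# sees the same bytes; the data is shared, not duplicated.
same = torch.from_file(path, shared=True, size=n, dtype=torch.float32)
print(same.tolist())  # → [0.0, 1.0, 2.0, 3.0]
```

In a Lightning setup, the writing step would typically happen once before training (or guarded so only rank 0 does it), and every DDP worker would only perform the mapping step.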