Commit f1f2449
Fix OSError: [Errno 24] Too many open files in multi-copy benchmark (pytorch#5083)
Summary:
Pull Request resolved: pytorch#5083
X-link: https://github.com/facebookresearch/FBGEMM/pull/2089
When running benchmarks with a large number of copies, the process may raise:
OSError: [Errno 24] Too many open files.
Example command:
(fbgemm_gpu_env)$ ulimit -n 1048576
(fbgemm_gpu_env)$ python ./bench/tbe/tbe_inference_benchmark.py nbit-cpu \
--num-embeddings=40000000 --bag-size=2 --embedding-dim=96 \
--batch-size=162 --num-tables=8 --weights-precision=int4 \
--output-dtype=fp32 --copies=96 --iters=30000
PyTorch multiprocessing provides two shared-memory strategies:
1. file_descriptor (default)
2. file_system
The default file_descriptor strategy uses file descriptors as shared-memory handles, so sharing many tensors can leave a large number of FDs open. If the total number of open FDs exceeds the system limit and the limit cannot be raised, the file_system strategy should be used instead.
This patch allows switching to the file_system strategy by setting:
export PYTORCH_SHARE_STRATEGY='file_system'
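The patch's behavior can be sketched as follows. This is a minimal illustration, not the actual FBGEMM code: the helper name `apply_share_strategy` and the injectable `set_strategy` callable are hypothetical, standing in for a direct call to `torch.multiprocessing.set_sharing_strategy`.

```python
import os

def apply_share_strategy(set_strategy, env=os.environ):
    """Select the PyTorch sharing strategy from PYTORCH_SHARE_STRATEGY.

    set_strategy: a callable such as torch.multiprocessing.set_sharing_strategy
    (injected here so the sketch stays torch-free).
    """
    strategy = env.get("PYTORCH_SHARE_STRATEGY")
    if strategy in ("file_descriptor", "file_system"):
        # Override PyTorch's default only when the variable is set to a
        # recognized strategy name.
        set_strategy(strategy)
        return strategy
    # Otherwise keep PyTorch's default (file_descriptor).
    return None
```

In the benchmark this would run once at startup, before any tensors are shared across processes, since the strategy must be chosen before shared-memory handles are created.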
Reference:
https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies
Pull Request resolved: pytorch#5037
Reviewed By: spcyppt
Differential Revision: D86135817
Pulled By: q10
fbshipit-source-id: 15f6fe7e1de5e9fef828f5a1496dc1cf9b41c2931
parent: 0baae82
1 file changed: +7 lines, -0 lines (seven lines added at lines 156-162, after original line 155; diff content not preserved in this extract)