-
Notifications
You must be signed in to change notification settings - Fork 226
Warn when multiprocessing start method is 'fork' #1309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/ok to test d1cce9a |
|
/ok to test 3cd8f8a |
@Andy-Jost, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
d1cce9a to
3cd8f8a
Compare
|
/ok to test fcbb370 |
3cd8f8a to
fcbb370
Compare
|
/ok to test 3d3499a |
fcbb370 to
1144c8f
Compare
@Andy-Jost, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
/ok to test 1144c8f |
This comment has been minimized.
This comment has been minimized.
1144c8f to
3d3499a
Compare
|
/ok to test 3d3499a |
| global _fork_warning_emitted | ||
| if _fork_warning_emitted: | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of caching it, would it be better to always call multiprocessing.get_start_method() and check it? I worry that it is a global state that we don't own and it could be changed at arbitrary point in time by the user or any package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the official multiprocessing library docs, the description of multiprocessing.set_start_method() explains that:
- The start method is a global setting that can only be chosen a single time in a given Python process.
- Calling
set_start_method()again without forcing will raise aRuntimeErrorindicating that the context has already been set. - Recent CPython implementations mention an internal/undocumented way to override this with a
forceargument, but this is not part of the public, documented API and should be avoided in normal code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Andy. Good to know Python by default checks this. However, I do see the force argument being documented, so it is part of the public API. And it does allow overwriting:
>>> import multiprocessing
>>> multiprocessing.set_start_method("spawn")
>>> multiprocessing.set_start_method("spawn")
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
multiprocessing.set_start_method("spawn")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
File "/local/home/leof/miniforge3/envs/py313_cu130/lib/python3.13/multiprocessing/context.py", line 247, in set_start_method
raise RuntimeError('context has already been set')
RuntimeError: context has already been set
>>> multiprocessing.set_start_method("fork", force=True)
>>> multiprocessing.set_start_method("spawn", force=True)
>>> multiprocessing.set_start_method("forkserver", force=True)
>>> If this is not on the hot path, I think not caching the result and always checking is safer.
Alternatively, we could mention in the warning message that setting set_start_method(..., force=True) is dangerous because we cannot help capture issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, if we already warned the next warn() call is a no-op so adding the _fork_warning_emitted guard is redundant 😆
import warnings
def f():
warnings.warn("oh no", UserWarning, stacklevel=3)
f() # only this call would generate a warning
f()
f()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, you want to limit the warning to only once per process lifetime, regardless of which object raises the warning from. NVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On minor detail -- since set_start_method can only be set once, I don't think we need to check it multiple times. In other words, we can set _fork_warning_emitted = True unconditionally rather than only when a warning is emitted so we don't check again. And maybe renaming the flag to _fork_warning_checked for clarity. Unless I'm missing some case where it might be set exceptionally late and it does need to be checked multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. I've made adjustments to the name and conditionality as Mike suggested. I left this as a one-time check because there does not seem to be a strong consensus, but I don't feel strongly about that and wouldn't mind changing it if more discussion goes in that direction.
Incidentally, a separate warning is issued when fork is called in multithreaded programs (like those using CUDA), so if a user somehow sidesteps this warning by setting the start method multiple times, they will still get another nasty message indicating their program is invalid.
CUDA does not support the fork() system call. Forked subprocesses exhibit undefined behavior, including failure to initialize CUDA contexts and devices. Add warning checks in multiprocessing reduction functions for IPC objects (DeviceMemoryResource, IPCAllocationHandle, Event) that warn when the start method is 'fork'. The warning is emitted once per process when IPC objects are serialized. Fixes NVIDIA#1136
|
/ok to test 3d3499a |
3d3499a to
9fd6b19
Compare
|
/ok to test 9fd6b19 |
|
The previous test methodology of starting a subprocess did not work on the test runners. The latest upload avoids using a subprocess, instead mocking up key methods. |
Change mempool_device to ipc_device fixture for tests that require IPC-enabled memory resources. The ipc_device fixture properly skips on Windows where IPC is not supported.
|
/ok to test 476459f |
…t_method - Add reset_fork_warning() function for testing purposes - Rename _check_multiprocessing_start_method to check_multiprocessing_start_method (remove leading underscore) - Update all tests to use reset_fork_warning() instead of directly accessing internal flag - Fix trailing whitespace
|
/ok to test fbdd56d |
|
/ok to test 3271964 |
|
Summary
CUDA does not support the fork() system call. Forked subprocesses exhibit undefined behavior, including failure to initialize CUDA contexts and devices. This PR adds warnings when IPC objects are serialized with the fork multiprocessing start method.
Changes
_check_multiprocessing_start_method()function incuda_utils.pyxthat checks if the multiprocessing start method is 'fork' and emits a one-time warning_reduce_allocation_handlein_ipc.pyx_deep_reduce_device_memory_resourcein_ipc.pyx_reduce_eventin_event.pyxTest Coverage
Added comprehensive test suite
test_multiprocessing_warning.pywith 5 tests:Tests run in subprocesses to avoid interference from conftest.py's session fixture that sets spawn method.
Related Work
Fixes #1136