Warn when multiprocessing start method is 'fork' #1309

Andy-Jost · 2025-12-03T21:28:27Z

Summary

CUDA does not support the fork() system call. Forked subprocesses exhibit undefined behavior, including failure to initialize CUDA contexts and devices. This PR adds warnings when IPC objects are serialized with the fork multiprocessing start method.
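For context, this is roughly what avoiding fork looks like on the user side. It is a minimal sketch that relies only on the standard multiprocessing API; the helper names make_spawn_context and start_worker are hypothetical and not part of cuda.core.

```python
import multiprocessing


def make_spawn_context():
    """Return a context whose child processes are started with 'spawn'.

    Hypothetical helper for illustration. An explicit context leaves the
    process-wide start method untouched.
    """
    return multiprocessing.get_context("spawn")


def start_worker(target, *args):
    """Start target(*args) in a freshly spawned interpreter; return the Process."""
    proc = make_spawn_context().Process(target=target, args=args)
    proc.start()
    return proc
```

With 'spawn', the target must be an importable top-level function, and the code that starts workers should sit under an `if __name__ == "__main__":` guard.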

Changes

- Added _check_multiprocessing_start_method() function in cuda_utils.pyx that checks whether the multiprocessing start method is 'fork' and emits a one-time warning
- Updated multiprocessing reduction functions to call the warning check:
  - _reduce_allocation_handle in _ipc.pyx
  - _deep_reduce_device_memory_resource in _ipc.pyx
  - _reduce_event in _event.pyx
- The warning message explains that CUDA doesn't support fork, describes the undefined behavior, and recommends using the 'spawn' method
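The changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cuda_utils.pyx: the function name check_start_method, the warning text, and the FakeHandle class are all hypothetical stand-ins, and __reduce__ is used here only as a simple way to hook serialization.

```python
import multiprocessing
import warnings

# Hypothetical module-level flag, named after the one visible in the diff.
_fork_warning_emitted = False


def check_start_method():
    """Warn at most once per process when the start method is 'fork'."""
    global _fork_warning_emitted
    if _fork_warning_emitted:
        return
    # Note: get_start_method() fixes the platform default as the start
    # method if none has been chosen yet.
    if multiprocessing.get_start_method() == "fork":
        _fork_warning_emitted = True
        warnings.warn(
            "CUDA does not support fork(); forked subprocesses exhibit "
            "undefined behavior. Use the 'spawn' start method instead.",
            UserWarning,
            stacklevel=2,
        )


class FakeHandle:
    """Stand-in for an IPC object; not a cuda.core class."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # Pickling goes through __reduce__, so the check runs whenever
        # the object is serialized for another process.
        check_start_method()
        return (FakeHandle, (self.value,))
```

Because the flag lives at module level, serializing any number of objects of any type produces at most one warning per process.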

Test Coverage

Added comprehensive test suite test_multiprocessing_warning.py with 5 tests:

- Warning emitted for DeviceMemoryResource pickling with fork method
- Warning emitted for IPCAllocationHandle pickling with fork method
- Warning emitted for Event pickling with fork method
- No warning when start method is 'spawn'
- Warning emitted only once (one-time check)

Tests run in subprocesses to avoid interference from conftest.py's session fixture that sets spawn method.

Related Work

Fixes #1136

copy-pr-bot · 2025-12-03T21:28:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Andy-Jost · 2025-12-03T21:28:34Z

/ok to test d1cce9a

Andy-Jost · 2025-12-03T21:30:51Z

/ok to test 3cd8f8a

copy-pr-bot · 2025-12-03T21:30:54Z

/ok to test 3cd8f8a

@Andy-Jost, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

Andy-Jost · 2025-12-03T21:37:04Z

/ok to test fcbb370

Andy-Jost · 2025-12-03T21:40:06Z

/ok to test 3d3499a

copy-pr-bot · 2025-12-03T21:40:10Z

/ok to test 3d3499a

@Andy-Jost, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

Andy-Jost · 2025-12-03T21:40:47Z

/ok to test 1144c8f

Andy-Jost · 2025-12-03T23:45:44Z

/ok to test 3d3499a

leofang · 2025-12-04T00:17:32Z

cuda_core/cuda/core/experimental/_utils/cuda_utils.pyx

+    global _fork_warning_emitted
+    if _fork_warning_emitted:
+        return


Instead of caching it, would it be better to always call multiprocessing.get_start_method() and check it? I worry that it is global state that we don't own, and it could be changed at an arbitrary point in time by the user or any package.

Andy-Jost · 2025-12-04

In the official multiprocessing library docs, the description of multiprocessing.set_start_method() explains that:

The start method is a global setting that can only be chosen a single time in a given Python process.

Calling set_start_method() again without forcing will raise a RuntimeError indicating that the context has already been set.

Recent CPython implementations mention an internal/undocumented way to override this with a force argument, but this is not part of the public, documented API and should be avoided in normal code.

leofang · 2025-12-04

Thanks, Andy. Good to know Python by default checks this. However, I do see the force argument being documented, so it is part of the public API. And it does allow overwriting:

>>> import multiprocessing
>>> multiprocessing.set_start_method("spawn")
>>> multiprocessing.set_start_method("spawn")
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    multiprocessing.set_start_method("spawn")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/local/home/leof/miniforge3/envs/py313_cu130/lib/python3.13/multiprocessing/context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
>>> multiprocessing.set_start_method("fork", force=True)
>>> multiprocessing.set_start_method("spawn", force=True)
>>> multiprocessing.set_start_method("forkserver", force=True)
>>>

If this is not on the hot path, I think not caching the result and always checking is safer.

Alternatively, we could mention in the warning message that setting set_start_method(..., force=True) is dangerous because we cannot help capture issues.

One more thing: if we already warned, the next warn() call is a no-op, so adding the _fork_warning_emitted guard is redundant 😆

import warnings

def f():
    warnings.warn("oh no", UserWarning, stacklevel=3)

f()  # only this call would generate a warning
f()
f()

Ah I see, you want to limit the warning to only once per process lifetime, regardless of which object the warning is raised from. NVM.
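The distinction drawn here can be illustrated with the standard warnings module alone. Under the 'default' filter action, suppression is keyed on the location that issues the warning, so the same message from two call sites is reported for each site. The sketch below uses hypothetical function names and a made-up message.

```python
import warnings

MESSAGE = "fork is not supported by CUDA"


def warn_from_site_a():
    warnings.warn(MESSAGE, UserWarning)  # call site A


def warn_from_site_b():
    warnings.warn(MESSAGE, UserWarning)  # call site B


def count_reported_warnings():
    """Count warnings reported under the 'default' filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("default")
        for _ in range(3):
            warn_from_site_a()  # repeats from the same location are suppressed
        warn_from_site_b()  # a different location is reported again
    return len(caught)
```

Each call site is reported separately, which is why a process-wide flag is needed if several reduction functions should share a single warning.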

mdboom · 2025-12-04

One minor detail -- since the start method can only be set once, I don't think we need to check it multiple times. In other words, we can set _fork_warning_emitted = True unconditionally, rather than only when a warning is emitted, so we don't check again. And maybe rename the flag to _fork_warning_checked for clarity. Unless I'm missing some case where it might be set exceptionally late and it does need to be checked multiple times.
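The suggestion above might look like the following sketch. It only illustrates the idea; the function name and warning text are hypothetical, and the merged code may differ.

```python
import multiprocessing
import warnings

# Renamed flag: records that the check ran, not that a warning was shown.
_fork_warning_checked = False


def check_start_method_once():
    """Query the start method on the first call only."""
    global _fork_warning_checked
    if _fork_warning_checked:
        return
    _fork_warning_checked = True  # set whether or not a warning follows
    if multiprocessing.get_start_method() == "fork":
        warnings.warn(
            "CUDA does not support fork(); use the 'spawn' start method.",
            UserWarning,
            stacklevel=2,
        )
```

The trade-off is the one raised earlier in the thread: if the start method is switched to 'fork' with force=True after the first check, this version stays silent.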

Andy-Jost · 2025-12-04

Thanks for the comments. I've made adjustments to the name and conditionality as Mike suggested. I left this as a one-time check because there does not seem to be a strong consensus, but I don't feel strongly about that and wouldn't mind changing it if more discussion goes in that direction.

Incidentally, a separate warning is issued when fork is called in multithreaded programs (like those using CUDA), so if a user somehow sidesteps this warning by setting the start method multiple times, they will still get another nasty message indicating their program is invalid.

CUDA does not support the fork() system call. Forked subprocesses exhibit undefined behavior, including failure to initialize CUDA contexts and devices. Add warning checks in multiprocessing reduction functions for IPC objects (DeviceMemoryResource, IPCAllocationHandle, Event) that warn when the start method is 'fork'. The warning is emitted once per process when IPC objects are serialized. Fixes NVIDIA#1136

Andy-Jost · 2025-12-04T00:31:57Z

/ok to test 3d3499a

Andy-Jost · 2025-12-04T00:39:44Z

/ok to test 9fd6b19

Andy-Jost · 2025-12-04T00:41:32Z

The previous test methodology of starting a subprocess did not work on the test runners. The latest upload avoids using a subprocess, instead mocking up key methods.
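A mock-based test along these lines might look like the sketch below. It is self-contained and does not import cuda.core: check_start_method and reset_fork_warning are local stand-ins defined here for illustration, and the real tests exercise the actual IPC objects rather than calling a checker directly.

```python
import multiprocessing
import warnings
from unittest import mock

_fork_warning_checked = False


def check_start_method():
    """Local stand-in for the checker under test."""
    global _fork_warning_checked
    if _fork_warning_checked:
        return
    _fork_warning_checked = True
    if multiprocessing.get_start_method() == "fork":
        warnings.warn("CUDA does not support fork()", UserWarning, stacklevel=2)


def reset_fork_warning():
    """Local stand-in: clear the flag so each test starts fresh."""
    global _fork_warning_checked
    _fork_warning_checked = False


def collect_warnings(start_method, calls=1):
    """Run the checker with a mocked start method and return recorded warnings."""
    reset_fork_warning()
    with mock.patch("multiprocessing.get_start_method", return_value=start_method):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            for _ in range(calls):
                check_start_method()
    return caught


def test_warns_for_fork():
    assert len(collect_warnings("fork")) == 1


def test_silent_for_spawn():
    assert collect_warnings("spawn") == []


def test_warns_only_once():
    assert len(collect_warnings("fork", calls=3)) == 1
```

Mocking get_start_method keeps the test in one process, so the session-wide start method is never touched.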

Change mempool_device to ipc_device fixture for tests that require IPC-enabled memory resources. The ipc_device fixture properly skips on Windows where IPC is not supported.

Andy-Jost · 2025-12-04T01:16:42Z

/ok to test 476459f

…t_method
- Add reset_fork_warning() function for testing purposes
- Rename _check_multiprocessing_start_method to check_multiprocessing_start_method (remove leading underscore)
- Update all tests to use reset_fork_warning() instead of directly accessing internal flag
- Fix trailing whitespace

Andy-Jost · 2025-12-04T14:42:55Z

/ok to test fbdd56d

Andy-Jost · 2025-12-04T14:51:05Z

/ok to test 3271964

github-actions · 2025-12-04T15:58:01Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

Andy-Jost added this to the cuda.core beta 10 milestone Dec 3, 2025

Andy-Jost added the enhancement (Any code-related improvements), P0 (High priority - Must do!), and cuda.core (Everything related to the cuda.core module) labels Dec 3, 2025

Andy-Jost requested review from leofang, rparolin and rwgk December 3, 2025 21:28

Andy-Jost force-pushed the warn-fork-multiprocessing branch from d1cce9a to 3cd8f8a December 3, 2025 21:30

Andy-Jost force-pushed the warn-fork-multiprocessing branch from 3cd8f8a to fcbb370 December 3, 2025 21:37

Andy-Jost force-pushed the warn-fork-multiprocessing branch from fcbb370 to 1144c8f December 3, 2025 21:40

Andy-Jost force-pushed the warn-fork-multiprocessing branch from 1144c8f to 3d3499a December 3, 2025 23:42

leofang reviewed Dec 4, 2025

Andy-Jost force-pushed the warn-fork-multiprocessing branch from 3d3499a to 9fd6b19 December 4, 2025 00:32

Andy-Jost self-assigned this Dec 4, 2025

Skip multiprocessing warning tests on Windows

476459f

leofang approved these changes Dec 4, 2025

Andy-Jost enabled auto-merge (squash) December 4, 2025 14:47

Merge branch 'main' into warn-fork-multiprocessing

3271964

Andy-Jost merged commit c860f3f into NVIDIA:main Dec 4, 2025
61 checks passed

Andy-Jost deleted the warn-fork-multiprocessing branch December 4, 2025 15:48
