
Conversation

@rattus128 (Contributor) commented Oct 27, 2025

Core change notes (2nd commit):

This sync is necessary because PyTorch queues CUDA async frees on the
same stream that created the tensor. In the case of async offload, this
will be the offload stream.

Weights and biases can go out of scope in Python, which then triggers
the PyTorch garbage collector to queue the free operation on the
offload stream, possibly before the compute stream has used the weight.
This causes a use-after-free on the weight data, leading to total
corruption of some workflows.

So sync the offload stream with the compute stream after the weight has
been used, so that the free has to wait until the weight has actually
been consumed.
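
To make the race concrete, here is a minimal sketch of the hazard in plain PyTorch, assuming a CUDA device; the names are illustrative and this is not ComfyUI's actual code:

```python
import torch

x = torch.randn(64, 64, device="cuda")
weight_cpu = torch.randn(64, 64).pin_memory()

compute = torch.cuda.default_stream()
offload = torch.cuda.Stream()

with torch.cuda.stream(offload):
    # The copy runs on the offload stream, so PyTorch associates this
    # tensor (and its eventual free) with that stream.
    w = weight_cpu.to("cuda", non_blocking=True)

compute.wait_stream(offload)  # compute must not read w before the copy lands
y = x @ w                     # enqueued on the compute stream

# If w now goes out of scope, its free is queued on the offload stream
# and can execute before the matmul has consumed the data (the
# use-after-free described above). Making the offload stream wait on
# compute closes the window:
offload.wait_stream(compute)
del w
```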

cast_bias_weight is extended in a backwards-compatible way, with the
new behaviour opt-in via a defaulted parameter. This handles custom
node packs that call cast_bias_weight directly, disabling async offload
for them (as they do not handle the race).

The pattern is now:

cast_bias_weight(..., offloadable=True)
thing(weight, bias, ...)
uncast_bias_weight(...)
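
For illustration, a hedged sketch of how a module forward might adopt the new pattern; the import path and the exact uncast_bias_weight argument list are assumptions here, so check ComfyUI's ops module for the real API:

```python
import torch
import torch.nn.functional as F
# Assumed import path for the two helpers.
from comfy.ops import cast_bias_weight, uncast_bias_weight

class Linear(torch.nn.Linear):
    def forward_comfy_cast_weights(self, input):
        # Opt in to async offload: weight/bias may be cast on the
        # offload stream rather than the compute stream.
        weight, bias = cast_bias_weight(self, input, offloadable=True)
        out = F.linear(input, weight, bias)
        # Sync the offload stream with the compute stream so the
        # eventual frees of weight/bias wait for the linear above.
        # (The argument list here is a guess, not the confirmed one.)
        uncast_bias_weight(self, weight, bias)
        return out
```

Custom node packs that keep calling cast_bias_weight without the new parameter get the old behaviour, with async offload disabled for that cast.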


Example test case:

Linux, RTX3060, 96GB RAM
Wan 2.2 I2V FP8, 192x192x9f, 5+5 steps
python main.py --novram --async-offload

This repeatedly black-screens my display and corrupts my outputs:

[screenshot: fp8-async-nvr-fail3]

For control's sake, here is the same without --async-offload:

[screenshot: fp8-async-nvr-control-no-async]

And with this fix:

[screenshot: fp8-async-nvr-fixed]

The function signature change to cast_bias_weight may cause a performance regression for users of --async-offload with custom node packs that call this function as a helper. I don't see a way to both keep performance and solve the race without updating custom node packs one by one.

Amongst my ever-growing collection of custom nodes, there are some in @kijai's publications (FYI):

./ComfyUI-KJNodes/nodes/model_optimization_nodes.py:203:                        weight, bias = cast_bias_weight(self, input)
./ComfyUI-WanVideoWrapper/custom_linear.py:88:        weight, bias = cast_bias_weight(self, input)

@rattus128 force-pushed the async-offload-streams branch from e61b839 to a50c8c2 on October 27, 2025 09:58
@rattus128 (Contributor, Author)
This one would do well with an integration test. Is there a way to run an integration test with --async-offload set?

@Kosinkadink (Collaborator)
Nice! Talked with comfy and this will have to go in after #10498 is merged, so this PR will need to be rebased on that one.

@comfyanonymous (Owner)
Can you rebase this on the latest?

Make this a reusable function.
This is necessary for safe async weight offloading.

Currently this peeks ahead to sync the next stream in the queue of
streams with the compute stream. This doesn't allow much
parallelization, as the end result is that you can only get one weight
load ahead regardless of how many streams you have.

Rotate the loop logic here to synchronize the end of the queue before
returning the next stream. This allows weights to be loaded ahead of
the compute stream's position.
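
For illustration, a rough sketch of the rotated queue under the assumption of a simple round-robin pool; the names and structure are mine, not the actual ComfyUI implementation:

```python
import torch
from collections import deque

class OffloadStreamPool:
    # Illustrative round-robin pool of offload streams.
    def __init__(self, count, compute_stream):
        self.compute = compute_stream
        self.queue = deque(torch.cuda.Stream() for _ in range(count))

    def next_stream(self):
        # Old (peek-ahead) logic synced the stream that would be handed
        # out on the *next* call. By the time that stream was used, the
        # recorded wait already covered the compute work consuming the
        # most recent loads, so only one load could run ahead.
        s = self.queue.popleft()
        self.queue.append(s)
        # Rotated logic: sync the stream now at the end of the queue
        # (the one being returned) against compute's *current* position,
        # before the consumer of its new load is enqueued. Its previous
        # load, handed out len(queue) calls ago, has been consumed by
        # now, so the wait is cheap and several loads can run ahead of
        # the compute stream.
        s.wait_stream(self.compute)
        return s
```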
@rattus128 force-pushed the async-offload-streams branch from a50c8c2 to 6ceba8c on October 29, 2025 11:39
@rattus128 (Contributor, Author)
Can you rebase this on the latest?

Done. Thanks.

@Kosinkadink (Collaborator)
I think it's good to go - I'll double-check with comfy before merging.

@comfyanonymous merged commit ab7ab5b into comfyanonymous:master on Oct 29, 2025
10 checks passed