[group offloading] avoid unnecessary moving out to speed up inference #12910

gameofdimension · 2026-01-04T10:07:20Z

Explicitly moving weights back to the CPU after computation is unnecessary; we can skip it, just as the use_stream=True path already does. Because device-to-host copies are expensive, this change significantly speeds up inference when use_stream=False.
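To illustrate the idea, here is a minimal sketch rather than the actual diff in this PR. `KeepCpuCopyOffloader` is a hypothetical helper, not a diffusers class. It assumes inference only, so the weights are never modified on the GPU, and for brevity it handles parameters but not buffers. Offloading then re-points each parameter at the retained CPU tensor instead of copying data back from the device.

```python
import torch
import torch.nn as nn


class KeepCpuCopyOffloader:
    """Hypothetical helper: onload weights to a device, and offload them by
    re-pointing to a retained CPU tensor instead of copying device-to-host."""

    def __init__(self, module: nn.Module, onload_device: torch.device):
        self.module = module
        self.onload_device = onload_device
        # Keep references to the original CPU tensors. Inference does not
        # mutate the weights, so these stay valid for the whole run.
        self.cpu_data = {name: p.data for name, p in module.named_parameters()}

    def onload(self) -> None:
        # Host-to-device copy; the CPU tensors are left untouched.
        for name, p in self.module.named_parameters():
            p.data = self.cpu_data[name].to(self.onload_device)

    def offload(self) -> None:
        # No device-to-host copy: drop the device tensor and reuse the CPU one.
        for name, p in self.module.named_parameters():
            p.data = self.cpu_data[name]
```

Calling `onload()` before a block's forward pass and `offload()` after it leaves the parameter backed by the same CPU storage it started with, so the only transfer per block is the host-to-device copy.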

improvement

device: A100 40G
model: Qwen/Qwen-Image

run	step latency
baseline	54 s
this PR	4.8 s
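The description does not say how step latency was measured. One possible approach, sketched under assumptions, is a per-step timer passed as a `callback_on_step_end` callable. `StepTimer` is a hypothetical helper, and the sketch assumes the pipeline invokes the callback as `(pipe, step, timestep, callback_kwargs)` and expects `callback_kwargs` back.

```python
import time

import torch


class StepTimer:
    """Hypothetical helper: records wall-clock seconds between denoising steps."""

    def __init__(self):
        self.latencies = []
        self._last = None

    def _sync(self) -> None:
        # Wait for queued GPU work so the timestamp reflects finished compute.
        if torch.cuda.is_available():
            torch.cuda.synchronize()

    def start(self) -> None:
        self._sync()
        self._last = time.perf_counter()

    def __call__(self, pipe, step, timestep, callback_kwargs):
        self._sync()
        now = time.perf_counter()
        if self._last is not None:
            self.latencies.append(now - self._last)
        self._last = now
        return callback_kwargs
```

With the test script, this would be used as `timer = StepTimer(); timer.start(); pipe(..., callback_on_step_end=timer)`. If `start()` is called before `pipe(...)`, the first recorded interval also includes prompt encoding and other setup, so it is better excluded when averaging.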

test code

import time
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel
import torch


def main():
    device = "cuda"
    model_name = "Qwen/Qwen-Image"
    torch_dtype = torch.bfloat16
    pipe: QwenImagePipeline = QwenImagePipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
    # pipe.enable_model_cpu_offload(device=device)

    offload_type = "block_level"
    num_blocks_per_group = 1
    use_stream = False
    assert isinstance(pipe.transformer, QwenImageTransformer2DModel)
    pipe.transformer.enable_group_offload(
        onload_device=device,
        offload_device="cpu",
        offload_type=offload_type,
        num_blocks_per_group=num_blocks_per_group,
        use_stream=use_stream,
    )
    pipe.to(device=device)

    positive_magic = {
        "en": ", Ultra HD, 4K, cinematic composition.",  # for English prompts
        "zh": ", 超清，4K，电影级构图.",  # for Chinese prompts
    }

    # Generate image
    prompt = """A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197" perfect Ultra HD"""

    negative_prompt = (
        "very bad quality"  # use an empty string if there is no specific concept to remove
    )

    # Generate with different aspect ratios
    aspect_ratios = {
        "1:1": (1328, 1328),
        "16:9": (1664, 928),
        "9:16": (928, 1664),
        "4:3": (1472, 1140),
        "3:4": (1140, 1472),
        "3:2": (1584, 1056),
        "2:3": (1056, 1584),
    }

    width, height = aspect_ratios["16:9"]
    generator = torch.Generator(device="cpu").manual_seed(42)

    image = pipe(
        prompt=prompt + positive_magic["en"],
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_inference_steps=50,
        true_cfg_scale=4.0,
        generator=generator,
    ).images[0]

    image.save(f"example-{int(time.time())}.png")


if __name__ == "__main__":
    main()

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Refactor offloading logic to simplify memory management.

gameofdimension · 2026-01-05T15:02:54Z

@DN6 @yiyixuxu Could you please take a look at this change?

Simplify offloading to memory logic

b40f1da

