Conversation

@naomili0924 (Contributor) commented Dec 7, 2025

Relates to:
huggingface/diffusers#12846
huggingface/optimum#2389

This pull request adds a uniform AutoText2VideoORTPipeline, as requested in huggingface/optimum#2168. Example usage:


import torch
from diffusers.utils import export_to_video

from optimum.onnxruntime.modeling_diffusion import ORTPipelineForText2Video

# Checkpoints exercised so far: Wan 2.1 and the ModelScope text-to-video model
wan_list = [
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "ali-vilab/text-to-video-ms-1.7b",
]

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Load the ModelScope checkpoint through the new auto pipeline
pipe = ORTPipelineForText2Video.from_pretrained(
    wan_list[1],
    provider=providers[0],  # force GPU via the CUDA execution provider
    torch_dtype=torch.float16,
)
print("Loaded successfully on:", pipe.device)
prompt = "A cat walks on the grass, realistic"
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=256, width=256, num_frames=50).frames[0]
export_to_video(output, "output.mp4", fps=15)

Result: output.mp4 (generated video attached to the PR)

naomili0924 force-pushed the ort_text_to_video_pipeline branch from 0b37b3d to 5d6c1f4 on December 8, 2025 at 07:58
naomili0924 force-pushed the ort_text_to_video_pipeline branch 3 times, most recently from 60325d7 to 86347be on December 11, 2025 at 02:06
naomili0924 force-pushed the ort_text_to_video_pipeline branch from 86347be to 307c429 on December 11, 2025 at 07:06
naomili0924 force-pushed the ort_text_to_video_pipeline branch 2 times, most recently from c8f37ec to 6840a70 on December 17, 2025 at 07:40
naomili0924 force-pushed the ort_text_to_video_pipeline branch from 6840a70 to b19e46a on December 17, 2025 at 08:14
naomili0924 changed the title from "wan onnx exporter" to "add_text2video_ort_pipeline" on Dec 17, 2025
Comment on lines +417 to +420
class VideoOnnxConfig(OnnxConfig):
"""Handles video architectures."""

DUMMY_INPUT_GENERATOR_CLASSES = (DummyVideoInputGenerator, DummyTimestepInputGenerator)
@IlyasMoutawwakil (Member) commented Dec 17, 2025:
I don't think an abstract video onnx config is needed as it doesn't really abstract much here
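
(For concreteness, the suggestion amounts to something like the sketch below: attach the dummy input generators directly to the model-specific config instead of going through an abstract base. The class name and the import path for the PR's new DummyVideoInputGenerator are assumptions, not optimum's final API.)

from optimum.exporters.onnx.base import OnnxConfig
from optimum.utils import DummyTimestepInputGenerator

# DummyVideoInputGenerator is introduced by this PR; the import path is assumed
from optimum.utils.input_generators import DummyVideoInputGenerator


class WanTransformerOnnxConfig(OnnxConfig):  # hypothetical name
    # Generators live on the concrete config, so no VideoOnnxConfig base is needed
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyVideoInputGenerator, DummyTimestepInputGenerator)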

@IlyasMoutawwakil (Member) left a comment:
Hi! Thanks a lot for the contribution!
Is the PR finished? I believe there also needs to be a method that describes how the Wan pipeline is split and which components it needs to export/use. Also, some testing with a tiny model on the exporters and onnxruntime side would be great.

@IlyasMoutawwakil (Member) commented:
@naomili0924 let's rather follow the same design we did with sana, i.e. having a specific function for splitting the wan pipelines.
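
(For reference, a rough sketch of what a sana-style splitting function could look like for Wan; the function name and the component mapping are assumptions based on the Wan Diffusers pipeline, not the final optimum API.)

def _get_submodels_for_export_wan(pipeline):
    """Hypothetical Wan-specific splitter, mirroring the sana design:
    map each exportable component to its own ONNX export target."""
    return {
        "text_encoder": pipeline.text_encoder,  # UMT5 text encoder in Wan 2.1
        "transformer": pipeline.transformer,  # 3D diffusion transformer (denoiser)
        "vae_decoder": pipeline.vae,  # causal 3D VAE, decode path used at inference
    }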

@naomili0924 (Contributor, Author) commented Dec 25, 2025:

"Also some testing with a tiny model on the exporters and onnxruntime side would be great."

@IlyasMoutawwakil It requires a tiny Wan model and a tiny text-to-video model to create a test case. Do you know how to create them?
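
(For reference: a common way to get such tiny checkpoints is to build them programmatically, the way diffusers assembles dummy pipelines in its own unit tests, then push them to the Hub. The sketch below covers the ModelScope-style pipeline; all tiny dimensions and the tokenizer repo are illustrative assumptions, and a tiny Wan pipeline would follow the same pattern with Wan components.)

from transformers import CLIPTextConfig, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, DDIMScheduler, TextToVideoSDPipeline, UNet3DConditionModel

# Tiny denoiser: same architecture as text-to-video-ms, scaled down
unet = UNet3DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=1,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("CrossAttnDownBlock3D", "DownBlock3D"),
    up_block_types=("UpBlock3D", "CrossAttnUpBlock3D"),
    cross_attention_dim=32,
    attention_head_dim=4,
)
# Tiny 2D VAE (the ModelScope pipeline decodes frames with a standard AutoencoderKL)
vae = AutoencoderKL(
    block_out_channels=(32,),
    in_channels=3,
    out_channels=3,
    down_block_types=("DownEncoderBlock2D",),
    up_block_types=("UpDecoderBlock2D",),
    latent_channels=4,
)
# Tiny CLIP text encoder; hidden_size must match the UNet's cross_attention_dim
text_encoder = CLIPTextModel(
    CLIPTextConfig(
        hidden_size=32,
        intermediate_size=37,
        num_attention_heads=4,
        num_hidden_layers=2,
        vocab_size=1000,  # matches the tiny tokenizer below
        max_position_embeddings=77,
    )
)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

pipe = TextToVideoSDPipeline(
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    scheduler=DDIMScheduler(),
)
pipe.save_pretrained("tiny-text-to-video")  # push to the Hub to reuse in CI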
