Conversation

@naomili0924 (Contributor) commented Dec 7, 2025

Relates to:
huggingface/diffusers#12846
huggingface/optimum#2389

This pull request adds a uniform AutoText2VideoORTPipeline, as requested in huggingface/optimum#2168. Example usage:


import torch
from diffusers.utils import export_to_video

from optimum.onnxruntime.modeling_diffusion import ORTPipelineForText2Video

# Checkpoints exercised so far: Wan 2.1 and the ModelScope text-to-video model
wan_list = [
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "ali-vilab/text-to-video-ms-1.7b",
]

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Load the ModelScope checkpoint through the new auto pipeline
pipe = ORTPipelineForText2Video.from_pretrained(
    wan_list[1],
    provider=providers[0],  # force GPU via the CUDA execution provider
    torch_dtype=torch.float16,
)
print("Loaded successfully on:", pipe.device)
prompt = "A cat walks on the grass, realistic"
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=256, width=256, num_frames=50).frames[0]
export_to_video(output, "output.mp4", fps=15)

Result: output.mp4 (generated video attached to the PR)

naomili0924 force-pushed the ort_text_to_video_pipeline branch from 0b37b3d to 5d6c1f4 on December 8, 2025 at 07:58
naomili0924 force-pushed the ort_text_to_video_pipeline branch 3 times, most recently from 60325d7 to 86347be on December 11, 2025 at 02:06
naomili0924 force-pushed the ort_text_to_video_pipeline branch from 86347be to 307c429 on December 11, 2025 at 07:06
naomili0924 force-pushed the ort_text_to_video_pipeline branch 2 times, most recently from c8f37ec to 6840a70 on December 17, 2025 at 07:40
naomili0924 force-pushed the ort_text_to_video_pipeline branch from 6840a70 to b19e46a on December 17, 2025 at 08:14
naomili0924 changed the title from "wan onnx exporter" to "add_text2video_ort_pipeline" on Dec 17, 2025
Comment on lines +417 to +420
class VideoOnnxConfig(OnnxConfig):
"""Handles video architectures."""

DUMMY_INPUT_GENERATOR_CLASSES = (DummyVideoInputGenerator, DummyTimestepInputGenerator)
@IlyasMoutawwakil (Member) commented Dec 17, 2025:
I don't think an abstract video onnx config is needed as it doesn't really abstract much here
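
(For concreteness, the suggestion amounts to something like the sketch below: attach the dummy input generators directly to the model-specific config instead of going through an abstract base. The class name and the import path for the PR's new DummyVideoInputGenerator are assumptions, not optimum's final API.)

from optimum.exporters.onnx.base import OnnxConfig
from optimum.utils import DummyTimestepInputGenerator

# DummyVideoInputGenerator is introduced by this PR; the import path is assumed
from optimum.utils.input_generators import DummyVideoInputGenerator


class WanTransformerOnnxConfig(OnnxConfig):  # hypothetical name
    # Generators live on the concrete config, so no VideoOnnxConfig base is needed
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyVideoInputGenerator, DummyTimestepInputGenerator)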

@IlyasMoutawwakil (Member) left a comment:
Hi! Thanks a lot for the contribution!
Is the PR finished? I believe there also needs to be a method that describes how the Wan pipeline is split and which components it needs to export/use. Also, some testing with a tiny model on the exporters and onnxruntime side would be great.

@IlyasMoutawwakil (Member) commented:
@naomili0924 let's rather follow the same design we did with sana, i.e. having a specific function for splitting the wan pipelines.
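
(For reference, a rough sketch of what a sana-style splitting function could look like for Wan; the function name and the component mapping are assumptions based on the Wan Diffusers pipeline, not the final optimum API.)

def _get_submodels_for_export_wan(pipeline):
    """Hypothetical Wan-specific splitter, mirroring the sana design:
    map each exportable component to its own ONNX export target."""
    return {
        "text_encoder": pipeline.text_encoder,  # UMT5 text encoder in Wan 2.1
        "transformer": pipeline.transformer,  # 3D diffusion transformer (denoiser)
        "vae_decoder": pipeline.vae,  # causal 3D VAE, decode path used at inference
    }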

@naomili0924 (Contributor, Author) commented Dec 25, 2025:

"Also some testing with a tiny model on the exporters and onnxruntime side would be great."

@IlyasMoutawwakil It requires a tiny Wan model and a tiny text-to-video model to create a test case. Do you know how to create them?
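
(For reference: a common way to get such tiny checkpoints is to build them programmatically, the way diffusers assembles dummy pipelines in its own unit tests, then push them to the Hub. The sketch below covers the ModelScope-style pipeline; all tiny dimensions and the tokenizer repo are illustrative assumptions, and a tiny Wan pipeline would follow the same pattern with Wan components.)

from transformers import CLIPTextConfig, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, DDIMScheduler, TextToVideoSDPipeline, UNet3DConditionModel

# Tiny denoiser: same architecture as text-to-video-ms, scaled down
unet = UNet3DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=1,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("CrossAttnDownBlock3D", "DownBlock3D"),
    up_block_types=("UpBlock3D", "CrossAttnUpBlock3D"),
    cross_attention_dim=32,
    attention_head_dim=4,
)
# Tiny 2D VAE (the ModelScope pipeline decodes frames with a standard AutoencoderKL)
vae = AutoencoderKL(
    block_out_channels=(32,),
    in_channels=3,
    out_channels=3,
    down_block_types=("DownEncoderBlock2D",),
    up_block_types=("UpDecoderBlock2D",),
    latent_channels=4,
)
# Tiny CLIP text encoder; hidden_size must match the UNet's cross_attention_dim
text_encoder = CLIPTextModel(
    CLIPTextConfig(
        hidden_size=32,
        intermediate_size=37,
        num_attention_heads=4,
        num_hidden_layers=2,
        vocab_size=1000,  # matches the tiny tokenizer below
        max_position_embeddings=77,
    )
)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

pipe = TextToVideoSDPipeline(
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    scheduler=DDIMScheduler(),
)
pipe.save_pretrained("tiny-text-to-video")  # push to the Hub to reuse in CI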
