Add LTX 2.0 Video Pipelines #12915
Merged
+9,513
−0
103 commits
aa602ac
Initial LTX 2.0 transformer implementation
dg845 b3096c3
Add tests for LTX 2 transformer model
dg845 980591d
Get LTX 2 transformer tests working
dg845 e100b8f
Rename LTX 2 compile test class to have LTX2
dg845 780fb61
Remove RoPE debug print statements
dg845 5765759
Get LTX 2 transformer compile tests passing
dg845 aeecc4d
Fix LTX 2 transformer shape errors
dg845 a5f2d2d
Initial script to convert LTX 2 transformer to diffusers
dg845 d86f89d
Add more LTX 2 transformer audio arguments
dg845 57a8b9c
Allow LTX 2 transformer to be loaded from local path for conversion
dg845 a7bc052
Improve dummy inputs and add test for LTX 2 transformer consistency
dg845 bda3ff1
Fix LTX 2 transformer bugs so consistency test passes
dg845 269cf7b
Initial implementation of LTX 2.0 video VAE
dg845 baf23e2
Explicitly specify temporal and spatial VAE scale factors when conver…
dg845 5b950d6
Add initial LTX 2.0 video VAE tests
dg845 491aae0
Add initial LTX 2.0 video VAE tests (part 2)
dg845 a748975
Get diffusers implementation on par with official LTX 2.0 video VAE i…
dg845 c6a11a5
Initial LTX 2.0 vocoder implementation
dg845 8bfeb4a
Merge pull request #3 from huggingface/ltx-2-vocoder
dg845 b1cf6ff
Merge pull request #2 from huggingface/ltx-2-video-vae
dg845 6c56954
Use RMSNorm implementation closer to original for LTX 2.0 video VAE
dg845 b34ddb1
start audio decoder.
sayakpaul f4c2435
init registration.
sayakpaul e54cd6b
up
sayakpaul 907896d
simplify and clean up
sayakpaul 4904fd6
up
sayakpaul 0028955
Initial LTX 2.0 text encoder implementation
dg845 d0f9cda
Rough initial LTX 2.0 pipeline implementation
dg845 5f0f2a0
up
sayakpaul 58257eb
up
sayakpaul 059999a
up
sayakpaul 8134da6
up
sayakpaul 409d651
resolve conflicts.
sayakpaul 7bb4cf7
Merge pull request #5 from huggingface/audio-decoder
dg845 5f7e43d
Add imports for LTX 2.0 Audio VAE
dg845 d303e2a
Conversion script for LTX 2.0 Audio VAE Decoder
dg845 ae3b6e7
Merge branch 'ltx-2-transformer' into ltx-2-t2v-pipeline
dg845 54bfc5d
Add Audio VAE logic to T2V pipeline
dg845 6e6ce20
Duplicate scheduler for audio latents
dg845 cbb10b8
Support num_videos_per_prompt for prompt embeddings
dg845 595f485
LTX 2.0 scheduler and full pipeline conversion
dg845 3bf7369
Add script to test full LTX2Pipeline T2V inference
dg845 fa7d9f7
Fix pipeline return bugs
dg845 a56cf23
Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__
dg845 90edc6a
Fix more bugs in LTX2Pipeline.__call__
dg845 1484c43
Improve CPU offload support
dg845 f9b9476
Fix pipeline audio VAE decoding dtype bug
dg845 e89d9c1
Fix video shape error in full pipeline test script
dg845 b5891b1
Get LTX 2 T2V pipeline to produce reasonable outputs
dg845 0c41297
Merge pull request #4 from huggingface/ltx-2-t2v-pipeline
dg845 581f21c
Make LTX 2.0 scheduler more consistent with original code
dg845 e1f0b7e
Fix typo when applying scheduler fix in T2V inference script
dg845 280e347
Refactor Audio VAE to be simpler and remove helpers (#7)
sayakpaul 46822c4
Add support for I2V (#8)
sayakpaul 6a236a2
Merge branch 'ltx-2-transformer' into make-scheduler-consistent
dg845 bd607b9
Denormalize audio latents in I2V pipeline (analogous to T2V change) (…
dg845 d3f10fe
test i2v.
sayakpaul aae70b9
Merge pull request #10 from huggingface/make-scheduler-consistent
dg845 caae167
Move Video and Audio Text Encoder Connectors to Transformer (#12)
dg845 0be4f31
up (#19)
sayakpaul c5b52d6
address initial feedback from lightricks team (#16)
sayakpaul 2fa4f84
When using split RoPE, make sure that the output dtype is same as inp…
dg845 bff9891
Fix apply split RoPE shape error when reshaping x to 4D
dg845 cb50cac
Add export_utils file for exporting LTX 2.0 videos with audio
dg845 ce9da5d
Merge pull request #20 from huggingface/video-export-utils-file
dg845 93a417f
Tests for T2V and I2V (#6)
sayakpaul 9b8788c
resolve conflicts.
sayakpaul c039c87
up
sayakpaul 550eca3
use export util funcs.
sayakpaul ef19911
Point original checkpoint to LTX 2.0 official checkpoint
dg845 ace2ee9
Allow the I2V pipeline to accept image URLs
dg845 dd81242
make style and make quality
dg845 2fc5789
Merge branch 'main' into ltx-2-transformer
sayakpaul 57ead0b
remove function map.
sayakpaul c39f1b8
remove args.
sayakpaul bdcf23e
update docs.
sayakpaul 61e0fb4
update doc entries.
sayakpaul 8c5ab1f
disable ltx2_consistency test
sayakpaul 64b48c1
Merge branch 'main' into ltx-2-transformer
sayakpaul 5e0cf2b
Simplify LTX 2 RoPE forward by removing coords is None logic
dg845 d01a242
make style and make quality
dg845 79cf6d7
Support LTX 2.0 audio VAE encoder
dg845 cc28cf7
Merge branch 'main' into ltx-2-transformer
sayakpaul 91ee2dd
resolve conflicts
sayakpaul 5269ee5
Merge branch 'ltx-2-transformer' of github.com:huggingface/diffusers …
dg845 a17f5cb
Apply suggestions from code review
dg845 964f106
Remove print statement in audio VAE
dg845 4dfe509
up
sayakpaul 249ae1f
Merge branch 'main' into ltx-2-transformer
sayakpaul 040c118
Fix bug when calculating audio RoPE coords
dg845 44925cb
Ltx 2 latent upsample pipeline (#12922)
sayakpaul 5e50046
Fix latent upsampler filename in LTX 2 conversion script
dg845 2b85b93
Add latent upsample pipeline to LTX 2 docs
dg845 40ee3e3
Add dummy objects for LTX 2 latent upsample pipeline
dg845 99ff722
Set default FPS to official LTX 2 ckpt default of 24.0
dg845 165b945
Set default CFG scale to official LTX 2 ckpt default of 4.0
dg845 1a4ae58
Update LTX 2 pipeline example docstrings
dg845 b4d33df
make style and make quality
dg845 724afee
Remove LTX 2 test scripts
dg845 d24faa7
Fix LTX 2 upsample pipeline example docstring
dg845 353f0db
Add logic to convert and save a LTX 2 upsampling pipeline
dg845 0c9e4e2
Merge branch 'main' into ltx-2-transformer
sayakpaul f85b969
Document LTX2VideoTransformer3DModel forward pass
dg845
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| <!-- Copyright 2025 The HuggingFace Team. All rights reserved. | ||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. --> | ||
|
|
||
| # AutoencoderKLLTX2Audio | ||
|
|
||
| The 3D variational autoencoder (VAE) model with KL loss used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks. It encodes audio into latent representations and decodes those latents back into audio. | ||
|
|
||
| The model can be loaded with the following code snippet. | ||
|
|
||
| ```python | ||
| import torch | ||
| from diffusers import AutoencoderKLLTX2Audio | ||
|
|
||
| vae = AutoencoderKLLTX2Audio.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda") | ||
| ``` | ||
|
|
||
| ## AutoencoderKLLTX2Audio | ||
|
|
||
| [[autodoc]] AutoencoderKLLTX2Audio | ||
| - encode | ||
| - decode | ||
| - all | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| <!-- Copyright 2025 The HuggingFace Team. All rights reserved. | ||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. --> | ||
|
|
||
| # AutoencoderKLLTX2Video | ||
|
|
||
| The 3D variational autoencoder (VAE) model with KL loss used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks. | ||
|
|
||
| The model can be loaded with the following code snippet. | ||
|
|
||
| ```python | ||
| import torch | ||
| from diffusers import AutoencoderKLLTX2Video | ||
|
|
||
| vae = AutoencoderKLLTX2Video.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda") | ||
| ``` | ||
|
|
||
| ## AutoencoderKLLTX2Video | ||
|
|
||
| [[autodoc]] AutoencoderKLLTX2Video | ||
| - decode | ||
| - encode | ||
| - all |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| <!-- Copyright 2025 The HuggingFace Team. All rights reserved. | ||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. --> | ||
|
|
||
| # LTX2VideoTransformer3DModel | ||
|
|
||
| The Diffusion Transformer model for 3D data used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks. | ||
|
|
||
| The model can be loaded with the following code snippet. | ||
|
|
||
| ```python | ||
| import torch | ||
| from diffusers import LTX2VideoTransformer3DModel | ||
|
|
||
| transformer = LTX2VideoTransformer3DModel.from_pretrained("Lightricks/LTX-2", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda") | ||
| ``` | ||
|
|
||
| ## LTX2VideoTransformer3DModel | ||
|
|
||
| [[autodoc]] LTX2VideoTransformer3DModel |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| <!-- Copyright 2025 The HuggingFace Team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. --> | ||
|
|
||
| # LTX-2 | ||
|
|
||
| LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution. | ||
|
|
||
| You can find all the original LTX-2 checkpoints under the [Lightricks](https://huggingface.co/Lightricks) organization. | ||
|
|
||
| The original codebase for LTX-2 can be found [here](https://github.com/Lightricks/LTX-2). | ||
|
|
||
| ## LTX2Pipeline | ||
|
|
||
| [[autodoc]] LTX2Pipeline | ||
| - all | ||
| - __call__ | ||
|
|
||
| ## LTX2ImageToVideoPipeline | ||
|
|
||
| [[autodoc]] LTX2ImageToVideoPipeline | ||
| - all | ||
| - __call__ | ||
|
|
||
| ## LTX2LatentUpsamplePipeline | ||
|
|
||
| [[autodoc]] LTX2LatentUpsamplePipeline | ||
| - all | ||
| - __call__ | ||
|
|
||
| ## LTX2PipelineOutput | ||
|
|
||
| [[autodoc]] pipelines.ltx2.pipeline_output.LTX2PipelineOutput |
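
The loading snippets in the model pages of this PR share one shape: a class name, a `subfolder`, and a `torch_dtype`. The following is a minimal sketch of a helper that wraps that pattern. The names `COMPONENT_SPECS`, `build_load_kwargs`, and `load_ltx2_component` are hypothetical and not part of diffusers; the class names, repo id, subfolders, and dtypes are taken from the video VAE and transformer snippets in the diffs. `torch` and `diffusers` are imported lazily, so the spec table can be inspected without either installed.

```python
# Hypothetical helper, not part of diffusers. The values mirror the
# loading snippets shown in the doc pages added by this PR.
COMPONENT_SPECS = {
    "AutoencoderKLLTX2Video": {"subfolder": "vae", "torch_dtype": "float32"},
    "LTX2VideoTransformer3DModel": {"subfolder": "transformer", "torch_dtype": "bfloat16"},
}


def build_load_kwargs(class_name):
    """Return a copy of the from_pretrained kwargs (dtype kept as a string)."""
    if class_name not in COMPONENT_SPECS:
        raise KeyError(f"unknown LTX-2 component: {class_name}")
    return dict(COMPONENT_SPECS[class_name])


def load_ltx2_component(class_name, repo_id="Lightricks/LTX-2", device="cuda"):
    """Load one component. Needs torch, a diffusers build with LTX-2 support,
    and access to the checkpoint; none of that is checked here."""
    import torch  # lazy import: only required for real loading
    import diffusers

    kwargs = build_load_kwargs(class_name)
    # Turn the dtype name ("float32", "bfloat16") into the torch dtype object.
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    model_cls = getattr(diffusers, class_name)
    return model_cls.from_pretrained(repo_id, **kwargs).to(device)
```

Under these assumptions, `load_ltx2_component("AutoencoderKLLTX2Video")` would be equivalent to the video VAE snippet above, and passing `device="cpu"` skips the CUDA move.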