CineScale extends FreeScale to higher-resolution visual generation, unlocking 4K video generation!
Haonan Qiu, Ning Yu*, Ziqi Huang, Paul Debevec, and Ziwei Liu*
(* Corresponding authors)
From Nanyang Technological University and Netflix Eyeline Studios.
git clone https://github.com/Eyeline-Labs/CineScale.git
cd CineScale
conda create -n cinescale python=3.10
conda activate cinescale
pip install -e .
pip install "xfuser>=0.4.3"
pip install flash-attn==2.7.4.post1 --no-build-isolation
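After running the install commands, it can help to confirm that the environment matches what they request. The following is a minimal sketch, not part of the CineScale codebase: it only reports the Python version and which of the packages named above are installed. Including `torch` in the list is an assumption (it is presumably pulled in as a dependency but is not listed explicitly above), and the function names are illustrative.

```python
import sys
from importlib import metadata

# Packages named in the install steps; "torch" is an assumption
# (likely a transitive dependency, not listed explicitly above).
PACKAGES = ["torch", "xfuser", "flash-attn"]


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_matches_install_steps(version_info=sys.version_info):
    """True if the interpreter is Python 3.10, as in the conda command."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    print("Python 3.10:", python_matches_install_steps())
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Other Python versions may also work; the check only mirrors the version used in the conda command.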
Model | Tuning Resolution | Checkpoint | Description |
---|---|---|---|
CineScale-1.3B-T2V (Text2Video) | 1088x1920 | Hugging Face | Supports 3K (1632x2880) inference on 1x A100 |
CineScale-14B-T2V (Text2Video) | 1088x1920 | Hugging Face | Supports 4K (2176x3840) inference on 8x A100 |
CineScale-14B-I2V (Image2Video) | 1088x1920 | Hugging Face | Supports 4K (2176x3840) inference on 8x A100 |
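To make the relationship between the tuning resolution and the inference resolutions in the table concrete, the short sketch below computes the per-side scale and the per-frame pixel ratio. It uses only the numbers from the table; reading each entry as height x width is an assumption and does not affect the arithmetic.

```python
# Numbers taken from the table; (height, width) ordering is an assumption.
TUNING = (1088, 1920)

TARGETS = {
    "3K": (1632, 2880),
    "4K": (2176, 3840),
}


def scale_factors(base, target):
    """Per-axis scale for both dimensions and the pixel-count ratio."""
    h_scale = target[0] / base[0]
    w_scale = target[1] / base[1]
    return h_scale, w_scale, h_scale * w_scale


for name, res in TARGETS.items():
    h, w, pixels = scale_factors(TUNING, res)
    print(f"{name}: {h:g}x per side, {pixels:g}x pixels per frame")
```

So 3K inference runs at 1.5x the tuning resolution per side (2.25x the pixels per frame), and 4K at 2x per side (4x the pixels).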
Download the checkpoint from Hugging Face and put it in the models folder.
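One possible way to script the download is with the `huggingface_hub` package's `snapshot_download` function. This is a sketch under stated assumptions: `REPO_ID` is a hypothetical placeholder to be replaced with the repository id behind the Hugging Face link in the table, and the `models/<repo name>` layout is a guess. The inference scripts may expect a different path, so check the model paths they load before relying on it.

```python
from pathlib import Path

# Hypothetical placeholder: copy the real repository id from the
# "Hugging Face" link in the checkpoint table.
REPO_ID = "<org>/<checkpoint-repo>"


def checkpoint_dir(repo_id, models_dir="models"):
    """Local target folder: models/<repo name> (this layout is an assumption)."""
    return Path(models_dir) / repo_id.split("/")[-1]


def download_checkpoint(repo_id, models_dir="models"):
    """Fetch all files of a Hugging Face repo into the models folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id, models_dir)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    print("Would download into:", checkpoint_dir(REPO_ID))
    # download_checkpoint(REPO_ID)  # uncomment after setting REPO_ID
```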
Single GPU
CUDA_VISIBLE_DEVICES=0 python cinescale_t2v1.3b_single.py
Multiple GPUs
torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b.py
torchrun --standalone --nproc_per_node=8 cinescale_t2v1.3b_pro.py
torchrun --standalone --nproc_per_node=8 cinescale_t2v14b_pro.py
# Optionally set attention_coef to 1.5 for better results (line 123 of diffsynth/distributed/xdit_context_parallel.py)
torchrun --standalone --nproc_per_node=8 cinescale_i2v14b.py
This codebase is built on the open-source implementation of Wan2.1, based on the DiffSynth-Studio repository.
@article{qiu2025cinescale,
title={CineScale: Free Lunch in High-Resolution Cinematic Visual Generation},
author={Haonan Qiu and Ning Yu and Ziqi Huang and Paul Debevec and Ziwei Liu},
journal={arXiv preprint arXiv:2508.15774},
year={2025}
}