Yawen Luo1
Xiaoyu Shi2,✉
Junhao Zhuang1
Yutian Chen1
Quande Liu2
Xintao Wang2
Pengfei Wan2
Tianfan Xue1,3,✉
1MMLab, CUHK
2Kling Team, Kuaishou Technology
3CPII under InnoHK
✉Corresponding author
- 📋 Table of Contents
- 🔥 Updates
- 📷 Introduction
- ⚙️ Code: ShotStream + Wan2.1-T2V-1.3B
- 🌟 Citation
- 🤗 Acknowledgement
Note: This open-source repository is a reference implementation. The original model was trained on internal data, so the prompts in these demo cases exhibit a distribution gap relative to our original training and inference phases.
- [2026.03.27]: Released the training and inference code and the checkpoints.
- [2026.03.27]: Released the project page and the arXiv paper.
TL;DR: We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation, achieving 16 FPS on a single NVIDIA GPU.
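The causal design can be illustrated with a toy streaming loop: each chunk of frames is generated conditioned only on previously generated chunks, so output can be streamed as soon as each chunk is ready. Below is a minimal sketch with a stand-in `denoise_chunk` callable (an illustration only; the real model is a causal transformer operating on video latents):

```python
from typing import Callable, List

def stream_generate(num_chunks: int,
                    denoise_chunk: Callable[[List[int]], int]) -> List[int]:
    """Toy causal streaming loop: each chunk depends only on past chunks."""
    history: List[int] = []
    for _ in range(num_chunks):
        # Condition only on previously generated chunks (causal attention).
        chunk = denoise_chunk(history)
        history.append(chunk)  # Stream this chunk out immediately.
    return history

# Stand-in "denoiser": next chunk is one more than the last (illustration only).
video = stream_generate(4, lambda past: (past[-1] + 1) if past else 0)
print(video)  # → [0, 1, 2, 3]
```

Because each step depends only on the past, frames never wait for future shots, which is what makes interactive, on-the-fly storytelling possible.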
Please see more video results on our Project Page.
demo.mp4
Create a conda environment and install dependencies:
git clone https://github.com/KlingAIResearch/ShotStream.git
cd ShotStream
conda create -n shotstream python=3.10 -y
conda activate shotstream
conda install nvidia/label/cuda-12.4.1::cuda
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Or directly:
bash tools/setup/env.sh
Download the checkpoints of Wan2.1-T2V-1.3B and ShotStream:
apt-get install git-lfs
git-lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B wan_models
git clone https://huggingface.co/KlingTeam/ShotStream ckpts
Or directly:
bash tools/setup/download_ckpt.sh
Autoregressive 4-step Long Multi-Shot Video Generation:
Note: Due to company policy restrictions, the prompts in these demo cases exhibit a distribution shift compared to those used during our original training and inference phases.
bash tools/inference/causal_fewsteps.sh
Note:
- You need to update `MASTER_ADDR` in all bash files with the main node's IP address. For multi-node training, the `NNODES` variable also needs to be modified accordingly.
- The multi-shot video example provided is sourced from a public dataset for demonstration purposes. Its captions differ from those used in our actual training set.
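For intuition, "4-step" generation means the distilled student denoises each chunk in a handful of steps rather than the teacher's many-step schedule. A toy sketch with a stand-in scalar "denoiser" (illustration only; the actual sampler operates on video latents):

```python
def few_step_denoise(x_noisy: float, denoiser, steps: int = 4) -> float:
    """Toy few-step sampler: repeatedly move toward the denoiser's estimate."""
    x = x_noisy
    for _ in range(steps):
        x_hat = denoiser(x)        # one-shot clean estimate at this step
        x = x + (x_hat - x) * 0.5  # partial step toward the estimate
    return x

# Stand-in "denoiser" whose fixed point is 1.0 (illustration only).
print(few_step_denoise(5.0, lambda x: 1.0, steps=4))  # → 1.25
```

Few-step sampling is what brings per-chunk latency down far enough for streaming generation.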
Single node:
bash tools/train/1_basemodel.sh 0
Multi-nodes:
# Run this command on node 0 (main node)
bash tools/train/1_basemodel.sh 0
# Run this command on node 1 (worker node)
bash tools/train/1_basemodel.sh 1
...
Following CausVid, we initialize the causal student with the bidirectional teacher's weights. Training all parameters on 5K teacher ODE solution pairs aligns their trajectories, bridging the architectural gap and stabilizing the subsequent distillation.
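The ODE initialization described above is, at its core, a regression: the student is fit to reproduce teacher ODE solution pairs so both models follow the same trajectory before distillation begins. A minimal sketch where the "student" is a 1-D linear map and the "teacher" pairs come from a known linear function (illustration only, standing in for tensors and the real transformer):

```python
import random

# Toy "ODE solution pairs": (input, teacher output) collected from the teacher.
random.seed(0)
pairs = [(x, 2.0 * x + 1.0) for x in [random.uniform(-1, 1) for _ in range(64)]]

# Student: a 1-D linear map w*x + b, trained by MSE regression on the pairs.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    for x, y in pairs:
        pred = w * x + b
        grad = 2.0 * (pred - y)  # d(MSE)/d(pred)
        w -= lr * grad * x       # gradient step on the weight
        b -= lr * grad           # gradient step on the bias

print(round(w, 2), round(b, 2))  # → 2.0 1.0 (recovers the teacher map)
```

Because the data is noiseless and realizable, the student converges exactly onto the teacher's mapping, which is the intent of the trajectory-alignment step.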
python Teacher_Ode_Sample.py \
--ckpt_dir ckpts/bidirectional_teacher.pt \
--save_dir demo/data/ode_sample \
    --data_csv_path demo/data/sample.csv
python get_ode_csv.py \
-i demo/data/ode_sample \
    -o demo/data/ode_sample.csv
Single node:
bash tools/train/2_ode_init.sh 0
Multi-nodes:
# Run this command on node 0 (main node)
bash tools/train/2_ode_init.sh 0
# Run this command on node 1 (worker node)
bash tools/train/2_ode_init.sh 1
...
Single node:
bash tools/train/3_dmd.sh 0
Multi-nodes:
# Run this command on node 0 (main node)
bash tools/train/3_dmd.sh 0
# Run this command on node 1 (worker node)
bash tools/train/3_dmd.sh 1
...
Single node:
bash tools/train/4_dmd_long.sh 0
Multi-nodes:
# Run this command on node 0 (main node)
bash tools/train/4_dmd_long.sh 0
# Run this command on node 1 (worker node)
bash tools/train/4_dmd_long.sh 1
...
Please leave us a star 🌟 and cite our paper if you find our work helpful.
@article{luo2026shotstream,
title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
journal={arXiv preprint arXiv:2603.25746},
year={2026}
}
- CausVid, Self Forcing, and LongLive: the distillation procedures we built upon. Thanks for their wonderful work.
- Wan: the base model we built upon. Thanks for their wonderful work.
