KlingAIResearch/ShotStream


Streaming Multi-Shot Video Generation for Interactive Storytelling

Yawen Luo1, Xiaoyu Shi2,✉, Junhao Zhuang1, Yutian Chen1, Quande Liu2,
Xintao Wang2, Pengfei Wan2, Tianfan Xue1,3,✉

1MMLab, CUHK    2Kling Team, Kuaishou Technology    3CPII under InnoHK
✉ Corresponding author



Note: This open-source repository is a reference implementation. The original model was trained on internal data, so the prompts in these demo cases exhibit a distribution gap relative to our original training and inference setups.

🔥 Updates

📷 Introduction

TL;DR: We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation, achieving 16 FPS on a single NVIDIA GPU.
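The streaming idea can be illustrated with a toy sketch (illustrative only; the real model denoises latent frames with a causal architecture, and every name below is hypothetical):

```python
# Toy illustration of causal multi-shot streaming: each shot is generated
# conditioned on cached context from earlier shots, so frames can be
# emitted on the fly instead of after the whole video is finished.
from dataclasses import dataclass, field

@dataclass
class StreamState:
    """Stands in for the KV-cache-style context carried across shots."""
    history: list = field(default_factory=list)

def generate_shot(prompt: str, state: StreamState, num_frames: int = 4) -> list:
    """Toy stand-in for a few-step causal denoiser: emits frames in order."""
    frames = [f"{prompt}/frame{i} (ctx={len(state.history)})"
              for i in range(num_frames)]
    state.history.extend(frames)   # later shots condition on earlier ones
    return frames

state = StreamState()
for shot_prompt in ["sunrise over a city", "a cat walks into frame"]:
    for frame in generate_shot(shot_prompt, state):
        print(frame)               # streamed as soon as it is produced
```

Each new prompt extends the same running state, which is what makes interactive, shot-by-shot storytelling possible.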

Please see more video results on our Project Page.

demo.mp4

⚙️ Code: ShotStream + Wan2.1-T2V-1.3B

Inference

1. Environment Setup

Create a conda environment and install dependencies:

git clone https://github.com/KlingAIResearch/ShotStream.git
cd ShotStream
conda create -n shotstream python=3.10 -y
conda activate shotstream
conda install nvidia/label/cuda-12.4.1::cuda
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Or directly:

bash tools/setup/env.sh
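A quick sanity check after setup (a minimal sketch; it only verifies that the key packages installed above are importable) can catch a broken environment early:

```python
# Sanity check: confirm the packages installed above are importable.
import importlib.util

for mod in ("torch", "torchvision", "flash_attn"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'installed' if found else 'MISSING'}")
```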

2. Download Checkpoints

Download the checkpoints of Wan-T2V-1.3B and ShotStream:

apt-get install git-lfs
git-lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B wan_models
git clone https://huggingface.co/KlingTeam/ShotStream ckpts

Or directly:

bash tools/setup/download_ckpt.sh
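Before moving on, it can help to confirm the downloads landed where the later scripts expect them (a sketch; only `ckpts/bidirectional_teacher.pt` is a path actually referenced below, the rest of the layout is an assumption):

```python
# Verify the checkpoint directories exist before running inference/training.
from pathlib import Path

for path in ("wan_models", "ckpts", "ckpts/bidirectional_teacher.pt"):
    status = "found" if Path(path).exists() else "missing"
    print(f"{path}: {status}")
```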

3. Run Inference

Autoregressive 4-step Long Multi-Shot Video Generation:

Note: Due to company policy restrictions, the prompts in these demo cases exhibit a distribution shift compared to those used during our original training and inference phases.

bash tools/inference/causal_fewsteps.sh

Training

Note:

  1. Update MASTER_ADDR in all bash scripts with the main node's IP address. For multi-node training, also set the NNODES variable accordingly.
  2. The multi-shot video example provided is sourced from a public dataset for demonstration purposes. Its captions differ from those used in our actual training set.
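For reference, the variables mentioned above typically look like this inside the training scripts (a sketch under assumptions: the `torchrun` launch line and the `train.py` entry point are illustrative, not the scripts' exact contents):

```shell
MASTER_ADDR="10.0.0.1"   # IP address of node 0 (the main node) - must be edited
MASTER_PORT=29500        # any free port, identical on all nodes
NNODES=2                 # total number of nodes participating
NODE_RANK=$1             # the 0/1/... argument passed to each tools/train/*.sh

# Illustrative launch; the actual entry point inside the scripts may differ.
torchrun --nnodes "$NNODES" --node_rank "$NODE_RANK" \
         --master_addr "$MASTER_ADDR" --master_port "$MASTER_PORT" \
         train.py
```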

Step 1: Bidirectional Next-Shot Teacher Model Training

Single node:

bash tools/train/1_basemodel.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/1_basemodel.sh 0
# Run this command on node 1 (worker node)
bash tools/train/1_basemodel.sh 1
...

Step 2: Causal Student Model Distillation

Step 2.1: Causal Adaptation Initialization

Following CausVid, we initialize the causal student with the bidirectional teacher's weights. Training all parameters on 5K teacher ODE solution pairs aligns their trajectories, bridging the architectural gap and stabilizing subsequent distillation.
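The alignment objective can be sketched as a simple regression onto teacher ODE solutions (a toy with NumPy; the linear map stands in for the causal student, and the shapes and learning rate are arbitrary assumptions):

```python
# Toy version of ODE-pair alignment: the student is regressed onto the
# teacher's ODE solutions so both map the same noise to the same output.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=(32, 4))          # x_T latents fed to the teacher
teacher_map = rng.normal(size=(4, 4))     # stands in for the teacher's ODE solver
targets = noise @ teacher_map             # teacher's ODE solutions (the "pairs")

W = np.zeros((4, 4))                      # student parameters
for _ in range(300):                      # plain gradient descent on MSE
    grad = noise.T @ (noise @ W - targets) / len(noise)
    W -= 0.1 * grad

print(np.abs(noise @ W - targets).max())  # residual shrinks as trajectories align
```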

Step 2.1.1: Get ODE Pairs from Teacher
python Teacher_Ode_Sample.py \
  --ckpt_dir ckpts/bidirectional_teacher.pt \
  --save_dir demo/data/ode_sample \
  --data_csv_path demo/data/sample.csv
Step 2.1.2: Get ODE Pairs CSV
python get_ode_csv.py \
    -i demo/data/ode_sample \
    -o demo/data/ode_sample.csv
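Conceptually, this step just indexes the saved pairs into a CSV (a hypothetical sketch of what a script like `get_ode_csv.py` might do; the column name and `.pt` extension are assumptions):

```python
# Build a CSV index over a directory of saved ODE pairs.
import csv
from pathlib import Path

def build_index(sample_dir: str, out_csv: str) -> int:
    """Write one row per saved pair file; returns the number of rows."""
    paths = sorted(Path(sample_dir).glob("*.pt"))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ode_pair_path"])           # hypothetical column name
        writer.writerows([str(p)] for p in paths)
    return len(paths)
```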
Step 2.1.3: Causal Initialization

Single node:

bash tools/train/2_ode_init.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/2_ode_init.sh 0
# Run this command on node 1 (worker node)
bash tools/train/2_ode_init.sh 1
...
Step 2.2: Two-stage Causal Distillation
Step 2.2.1: Intra-shot Self-forcing Distillation

Single node:

bash tools/train/3_dmd.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/3_dmd.sh 0
# Run this command on node 1 (worker node)
bash tools/train/3_dmd.sh 1
...
Step 2.2.2: Inter-shot Self-forcing Distillation

Single node:

bash tools/train/4_dmd_long.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/4_dmd_long.sh 0
# Run this command on node 1 (worker node)
bash tools/train/4_dmd_long.sh 1
...

🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.

@article{luo2026shotstream,
  title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
  author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
  journal={arXiv preprint arXiv:2603.25746},
  year={2026}
}

🤗 Acknowledgement

  • CausVid: the causal initialization scheme we built upon.
  • Self Forcing: the intra-shot self-forcing distillation we built upon.
  • LongLive: the long-video (inter-shot) distillation we built upon.
  • Wan: the base model we built upon.

Thanks to all of them for their wonderful work.
