KlingAIResearch/ShotStream


Streaming Multi-Shot Video Generation for Interactive Storytelling

Yawen Luo1, Xiaoyu Shi2,✉, Junhao Zhuang1, Yutian Chen1, Quande Liu2,
Xintao Wang2, Pengfei Wan2, Tianfan Xue1,3,✉

1MMLab, CUHK    2Kling Team, Kuaishou Technology    3CPII under InnoHK
✉ Corresponding author



Note: This open-source repository is a reference implementation. The original model was trained on internal data, so the prompts in these demo cases exhibit a distribution gap relative to our original training and inference setups.

🔥 Updates

📷 Introduction

TL;DR: We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation, achieving 16 FPS on a single NVIDIA GPU.
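The streaming idea can be illustrated with a toy sketch (illustrative only; the real model denoises latent frames with a causal architecture, and every name below is hypothetical):

```python
# Toy illustration of causal multi-shot streaming: each shot is generated
# conditioned on cached context from earlier shots, so frames can be
# emitted on the fly instead of after the whole video is finished.
from dataclasses import dataclass, field

@dataclass
class StreamState:
    """Stands in for the KV-cache-style context carried across shots."""
    history: list = field(default_factory=list)

def generate_shot(prompt: str, state: StreamState, num_frames: int = 4) -> list:
    """Toy stand-in for a few-step causal denoiser: emits frames in order."""
    frames = [f"{prompt}/frame{i} (ctx={len(state.history)})"
              for i in range(num_frames)]
    state.history.extend(frames)   # later shots condition on earlier ones
    return frames

state = StreamState()
for shot_prompt in ["sunrise over a city", "a cat walks into frame"]:
    for frame in generate_shot(shot_prompt, state):
        print(frame)               # streamed as soon as it is produced
```

Each new prompt extends the same running state, which is what makes interactive, shot-by-shot storytelling possible.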

Please see more video results on our Project Page.

demo.mp4

⚙️ Code: ShotStream + Wan2.1-T2V-1.3B

Inference

1. Environment Setup

Create a conda environment and install dependencies:

git clone https://github.com/KlingAIResearch/ShotStream.git
cd ShotStream
conda create -n shotstream python=3.10 -y
conda activate shotstream
conda install nvidia/label/cuda-12.4.1::cuda
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Or directly:

bash tools/setup/env.sh
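A quick sanity check after setup (a minimal sketch; it only verifies that the key packages installed above are importable) can catch a broken environment early:

```python
# Sanity check: confirm the packages installed above are importable.
import importlib.util

for mod in ("torch", "torchvision", "flash_attn"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'installed' if found else 'MISSING'}")
```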

2. Download Checkpoints

Download the checkpoints of Wan-T2V-1.3B and ShotStream:

apt-get install git-lfs
git-lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B wan_models
git clone https://huggingface.co/KlingTeam/ShotStream ckpts

Or directly:

bash tools/setup/download_ckpt.sh
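Before moving on, it can help to confirm the downloads landed where the later scripts expect them (a sketch; only `ckpts/bidirectional_teacher.pt` is a path actually referenced below, the rest of the layout is an assumption):

```python
# Verify the checkpoint directories exist before running inference/training.
from pathlib import Path

for path in ("wan_models", "ckpts", "ckpts/bidirectional_teacher.pt"):
    status = "found" if Path(path).exists() else "missing"
    print(f"{path}: {status}")
```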

3. Run Inference

Autoregressive 4-step Long Multi-Shot Video Generation:

Note: Due to company policy restrictions, the prompts in these demo cases exhibit a distribution shift compared to those used during our original training and inference phases.

bash tools/inference/causal_fewsteps.sh

Training

Note:

  1. Update MASTER_ADDR in all bash scripts with the main node's IP address. For multi-node training, also set the NNODES variable accordingly.
  2. The multi-shot video example provided is sourced from a public dataset for demonstration purposes. Its captions differ from those used in our actual training set.
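For reference, the variables mentioned above typically look like this inside the training scripts (a sketch under assumptions: the `torchrun` launch line and the `train.py` entry point are illustrative, not the scripts' exact contents):

```shell
MASTER_ADDR="10.0.0.1"   # IP address of node 0 (the main node) - must be edited
MASTER_PORT=29500        # any free port, identical on all nodes
NNODES=2                 # total number of nodes participating
NODE_RANK=$1             # the 0/1/... argument passed to each tools/train/*.sh

# Illustrative launch; the actual entry point inside the scripts may differ.
torchrun --nnodes "$NNODES" --node_rank "$NODE_RANK" \
         --master_addr "$MASTER_ADDR" --master_port "$MASTER_PORT" \
         train.py
```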

Step 1: Bidirectional Next-Shot Teacher Model Training

Single node:

bash tools/train/1_basemodel.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/1_basemodel.sh 0
# Run this command on node 1 (worker node)
bash tools/train/1_basemodel.sh 1
...

Step 2: Causal Student Model Distillation

Step 2.1: Causal Adaptation Initialization

Following CausVid, we initialize the causal student with the bidirectional teacher's weights. Training all parameters on 5K teacher ODE solution pairs aligns their trajectories, bridging the architectural gap and stabilizing subsequent distillation.
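The alignment objective can be sketched as a simple regression onto teacher ODE solutions (a toy with NumPy; the linear map stands in for the causal student, and the shapes and learning rate are arbitrary assumptions):

```python
# Toy version of ODE-pair alignment: the student is regressed onto the
# teacher's ODE solutions so both map the same noise to the same output.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=(32, 4))          # x_T latents fed to the teacher
teacher_map = rng.normal(size=(4, 4))     # stands in for the teacher's ODE solver
targets = noise @ teacher_map             # teacher's ODE solutions (the "pairs")

W = np.zeros((4, 4))                      # student parameters
for _ in range(300):                      # plain gradient descent on MSE
    grad = noise.T @ (noise @ W - targets) / len(noise)
    W -= 0.1 * grad

print(np.abs(noise @ W - targets).max())  # residual shrinks as trajectories align
```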

Step 2.1.1: Get ODE Pairs from Teacher
python Teacher_Ode_Sample.py \
  --ckpt_dir ckpts/bidirectional_teacher.pt \
  --save_dir demo/data/ode_sample \
  --data_csv_path demo/data/sample.csv
Step 2.1.2: Get ODE Pairs CSV
python get_ode_csv.py \
    -i demo/data/ode_sample \
    -o demo/data/ode_sample.csv
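Conceptually, this step just indexes the saved pairs into a CSV (a hypothetical sketch of what a script like `get_ode_csv.py` might do; the column name and `.pt` extension are assumptions):

```python
# Build a CSV index over a directory of saved ODE pairs.
import csv
from pathlib import Path

def build_index(sample_dir: str, out_csv: str) -> int:
    """Write one row per saved pair file; returns the number of rows."""
    paths = sorted(Path(sample_dir).glob("*.pt"))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ode_pair_path"])           # hypothetical column name
        writer.writerows([str(p)] for p in paths)
    return len(paths)
```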
Step 2.1.3: Causal Initialization

Single node:

bash tools/train/2_ode_init.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/2_ode_init.sh 0
# Run this command on node 1 (worker node)
bash tools/train/2_ode_init.sh 1
...
Step 2.2: Two-stage Causal Distillation
Step 2.2.1: Intra-shot Self-forcing Distillation

Single node:

bash tools/train/3_dmd.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/3_dmd.sh 0
# Run this command on node 1 (worker node)
bash tools/train/3_dmd.sh 1
...
Step 2.2.2: Inter-shot Self-forcing Distillation

Single node:

bash tools/train/4_dmd_long.sh 0

Multi-nodes:

# Run this command on node 0 (main node)
bash tools/train/4_dmd_long.sh 0
# Run this command on node 1 (worker node)
bash tools/train/4_dmd_long.sh 1
...

🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.

@article{luo2026shotstream,
  title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
  author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
  journal={arXiv preprint arXiv:2603.25746},
  year={2026}
}

🤗 Acknowledgement

  • CausVid: the causal initialization scheme we built upon.
  • Self Forcing: the intra-shot self-forcing distillation we built upon.
  • LongLive: the long-video (inter-shot) distillation we built upon.
  • Wan: the base model we built upon.

Thanks to all of them for their wonderful work.
