A sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM matches or surpasses published streaming video understanding models. No memory bank, no retrieval, no compression.
📄 [2026/04] SimpleStream paper is released.
💻 [2026/04] Code and evaluation scripts are open-sourced.
- Simple yet strong. With only 4 recent frames, SimpleStream reaches 67.7% on OVO-Bench and 80.59% on StreamingBench, surpassing all published streaming methods.
- Perception-memory trade-off. Adding historical context improves recall but consistently degrades real-time perception, which dominates aggregate scores.
- Training-free. SimpleStream uses off-the-shelf VLMs (Qwen2.5-VL, Qwen3-VL) with zero fine-tuning.
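The core idea is just a fixed-size frame buffer. As a rough illustration (class and method names are ours, not from the repo), the per-step frame selection can be sketched as:

```python
from collections import deque

class SlidingFrameWindow:
    """Keep only the N most recent frames. Older frames are dropped,
    so per-step VLM input cost stays constant as the stream grows."""

    def __init__(self, num_recent_frames: int = 4):
        self.buffer = deque(maxlen=num_recent_frames)

    def push(self, frame):
        # deque with maxlen evicts the oldest frame automatically.
        self.buffer.append(frame)

    def vlm_input(self):
        # Frames handed to the VLM at the current timestep, oldest first.
        return list(self.buffer)

# Simulate a 10-frame stream with a 4-frame window.
window = SlidingFrameWindow(num_recent_frames=4)
for t in range(10):
    window.push(f"frame_{t}")
print(window.vlm_input())  # ['frame_6', 'frame_7', 'frame_8', 'frame_9']
```

No memory bank or retrieval index is maintained; the buffer is the entire "state" of the method.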
```bash
conda create -n simplestream python=3.10 -y
conda activate simplestream
pip install -r requirements.txt

# Optional: faster attention backend
pip install flash-attn --no-build-isolation
```

Models are downloaded automatically from HuggingFace on first run:
- `Qwen/Qwen3-VL-8B-Instruct` (primary)
- `Qwen/Qwen2.5-VL-7B-Instruct` (cross-validation)
- OVO-Bench: Download from OVO-Bench. Place annotations at `data/ovo_bench/ovo_bench_new.json` and chunked videos at `data/ovo_bench/chunked_videos/`.
- StreamingBench: Download from StreamingBench. Place questions at `data/streamingbench/questions_real.json` and videos at `data/streamingbench/videos/`.
Qwen3-VL on OVO-Bench
```bash
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num_processes=2 \
  main_experiments/eval_qwen3vl_ovo.py \
  --model_path Qwen/Qwen3-VL-8B-Instruct \
  --anno_path data/ovo_bench/ovo_bench_new.json \
  --chunked_dir data/ovo_bench/chunked_videos \
  --result_dir main_experiments/results/ovo_qwen3vl_recent8 \
  --recent_frames_only 8 \
  --chunk_duration 1.0 \
  --fps 1.0
```

Or use the convenience launcher for 4 GPUs:

```bash
bash main_experiments/run_qwen3vl_ovo_4gpu.sh
```

Qwen2.5-VL on OVO-Bench
```bash
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num_processes=2 \
  main_experiments/eval_qwen25vl_ovo.py \
  --model_path Qwen/Qwen2.5-VL-7B-Instruct \
  --anno_path data/ovo_bench/ovo_bench_new.json \
  --chunked_dir data/ovo_bench/chunked_videos \
  --result_dir main_experiments/results/ovo_qwen25vl_recent8 \
  --recent_frames_only 8 \
  --chunk_duration 1.0 \
  --fps 1.0
```

StreamingBench
`--top-k 0` disables retrieval and keeps only the most recent chunks.
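As a hypothetical sketch of what this flag means (the function, its arguments, and the scoring scheme are ours for illustration, not the repo's actual implementation), chunk selection with and without retrieval could look like:

```python
def select_chunks(chunks, scores, top_k, recent_only):
    """Illustrative chunk selection. With top_k == 0, retrieval is skipped
    entirely and only the `recent_only` most recent chunks are kept; with
    top_k > 0, the top-k highest-scoring historical chunks are prepended."""
    recent = chunks[-recent_only:]
    if top_k == 0:
        return recent
    history = chunks[:-recent_only]
    hist_scores = scores[:-recent_only]
    # Sort historical chunks by relevance score, descending, and keep top-k.
    retrieved = [c for _, c in sorted(zip(hist_scores, history), reverse=True)[:top_k]]
    return retrieved + recent

chunks = list(range(10))
scores = [i / 10 for i in chunks]
print(select_chunks(chunks, scores, top_k=0, recent_only=4))  # [6, 7, 8, 9]
print(select_chunks(chunks, scores, top_k=2, recent_only=4))  # [5, 4, 6, 7, 8, 9]
```

With `top_k=0` the scores are never consulted, which is what makes the baseline training-free and retrieval-free.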
```bash
CUDA_VISIBLE_DEVICES=0 python main_experiments/eval_streamingbench.py \
  --anno-path data/streamingbench/questions_real.json \
  --video-dir data/streamingbench/videos \
  --top-k 0 \
  --recent-frames-only 4 \
  --chunk-duration 1.0 \
  --fps 1.0 \
  --output-dir main_experiments/results/streamingbench_recent4
```

Efficiency Benchmark
Measures time-to-first-token (TTFT), generation throughput, and memory usage on a user-provided source video.
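The first two of these metrics are generic streaming-latency measurements. A minimal sketch of how they can be computed (this is our illustration, not the repo's `eval_efficiency.py`; the stub generator stands in for a model's token streamer):

```python
import time

def measure_ttft_and_throughput(token_stream):
    """TTFT = wall-clock time until the first token arrives;
    throughput = tokens generated per second over the full run."""
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start
        n_tokens += 1
    total = time.perf_counter() - start
    return ttft, n_tokens / total

# Stub generator standing in for a real model's streamed decoding.
def fake_stream(n=50, delay=0.001):
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tokens_per_sec = measure_ttft_and_throughput(fake_stream())
```

Because the sliding-window baseline caps the number of input frames, the prefill work per step is bounded, which is what keeps TTFT flat as the video gets longer.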
```bash
CUDA_VISIBLE_DEVICES=0 python efficiency/eval_efficiency.py \
  --source-video /path/to/your/source_video.mp4 \
  --model-name Qwen/Qwen2.5-VL-7B-Instruct \
  --chunk-size 8 \
  --recent-frames 4
```

Scoring
```bash
python scoring/score_ovo_bench.py \
  --result_path main_experiments/results/ovo_qwen3vl_recent8/qwen3vl_results_*.json
```

If you find this work useful, please consider citing our paper:
```bibtex
@article{simplestream2026,
  title={A Simple Baseline for Streaming Video Understanding},
  author={Shen, Yujiao and Tian, Shulin and Yang, Jingkang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2604.02317},
  year={2026}
}
```

- Qwen-VL: the VLM backbone used in our experiments.
- OVO-Bench and StreamingBench: the evaluation benchmarks.
