MVP: Motion Vector Propagation for Zero-Shot Video Object Detection

A novel approach for efficient zero-shot object detection in videos by leveraging motion vectors from compressed video streams combined with OWL-ViT for object detection.

🎯 Overview

MVP (Motion Vector Propagation) is a zero-shot object detection framework that combines:

Motion Vector Analysis: Extracts and analyzes motion vectors from H.264/MPEG-4 compressed videos
OWL-ViT Integration: Uses OWL-ViT for zero-shot object detection on keyframes
Intelligent Propagation: Propagates detections across frames using motion vector analysis
Efficient Tracking: Reduces computational overhead by avoiding full detection on every frame

✨ Key Features

Motion Vector Processing: Efficient extraction and analysis of motion vectors from compressed videos
Zero-shot Detection: OWL-ViT based zero-shot object detection capabilities
Smart Propagation: 9-grid motion vector analysis for accurate object tracking
Adaptive Strategy: Dynamic switching between detection and propagation modes
Performance Optimization: Significant speedup compared to frame-wise detection

🏗️ Architecture

The framework consists of three main components:

Motion Vector Extractor: Extracts motion vectors, frame types, and timestamps from compressed videos
OWL-ViT Detector: Performs zero-shot object detection on keyframes and when needed
Motion Propagation: Uses 9-grid analysis to propagate detections across frames

📋 Requirements

torch>=1.9.0
torchvision>=0.10.0
transformers>=4.20.0
opencv-python>=4.5.0
numpy>=1.21.0
matplotlib>=3.5.0
tqdm>=4.64.0
PIL>=8.3.0
ffmpeg-python>=0.2.0

🚀 Installation

Clone the repository:

git clone https://github.com/microa/MVP.git
cd MVP

Install dependencies:

pip install -r requirements.txt

Install motion vector extractor:

cd mv-extractor
pip install -e .

📊 Dataset Preparation

Prepare your video dataset in the following structure:

data/
├── videos/           # Video files (.mp4)
├── motion_vectors/   # Extracted motion vectors
│   └── video_name/
│       ├── frame_types.txt
│       ├── timestamps.txt
│       └── motion_vectors/
│           ├── mvs-0.npy
│           ├── mvs-1.npy
│           └── ...
└── ground_truth/     # Ground truth annotations (JSON format)

Extract motion vectors from videos:

python utils/extract_motion_vectors.py --video_dir data/videos --output_dir data/motion_vectors

🔧 Usage

Basic Usage

from src.mvp_detector import MVPDetector

# Initialize detector
detector = MVPDetector(
    model_id="google/owlv2-large-patch14-ensemble",
    device="cuda"
)

# Process video
results = detector.process_video(
    video_path="path/to/video.mp4",
    motion_vector_dir="path/to/motion_vectors",
    output_dir="path/to/output"
)

Command Line Interface

# Process single video
python src/main.py --video_path data/videos/video.mp4 --mv_dir data/motion_vectors/video_name --output_dir results/

# Process entire dataset
python src/main.py --video_dir data/videos --mv_root data/motion_vectors --output_root results/

Evaluation

# Evaluate on dataset
python evaluation/evaluate.py --pred_dir results/ --gt_dir data/ground_truth --output_dir evaluation_results/

📁 Project Structure

MVP/
├── src/                    # Core source code
│   ├── mvp_detector.py    # Main MVP detector class
│   ├── motion_analyzer.py # Motion vector analysis
│   ├── owl_detector.py    # OWL-ViT integration
│   └── main.py            # Command line interface
├── configs/               # Configuration files
│   ├── default.yaml      # Default configuration
│   └── imagenet_vid.yaml # ImageNet VID specific config
├── evaluation/            # Evaluation scripts
│   ├── evaluate.py       # Main evaluation script
│   └── metrics.py        # Evaluation metrics
├── utils/                 # Utility functions
│   ├── extract_motion_vectors.py
│   ├── visualization.py
│   └── data_utils.py
├── examples/              # Usage examples
│   ├── basic_usage.py
│   └── custom_dataset.py
├── mv-extractor/          # Motion vector extraction tool
├── docs/                  # Documentation
├── requirements.txt       # Python dependencies
└── README.md             # This file

🎨 Visualization

The framework includes tools for visualizing motion vectors and detection results:

# Visualize motion vectors
python utils/visualization.py --mv_dir data/motion_vectors/video_name --output_dir visualizations/

# Visualize detection results
python utils/visualization.py --results_dir results/video_name --output_dir visualizations/

📈 Results

The MVP framework achieves competitive performance on ImageNet VID dataset with significant speedup:

[email protected]: 0.747
[email protected]: 0.721
[email protected]: 0.609
mAP@[0.5:0.95]: 0.316

📝 Citation

If you use this code in your research, please cite our paper:

@article{huang2025mvp,
  title={MVP: Motion Vector Propagation for Zero-Shot Video Object Detection},
  author={Huang, Binhua and Wang, Ni and Yao, Wendong and Dev, Soumyabrata},
  journal={arXiv preprint arXiv:2509.18388},
  year={2025}
}

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

ImageNet VID dataset creators
OWL-ViT model by Google Research
Motion vector extraction tool by LukasBommes
PyTorch and the open-source community

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
configs		configs
evaluation		evaluation
examples		examples
src		src
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
PROJECT_SUMMARY.md		PROJECT_SUMMARY.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
install.bat		install.bat
install.sh		install.sh
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MVP: Motion Vector Propagation for Zero-Shot Video Object Detection

🎯 Overview

✨ Key Features

🏗️ Architecture

📋 Requirements

🚀 Installation

📊 Dataset Preparation

🔧 Usage

Basic Usage

Command Line Interface

Evaluation

📁 Project Structure

🎨 Visualization

📈 Results

📝 Citation

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

microa/MVP

Folders and files

Latest commit

History

Repository files navigation

MVP: Motion Vector Propagation for Zero-Shot Video Object Detection

🎯 Overview

✨ Key Features

🏗️ Architecture

📋 Requirements

🚀 Installation

📊 Dataset Preparation

🔧 Usage

Basic Usage

Command Line Interface

Evaluation

📁 Project Structure

🎨 Visualization

📈 Results

📝 Citation

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages