
BEV - Bird's Eye View Generation Pipeline

Integration of Depth Anything V2, Omni3D, and YOLOPv2 for robust Bird's Eye View space generation from monocular video.

Paper: [Coming Soon - Title TBD]
ArXiv: [Link will be added upon publication]
Project Page: [Link will be added]

Demo


Overview

This project combines state-of-the-art computer vision models to create accurate bird's eye view representations from standard camera footage. The pipeline integrates:

  • Depth Anything V2: Monocular depth estimation for metric 3D scene understanding
  • Omni3D: 3D object detection and localization in the wild
  • YOLOPv2: Multi-task panoptic driving perception (lane detection, drivable area segmentation, object detection)

Features

  • Real-time and batch video processing
  • GPS data integration for georeferenced outputs
  • Frame extraction and video reconstruction
  • BEV projection with depth-aware transformations
  • Overlay generation combining multiple perception modalities
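
The depth-aware BEV projection above can be sketched as follows. This is an illustrative example, not code from the repository: the function name, parameters, and grid conventions are ours. It back-projects pixels with metric depth (e.g. from Depth Anything V2) through a pinhole camera model, then bins the resulting ground-plane coordinates into a top-down occupancy grid.

```python
import numpy as np

def pixels_to_bev(us, vs, depths, fx, fy, cx, cy, cell_m=0.1, grid=200):
    """Project pixels with metric depth into a top-down occupancy grid.

    (us, vs) are pixel coordinates, depths are metric depths per pixel,
    fx/fy/cx/cy are pinhole intrinsics. Returns a (grid, grid) uint8 map
    with the camera at the bottom-centre cell, x right, z forward.
    """
    z = np.asarray(depths, dtype=float)              # metric depth (forward)
    x = (np.asarray(us, float) - cx) * z / fx        # lateral offset
    y = (np.asarray(vs, float) - cy) * z / fy        # height (down, in camera frame)
    # BEV collapses the vertical axis; y could be used to filter, say,
    # overhead structures, but is otherwise dropped here.
    col = np.round(x / cell_m).astype(int) + grid // 2   # lateral bin
    row = grid - 1 - np.round(z / cell_m).astype(int)    # forward bin
    bev = np.zeros((grid, grid), dtype=np.uint8)
    keep = (row >= 0) & (row < grid) & (col >= 0) & (col < grid)
    bev[row[keep], col[keep]] = 1
    return bev
```

A pixel at the principal point with 5 m of depth lands 50 cells ahead of the bottom-centre cell at the default 0.1 m resolution.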

Project Structure

BEV/
├── OmniLineDepth.py         # Inference Pipeline
├── realTimeOmni.py          # Real-time BEV generation
│
├── DepthAnythingV2/         # Depth Anything V2 model integration
├── omni3d/                  # Omni3D 3D object detection
├── YOLOPv2/                 # YOLOPv2 driving perception
├── IMG_GPS/                 # GPS data processing
├── overlay/                 # Overlay generation
├── projection/              # Geometric projection utilities
│
└── demoImages/              # Demo images for inference

Installation

We recommend using Python 3.9 for stability and dependency compatibility:

# Clone repository
git clone https://github.com/fantasybarry/BEV.git
cd BEV

# Install dependencies
pip install -r requirements.txt

Note: Update the repository URL once published.

Usage

Basic Processing

# Run an example inference on the demo images
python OmniLineDepth.py \
    --config-file cubercnn://outdoor/cubercnn_DLA34_FPN.yaml \
    --input-folder "demoGPS" \
    --source "demoGPS" \
    --threshold 0.50  \
    --launch-app \
    MODEL.WEIGHTS cubercnn://outdoor/cubercnn_DLA34_FPN.pth \
    OUTPUT_DIR output/demo

Sample Data

  • demoGPS/ - demo images
  • overlay/gps/gps_data_sample.json - demo GPS coordinates
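
To attach a GPS fix to each video frame, one option is a nearest-timestamp lookup over the GPS track. The sketch below is hypothetical: the field names (`timestamp`, `lat`, `lon`) and the inline data are illustrative stand-ins and may not match the actual schema of `gps_data_sample.json`.

```python
import bisect
import json

def nearest_fix(gps_points, t):
    """Return the GPS fix whose timestamp is closest to frame time t.

    gps_points must be sorted by 'timestamp'. Field names here are
    illustrative, not necessarily the repository's schema.
    """
    times = [p["timestamp"] for p in gps_points]
    i = bisect.bisect_left(times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    return min((gps_points[j] for j in candidates),
               key=lambda p: abs(p["timestamp"] - t))

# Inline data standing in for overlay/gps/gps_data_sample.json:
points = json.loads('[{"timestamp": 0.0, "lat": 42.0, "lon": -83.0},'
                    ' {"timestamp": 1.0, "lat": 42.1, "lon": -83.1}]')
```

With the sample track above, a frame at t = 0.2 s resolves to the first fix and a frame at t = 0.9 s to the second.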

Output

The pipeline generates:

  • Bird's eye view projections
  • Depth maps
  • 3D object detections
  • Lane and drivable area segmentation
  • GPS-referenced overlays
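
The combined overlays can be produced by alpha-blending each perception mask onto the source frame with a per-class colour. This is a minimal sketch, not the repository's implementation; the function name and call shape are ours, and the masks stand in for, e.g., YOLOPv2's drivable-area and lane outputs.

```python
import numpy as np

def blend_masks(frame, masks, colors, alpha=0.4):
    """Alpha-blend boolean perception masks onto an RGB frame.

    masks and colors are parallel lists (one colour per modality).
    Pixels covered by a mask move alpha of the way toward its colour.
    """
    out = frame.astype(float)
    for mask, color in zip(masks, colors):
        m = mask.astype(bool)
        out[m] = (1 - alpha) * out[m] + alpha * np.array(color, float)
    return out.astype(np.uint8)
```

Later masks in the list are blended on top of earlier ones, so ordering (e.g. lanes over drivable area) controls which class dominates where masks overlap.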

License

This project is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) due to the licensing requirements of integrated components. See LICENSE for full details.

Citations

Citing This Work

If you find our work useful for your research, please cite our paper:

@article{tan2025bev,
  title={[Monocular 3D Perception and Lane-Aware Bird’s-Eye-View Mapping for Autonomous Driving]},
  author={Tan, Lin and Wang, Hanchen and Li, Taozhe and Hajnorouzali, Yasaman and Burch, Collin and Lee, Victoria and Xu, Bin and Arjmanzdadeh, Ziba},
  journal={[arxiv: TBA]},
  year={2025}
}

Citing Integrated Models

This work builds upon the following models. Please also cite their papers:

Paper Citations

Depth Anything V2:

@article{depthanythingv2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

Omni3D:

@inproceedings{brazil2023omni3d,
  author =       {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
  title =        {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
  booktitle =    {CVPR},
  address =      {Vancouver, Canada},
  month =        {June},
  year =         {2023},
  organization = {IEEE},
}

YOLOPv2:

@article{han2022yolopv2,
  title={YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception},
  author={Han, Cheng and Zhao, Qichao and Zhang, Shuyi and Chen, Yinzi and Zhang, Zhenlin and Yuan, Jinwei},
  journal={arXiv preprint arXiv:2208.11434},
  year={2022}
}

Acknowledgments

This project builds upon the excellent work of multiple research teams:

  • The Depth Anything V2 team (Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao) for their robust monocular depth estimation model presented at NeurIPS 2024

  • Meta AI Research (FAIR) team (Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari) for the Omni3D benchmark and 3D detection framework presented at CVPR 2023

  • The YOLOPv2 authors (Cheng Han, Qichao Zhao, Shuyi Zhang, Yinzi Chen, Zhenlin Zhang, Jinwei Yuan) for their efficient multi-task panoptic driving perception system

We are grateful for their contributions to the computer vision and autonomous driving research communities, and for making their code publicly available.
