Integration of Depth Anything V2, Omni3D, and YOLOPv2 for robust Bird's Eye View space generation from monocular video.
Paper: [Coming Soon - Title TBD]
ArXiv: [Link will be added upon publication]
Project Page: [Link will be added]
This project combines state-of-the-art computer vision models to create accurate bird's eye view representations from standard camera footage. The pipeline integrates:
- Depth Anything V2: Monocular depth estimation for metric 3D scene understanding
- Omni3D: 3D object detection and localization in the wild
- YOLOPv2: Multi-task panoptic driving perception (lane detection, drivable area segmentation, object detection)
- Real-time and batch video processing
- GPS data integration for georeferenced outputs
- Frame extraction and video reconstruction
- BEV projection with depth-aware transformations
- Overlay generation combining multiple perception modalities
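The depth-aware BEV projection can be illustrated with a minimal sketch: back-project each pixel through pinhole intrinsics and rasterize it onto a top-down grid. The intrinsics (`fx`, `fy`, `cx`, `cy`), cell size, and grid dimensions below are illustrative assumptions, not the project's calibration, and this stand-in is far simpler than the pipeline's actual projection utilities.

```python
import numpy as np

def pixels_to_bev(u, v, depth, fx, fy, cx, cy, cell_size=0.1, grid=(200, 200)):
    """Back-project pixels with metric depth into a top-down occupancy grid.

    u, v   : pixel coordinates (arrays); v (image row) maps to height,
             which a top-down grid discards
    depth  : metric depth per pixel, in meters
    Returns a (H, W) uint8 BEV grid; camera at bottom-center, x right, z forward.
    """
    # Pinhole back-projection: X = (u - cx) * Z / fx, with Z = depth
    X = (u - cx) * depth / fx
    Z = depth
    H, W = grid
    col = np.round(X / cell_size + W / 2).astype(int)   # lateral offset -> column
    row = np.round(H - 1 - Z / cell_size).astype(int)   # forward range -> row
    bev = np.zeros(grid, dtype=np.uint8)
    keep = (col >= 0) & (col < W) & (row >= 0) & (row < H)
    bev[row[keep], col[keep]] = 255
    return bev

# Example: a pixel at the principal point, 5 m away, lands on the center column
u = np.array([320.0]); v = np.array([240.0]); d = np.array([5.0])
bev = pixels_to_bev(u, v, d, fx=700.0, fy=700.0, cx=320.0, cy=240.0)
```

With a 0.1 m cell size, the 5 m point falls 50 rows above the bottom edge of the grid, centered laterally.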
BEV/
├── OmniLineDepth.py # Main inference pipeline
├── realTimeOmni.py # Real-time BEV generation
│
├── DepthAnythingV2/ # Depth Anything V2 model integration
├── omni3d/ # Omni3D 3D object detection
├── YOLOPv2/ # YOLOPv2 driving perception
├── IMG_GPS/ # GPS data processing
├── overlay/ # Overlay generation
├── projection/ # Geometric projection utilities
│
└── demoImages/ # Demo images for inference
We recommend using Python 3.9 for stability and dependency compatibility:
# Clone repository
git clone https://github.com/fantasybarry/BEV.git
cd BEV
# Install dependencies
pip install -r requirements.txt

Note: Update the repository URL once published.
# Run an example inference demo on our demo Images
python OmniLineDepth.py \
--config-file cubercnn://outdoor/cubercnn_DLA34_FPN.yaml \
--input-folder "demoGPS" \
--source "demoGPS" \
--threshold 0.50 \
--launch-app \
MODEL.WEIGHTS cubercnn://outdoor/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo

- demoGPS/ - demo images
- overlay/gps/gps_data_sample.json - demo GPS coordinates
The pipeline generates:
- Bird's eye view projections
- Depth maps
- 3D object detections
- Lane and drivable area segmentation
- GPS-referenced overlays
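Producing GPS-referenced overlays requires converting raw latitude/longitude into local metric offsets. Below is a minimal sketch of one common approach (an equirectangular approximation around a reference point, which is an assumption here, not necessarily this project's method); it would pair with coordinates like those in `overlay/gps/gps_data_sample.json`, whose exact format is not shown here.

```python
import math

def gps_to_local_xy(lat, lon, ref_lat, ref_lon):
    """Convert WGS-84 lat/lon to local east/north offsets in meters
    relative to a reference point. The equirectangular approximation
    is adequate over short driving distances."""
    R = 6371000.0  # mean Earth radius in meters
    # Longitude degrees shrink with latitude, hence the cos() factor
    east = math.radians(lon - ref_lon) * R * math.cos(math.radians(ref_lat))
    north = math.radians(lat - ref_lat) * R
    return east, north

# Example: one arc-second of latitude is roughly 31 m of northing
e, n = gps_to_local_xy(40.0002778, -83.0, 40.0, -83.0)
```

For higher accuracy or longer trajectories, a proper projection library (e.g. a UTM conversion) would be the usual choice.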
This project is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) due to the licensing requirements of integrated components. See LICENSE for full details.
If you find our work useful for your research, please kindly cite our paper:
@article{tan2025bev,
title={[Monocular 3D Perception and Lane-Aware Bird’s-Eye-View Mapping for Autonomous Driving]},
author={Tan, Lin and Wang, Hanchen and Li, Taozhe and Hajnorouzali, Yasaman and Burch, Collin and Lee, Victoria and Xu, Bin and Arjmanzdadeh, Ziba},
journal={[arxiv: TBA]},
year={2025}
}

This work builds upon the following models. Please also cite their papers:
Depth Anything V2:
@article{depthanythingv2,
title={Depth Anything V2},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv:2406.09414},
year={2024}
}

Omni3D:
@inproceedings{brazil2023omni3d,
author = {Garrick Brazil and Abhinav Kumar and Julian Straub and Nikhila Ravi and Justin Johnson and Georgia Gkioxari},
title = {{Omni3D}: A Large Benchmark and Model for {3D} Object Detection in the Wild},
booktitle = {CVPR},
address = {Vancouver, Canada},
month = {June},
year = {2023},
organization = {IEEE},
}

YOLOPv2:
@article{han2022yolopv2,
title={YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception},
author={Han, Cheng and Zhao, Qichao and Zhang, Shuyi and Chen, Yinzi and Zhang, Zhenlin and Yuan, Jinwei},
journal={arXiv preprint arXiv:2208.11434},
year={2022}
}

- Depth Anything V2: https://github.com/DepthAnything/Depth-Anything-V2
- Omni3D: https://github.com/facebookresearch/omni3d
- YOLOPv2: https://github.com/CAIC-AD/YOLOPv2
This project builds upon the excellent work of multiple research teams:
- The Depth Anything V2 team (Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao) for their robust monocular depth estimation model presented at NeurIPS 2024
- The Meta AI Research (FAIR) team (Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari) for the Omni3D benchmark and 3D detection framework presented at CVPR 2023
- The YOLOPv2 authors (Cheng Han, Qichao Zhao, Shuyi Zhang, Yinzi Chen, Zhenlin Zhang, Jinwei Yuan) for their efficient multi-task panoptic driving perception system
We are grateful for their contributions to the computer vision and autonomous driving research communities, and for making their code publicly available.
