Yuliang Guo1*† · Sparsh Garg2† · S. Mahdi H. Miangoleh3 · Xinyu Huang1 · Liu Ren1
1Bosch Research North America 2Carnegie Mellon University 3Simon Fraser University
*corresponding author †equal technical contribution
Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to handle any type of camera with varying FoVs effectively.
Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:
- Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming prior metric depth SoTA Metric3D-v2 and UniDepth.
- Geometry-based training framework adaptable to any network architecture, extendable to other 3D perception tasks.
Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the camera types used in new applications.
The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. Visualizations of the AbsRel error against ground truth highlight DAC's superior performance.
Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.
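Since a 360-degree depth map covers the full viewing sphere, the reconstruction step reduces to a spherical back-projection of the predicted depth. The sketch below is illustrative only (it is not the repository's code) and assumes an equirectangular depth map storing per-pixel radial distance in meters; the axis convention is an assumption.

```python
import numpy as np

def erp_depth_to_point_cloud(depth: np.ndarray) -> np.ndarray:
    """Back-project an equirectangular (ERP) depth map to a 3D point cloud.

    Assumes `depth` is an (H, W) array of metric radial distances, with
    columns spanning longitude [-pi, pi) and rows spanning latitude [pi/2, -pi/2].
    """
    h, w = depth.shape
    # Longitude (azimuth) and latitude (elevation) at every pixel center.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi       # [-pi, pi)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi       # [pi/2, -pi/2]
    lon, lat = np.meshgrid(lon, lat)

    # Spherical -> Cartesian, scaled by the predicted radial depth.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```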
Additional visual results and comparisons with the prior SoTA can be found on the project page.
Depth Any Camera significantly outperforms the previous SoTA metric depth estimation models, Metric3D-v2 and UniDepth, in zero-shot generalization to large-FoV camera images, despite a significantly smaller training dataset and model size.
Method | Training Data Size | Matterport3D (360) AbsRel↓ / δ1↑ | Pano3D-GV2 (360) AbsRel↓ / δ1↑ | ScanNet++ (fisheye) AbsRel↓ / δ1↑ | KITTI360 (fisheye) AbsRel↓ / δ1↑ |
---|---|---|---|---|---|
UniDepth-VitL | 3M | 0.7648 / 0.2576 | 0.7892 / 0.2469 | 0.4971 / 0.3638 | 0.2939 / 0.4810 |
Metric3D-v2-VitL | 16M | 0.2924 / 0.4381 | 0.3070 / 0.4040 | 0.2229 / 0.5360 | 0.1997 / 0.7159 |
Ours-ResNet101 | 670K indoor / 130K outdoor | **0.156** / **0.7727** | **0.1387** / **0.8115** | *0.1323* / *0.8517* | *0.1559* / *0.7858* |
Ours-SwinL | 670K indoor / 130K outdoor | *0.1789* / *0.7231* | *0.1836* / *0.7287* | **0.1282** / **0.8544** | **0.1487** / **0.8222** |
We highlight the best and second best results in bold and italic, respectively (lower AbsRel and higher δ1 indicate better results).
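For reference, AbsRel and δ1 are the standard metric-depth evaluation measures reported above. Below is a minimal sketch of how they are commonly computed; our actual evaluation code may differ in masking and clipping details.

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def delta1(pred: np.ndarray, gt: np.ndarray, thr: float = 1.25) -> float:
    """Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) < thr."""
    valid = gt > 0
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < thr))
```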
- Release of pre-trained DepthAnyCamera (DAC) models trained on moderately sized datasets.
- Demo code for easy setup and usage.
- Testing and evaluation pipeline for zero-shot metric depth estimation on perspective, fisheye, and 360-degree datasets.
- Complete DepthAnyCamera (DAC) training pipeline using mixed perspective camera data.
- Complete data preparation and curation scripts.
- [TBD] Foundation-level model trained on a large-scale, diverse dataset mixture, encompassing perspective, fisheye, and 360-degree camera data.
git clone https://github.com/yuliangguo/depth_any_camera
cd depth_any_camera
This repository can be run from within Docker, as long as the NVIDIA Container Toolkit is properly configured. For Ubuntu installation steps, refer to this guide.
# Build the container
docker build -t dac:latest .
# Enter the container
docker run --gpus all --network host -v $(pwd):/depth_any_camera --rm -it dac /bin/bash
# Once within the container,
# source the post-entry-hooks.sh to finish the install.
source post-entry-hooks.sh
Alternatively, this repository can be run from within Conda alone.
conda create -n dac python=3.9 -y
conda activate dac
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
export PYTHONPATH="$PWD:$PYTHONPATH"
cd dac/models/ops/
pip install -e .
cd ../../../
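After installation, a quick sanity check (an illustrative snippet, not part of the repository) can confirm that the CUDA-enabled PyTorch build installed above is active before moving on:

```python
import torch

# Confirm the CUDA build of PyTorch installed above is the one in use.
print(torch.__version__)                  # expected: 1.13.1+cu116
print(torch.cuda.is_available())          # should print True on a GPU machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the visible GPU
```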
Our current training set is quite slim compared to prior foundation models. DAC is trained on a combination of 3 labeled datasets (670K images) for the indoor model and a combination of 2 datasets (130K images) for the outdoor model. Two 360 datasets and two fisheye datasets are used for zero-shot testing.
Please refer to DATA.md for detailed dataset preparation. Make sure the relative paths of the datasets are set correctly before proceeding to the testing and training sections.
We provide two indoor models and two outdoor models, with ResNet101 and Swin-Transformer-Large (SwinL) backbones. In addition, we provide two weaker baseline models for comparison. The download links can be found in the following table. We suggest downloading both the model configs and model weights to `checkpoints/` in order to run our scripts directly.
Model Name | Training Datasets | Model Configs | Weights |
---|---|---|---|
dac-indoor-resnet101 (ours) | indoor mix 670k | huggingface | huggingface |
dac-indoor-swinL (ours) | indoor mix 670k | huggingface | huggingface |
dac-outdoor-resnet101 (ours) | outdoor mix 130k | huggingface | huggingface |
dac-outdoor-swinL (ours) | outdoor mix 130k | huggingface | huggingface |
idisc-metric3d-indoor-resnet101 (weak baseline 1) | indoor mix 670k | huggingface | huggingface |
cnndepth-metric3d-indoor-resnet101 (weak baseline 2) | indoor mix 670k | huggingface | huggingface |
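If preferred, the checkpoints can also be fetched programmatically with `huggingface_hub`. The snippet below is only a sketch: the `repo_id` is a placeholder, and the filenames mirror the ones used by the demo commands later in this README; substitute the actual links from the table above.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id; replace with the Hugging Face repo linked in the table above.
REPO_ID = "<org>/<dac-model-repo>"

for filename in ["dac_swinl_indoor.json", "dac_swinl_indoor.pt"]:
    path = hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir="checkpoints")
    print("saved to", path)
```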
We provide ready-to-run demo scripts in the `demo` folder. `demo/demo_dac_indoor.py` demonstrates how to perform inference on various types of camera data, including ScanNet++ (fisheye), Matterport3D (360), and NYU (perspective), using a single metric depth model trained on perspective images. The code generates point cloud files (`*.ply`) and visualization results as shown below:
`demo/demo_dac_outdoor.py` similarly demonstrates how a single outdoor model handles different types of camera data, including KITTI (perspective) and KITTI-360 (fisheye).
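The generated `*.ply` files can be inspected with any standard point-cloud viewer. For example, with `open3d` (not a dependency of this repository; the file path below is illustrative):

```python
import open3d as o3d

# Load one of the point clouds written by the demo scripts and display it.
pcd = o3d.io.read_point_cloud("demo/output/scannetpp_sample.ply")  # illustrative path
print(pcd)  # prints the number of points
o3d.visualization.draw_geometries([pcd])
```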
We also provide a demo script for processing a single sample; you can use the following example command:
python demo/demo_dac_single.py --config-file checkpoints/dac_swinl_indoor.json --model-file checkpoints/dac_swinl_indoor.pt --sample-file demo/input/scannetpp_sample.json --out-dir demo/output
Given the pretrained models saved in `checkpoints/`, the following command can be run to test and evaluate on a specific dataset, e.g., ScanNet++:
python script/test_dac.py --model-file checkpoints/dac_swinl_indoor.pt --model-name IDiscERP --config-file configs/test/dac_swinl_indoor_test_scannetpp.json --base-path datasets --vis
Config files for testing all the reported datasets are included in `configs/test`. Interested users can also refer to the provided `launch.json` to conveniently run or debug the testing scripts in VSCode. The following table lists the most relevant configurations.
Testing dataset | Testing script | --model-file | --config-file | --model-name |
---|---|---|---|---|
Matterport | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
Gibson-V2 | ^ | ^ | relative path | IDiscERP |
ScanNet++ | ^ | ^ | relative path | IDiscERP |
NYU | ^ | ^ | relative path | IDiscERP |
KITTI360 | ^ | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
KITTI | ^ | ^ | relative path | IDisc |
... | scripts/test_persp.py | checkpoints/idisc-... | ... | IDisc |
... | ^ | checkpoints/cnndepth-... | ... | CNNDepth |
Note: IDiscERP is our modified version of the IDisc model, incorporating isolated image and positional encoding features. It has been observed to improve results in small-size data training, particularly for better depth-scale equivariance. However, these modifications are not essential for large dataset training. CNNDepth refers to the CNN portion of the IDisc model, which serves as a network baseline but consistently underperforms compared to other models.
The ResNet101 models and configuration files can be replaced with the corresponding SwinL versions. Ensure that the `--model-name` parameter matches the type of the trained model. For users interested in comparing our DAC framework with the Metric3D training framework, we provide pre-trained weak baselines along with their testing commands, as detailed in the last two rows of the table.
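To evaluate several datasets in one go, a small driver can loop over the test configs and call the testing script with the same arguments as the example command above. This is only a sketch: the glob pattern, and the assumption that all matched configs pair with the SwinL indoor checkpoint and the `IDiscERP` model name, may need adjusting for other variants.

```python
import glob
import subprocess

# Run the testing script on every indoor SwinL test config found in configs/test.
# Adjust the glob pattern, checkpoint, and --model-name for other model variants.
for cfg in sorted(glob.glob("configs/test/dac_swinl_indoor_test_*.json")):
    subprocess.run(
        [
            "python", "script/test_dac.py",
            "--model-file", "checkpoints/dac_swinl_indoor.pt",
            "--model-name", "IDiscERP",
            "--config-file", cfg,
            "--base-path", "datasets",
        ],
        check=True,
    )
```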
To train metric depth estimation models under the DepthAnyCamera (DAC) framework, you can run the default code for indoor training datasets as follows:
python scripts/train_dac.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_r101.json --base-path datasets --distributed --model-name IDiscERP
If you wish to train with a larger backbone, use the following command:
python scripts/train_dac_large.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_swinl_s2.json --base-path datasets --distributed --model-name IDiscERP
For users interested in comparing our DAC framework with the Metric3D training framework, the following command can be used:
python scripts/train_persp.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_r101.json --base-path datasets --distributed --model-name IDisc
The corresponding testing script can be found at scripts/test_persp.py.
Similar commands apply to outdoor model training. There are various options available depending on the dataset or architecture. Interested users can refer to the table below for basic usage or consult the provided launch.json for convenient use or debugging in VSCode. We also provide all the training configurations we’ve used in configs/train.
Training Target | Training script | --config-file | --model-name |
---|---|---|---|
dac-indoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
dac-indoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
dac-outdoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
dac-outdoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
metric3d-indoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
metric3d-indoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
metric3d-outdoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
metric3d-outdoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
We thank the authors of the following awesome codebases:
Please also consider citing them.
This software is released under MIT license. You can view a license summary here.
If you find our work useful in your research, please consider citing our publication:
@inproceedings{Guo2025DepthAnyCamera,
title={Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera},
author={Yuliang Guo and Sparsh Garg and S. Mahdi H. Miangoleh and Xinyu Huang and Liu Ren},
booktitle={arXiv},
year={2025}
}