
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo1*† · Sparsh Garg2† · S. Mahdi H. Miangoleh3 · Xinyu Huang1 · Liu Ren1

1Bosch Research North America    2Carnegie Mellon University    3Simon Fraser University    

 *corresponding author †equal technical contribution

Paper PDF Project Page

[teaser figure]

Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to handle any type of camera with varying FoVs effectively.

Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:

  1. Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming the prior metric-depth SoTA methods Metric3D-v2 and UniDepth.
  2. Geometry-based training framework adaptable to any network architecture, extendable to other 3D perception tasks.

Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the camera types used in new applications.
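At the heart of DAC is the idea of unifying all cameras in a canonical equirectangular (ERP) space, so that a model trained on perspective crops transfers to fisheye and 360 inputs. Below is a minimal sketch of the core geometric step, resampling a perspective image onto an ERP patch given its intrinsics; the function name and simplifications (no lens distortion, no depth or mask handling) are ours, not the repository's API.

```python
import numpy as np
import cv2

def persp_to_erp_patch(img, K, erp_h, erp_w, fov_lat, fov_lon):
    """Resample a perspective image onto an equirectangular (ERP) patch.

    A geometry-only sketch; assumes FoVs below 180 degrees and ignores
    distortion. The actual DAC pipeline (under dac/) is the reference.
    """
    # Latitude/longitude grid of the ERP patch, centered at (0, 0), in radians
    lat = (np.arange(erp_h) + 0.5) / erp_h * fov_lat - fov_lat / 2
    lon = (np.arange(erp_w) + 0.5) / erp_w * fov_lon - fov_lon / 2
    lon, lat = np.meshgrid(lon, lat)

    # Unit rays in camera coordinates (x right, y down, z forward)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.maximum(np.cos(lat) * np.cos(lon), 1e-6)  # keep rays in front

    # Project each ray through the perspective intrinsics K
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u = (fx * x / z + cx).astype(np.float32)
    v = (fy * y / z + cy).astype(np.float32)

    # Sample source pixels; rays falling outside the image map to black
    return cv2.remap(img, u, v, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```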

Visualization

ScanNet++ fisheye

The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. The AbsRel error maps against ground truth highlight the superior performance of DAC.

[animated visualization]

Matterport3D single-view reconstruction

Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.

[animated visualization]

Additional visual results and comparisons with the prior SoTA can be found on the Project Page.

Performance

Depth Any Camera performs significantly better than the previous SoTA metric depth estimation models Metric3D-v2 and UniDepth in zero-shot generalization to large-FoV camera images, despite a significantly smaller training dataset and model size.

| Method | Training Data Size | Matterport3D (360) AbsRel / $\delta_1$ | Pano3D-GV2 (360) AbsRel / $\delta_1$ | ScanNet++ (fisheye) AbsRel / $\delta_1$ | KITTI360 (fisheye) AbsRel / $\delta_1$ |
|---|---|---|---|---|---|
| UniDepth-VitL | 3M | 0.7648 / 0.2576 | 0.7892 / 0.2469 | 0.4971 / 0.3638 | 0.2939 / 0.4810 |
| Metric3D-v2-VitL | 16M | 0.2924 / 0.4381 | 0.3070 / 0.4040 | 0.2229 / 0.5360 | 0.1997 / 0.7159 |
| Ours-Resnet101 | 670K-indoor / 130K-outdoor | **0.156** / **0.7727** | **0.1387** / **0.8115** | *0.1323* / *0.8517* | *0.1559* / *0.7858* |
| Ours-SwinL | 670K-indoor / 130K-outdoor | *0.1789* / *0.7231* | *0.1836* / *0.7287* | **0.1282** / **0.8544** | **0.1487** / **0.8222** |

We highlight the best and second-best results in bold and italic, respectively (lower AbsRel $\downarrow$ and higher $\delta_1$ $\uparrow$ are better).
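For reference, AbsRel is the mean absolute relative depth error, and $\delta_1$ is the fraction of pixels whose predicted-to-true depth ratio lies within 1.25. A minimal NumPy sketch of both metrics follows; the valid-depth range is an assumption, and the repository's evaluation code is authoritative.

```python
import numpy as np

def eval_depth(pred, gt, max_depth=10.0):
    """Compute AbsRel and delta_1 over valid ground-truth pixels.

    max_depth is an assumed cutoff; datasets may use different ranges.
    """
    valid = (gt > 0) & (gt < max_depth)
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)                  # AbsRel (lower is better)
    delta1 = np.mean(np.maximum(pred / gt, gt / pred) < 1.25)  # delta_1 (higher is better)
    return abs_rel, delta1
```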

Catalog

  • Release of pre-trained DepthAnyCamera (DAC) models trained on moderately sized datasets.
  • Demo code for easy setup and usage.
  • Testing and evaluation pipeline for zero-shot metric depth estimation on perspective, fisheye, and 360-degree datasets.
  • Complete DepthAnyCamera (DAC) training pipeline using mixed perspective camera data.
  • Complete data preparation and curation scripts.
  • [TBD] Foundation-level model trained on a large-scale, diverse dataset mixture, encompassing perspective, fisheye, and 360-degree camera data.

Usage

Installation

Clone the Repository

git clone https://github.com/yuliangguo/depth_any_camera
cd depth_any_camera

Docker Installation

This repository can be run from within Docker, as long as the NVIDIA Container Toolkit is properly configured. For Ubuntu installation steps, refer to this guide.

# Build the container
docker build -t dac:latest .
# Enter the container
docker run --gpus all --network host -v $(pwd):/depth_any_camera --rm -it dac /bin/bash 

# Once inside the container, source post-entry-hooks.sh to finish the install.
source post-entry-hooks.sh

Conda Installation

Alternatively, this repository can be run from within Conda alone.

conda create -n dac python=3.9 -y
conda activate dac
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

pip install -r requirements.txt
export PYTHONPATH="$PWD:$PYTHONPATH"
cd dac/models/ops/
pip install -e .
cd ../../../
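A quick sanity check that the environment is set up correctly (the expected version reflects the pinned install above):

```python
# Quick environment check; run inside the activated `dac` environment.
import torch

print(torch.__version__)           # expect 1.13.1+cu116 per the pinned install
print(torch.cuda.is_available())   # should be True for GPU training/inference
```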

Data Preparation

Our current training set is slim compared to those of prior foundation models. DAC is trained on a combination of three labeled datasets (670K images) for the indoor model and a combination of two datasets (130K images) for the outdoor model. Two 360 datasets and two fisheye datasets are used for zero-shot testing.

[data overview figure]

Please refer to DATA.md for detailed dataset preparation. Make sure the relative paths of the datasets are set correctly before proceeding to the testing and training sections.
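A small script like the following can verify the layout before launching anything. The folder names here are guesses inferred from the config filenames (e.g., hm3d+taskonomy+hypersim) and the demo sample; DATA.md remains the authoritative reference.

```python
# Check that dataset folders exist under the --base-path used by the scripts.
# Folder names are assumptions inferred from config filenames; see DATA.md.
from pathlib import Path

base = Path("datasets")
for name in ["hm3d", "taskonomy", "hypersim", "scannetpp"]:
    print(f"{name:12s}", "ok" if (base / name).is_dir() else "MISSING")
```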

Pre-trained models

We provide two indoor models and two outdoor models, with ResNet101 and Swin-Transformer-Large (SwinL) backbones. In addition, we provide two weaker baseline models for comparison. The download links can be found in the following table. We suggest downloading both the model configs and model weights into checkpoints/ so that our scripts can be run directly.

| Model Name | Training Datasets | Model Configs | Weights |
|---|---|---|---|
| dac-indoor-resnet101 (ours) | indoor mix 670k | huggingface | huggingface |
| dac-indoor-swinL (ours) | indoor mix 670k | huggingface | huggingface |
| dac-outdoor-resnet101 (ours) | outdoor mix 130k | huggingface | huggingface |
| dac-outdoor-swinL (ours) | outdoor mix 130k | huggingface | huggingface |
| idisc-metric3d-indoor-resnet101 (weak baseline 1) | indoor mix 670k | huggingface | huggingface |
| cnndepth-metric3d-indoor-resnet101 (weak baseline 2) | indoor mix 670k | huggingface | huggingface |
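If you prefer scripting the download, something like the following works with huggingface_hub. The repo_id below is a placeholder; use the repositories linked in the table above (the filenames match those used by the demo and test commands).

```python
# Sketch: fetch a model config + weights into checkpoints/ via huggingface_hub.
# The repo_id is a placeholder; use the repos linked in the table above.
from huggingface_hub import hf_hub_download

for filename in ["dac_swinl_indoor.json", "dac_swinl_indoor.pt"]:
    path = hf_hub_download(repo_id="<org>/<dac-model-repo>",
                           filename=filename,
                           local_dir="checkpoints")
    print("saved to", path)
```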

Demo

We provide ready-to-run demo scripts in the demo folder. demo/demo_dac_indoor.py demonstrates how to perform inference on various types of camera data, including ScanNet++ (fisheye), Matterport3D (360), and NYU (perspective), using a single metric depth model trained on perspective images. The code generates point cloud files (*.ply) and visualization results as shown below:

[demo output figure]

demo/demo_dac_outdoor.py similarly demonstrates how a single outdoor model handles different types of camera data, including KITTI (perspective) and KITTI360 (fisheye).

We also provide a demo script for processing a single sample; you can follow this example command:

python demo/demo_dac_single.py --config-file checkpoints/dac_swinl_indoor.json --model-file checkpoints/dac_swinl_indoor.pt --sample-file demo/input/scannetpp_sample.json --out-dir demo/output
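The generated .ply files can be inspected with any point-cloud viewer, for example with Open3D (assumed to be installed separately; the output filename below is illustrative):

```python
# Inspect a demo point cloud with Open3D (not part of requirements.txt).
# The filename is illustrative; check demo/output for the actual outputs.
import open3d as o3d

pcd = o3d.io.read_point_cloud("demo/output/scannetpp_sample.ply")
print(pcd)                                  # prints the number of points
o3d.visualization.draw_geometries([pcd])    # opens an interactive viewer
```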

Testing

With the provided pretrained models saved in checkpoints/, the following command tests and evaluates on a given dataset, e.g., ScanNet++:

python scripts/test_dac.py --model-file checkpoints/dac_swinl_indoor.pt --model-name IDiscERP --config-file configs/test/dac_swinl_indoor_test_scannetpp.json --base-path datasets --vis

Config files for testing all the reported datasets are included in configs/test. Interested users can also refer to the provided launch.json to run or debug the testing scripts conveniently in VSCode. The following table lays out the most relevant combinations.

| Testing dataset | Testing script | --model-file | --config-file | --model-name |
|---|---|---|---|---|
| Matterport | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| Gibson-V2 | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| ScanNet++ | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| NYU | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| KITTI360 | scripts/test_dac.py | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
| KITTI | scripts/test_dac.py | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
| ... | scripts/test_persp.py | checkpoints/idisc-... | ... | IDisc |
| ... | scripts/test_persp.py | checkpoints/cnndepth-... | ... | CNNDepth |

Note: IDiscERP is our modified version of the IDisc model, incorporating isolated image and positional-encoding features. It has been observed to improve results when training on smaller datasets, particularly through better depth-scale equivariance. However, these modifications are not essential for large-scale training. CNNDepth refers to the CNN portion of the IDisc model; it serves as a network baseline but consistently underperforms the other models.

The ResNet101 models and configuration files can be replaced with the corresponding Swin-L versions. Ensure that the --model-name parameter matches the type of trained model. For users interested in comparing our DAC framework with the Metric3D training framework, we have provided pre-trained weak baselines along with their testing scripts, as detailed in the last two rows of the table.
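To sweep all indoor test configs with one model, a small driver script can loop over the config files. This sketch assumes the SwinL indoor model and the config naming pattern from the example command above.

```python
# Sketch: evaluate the SwinL indoor model on every matching test config.
# Assumes the config naming pattern shown in the example command above.
import glob
import subprocess

for cfg in sorted(glob.glob("configs/test/dac_swinl_indoor_test_*.json")):
    subprocess.run([
        "python", "scripts/test_dac.py",
        "--model-file", "checkpoints/dac_swinl_indoor.pt",
        "--model-name", "IDiscERP",
        "--config-file", cfg,
        "--base-path", "datasets",
    ], check=True)
```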

Training

To train metric depth estimation models under the DepthAnyCamera (DAC) framework, you can run the default code for indoor training datasets as follows:

python scripts/train_dac.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_r101.json --base-path datasets --distributed --model-name IDiscERP

If you wish to train with a larger backbone, use the following command:

python scripts/train_dac_large.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_swinl_s2.json --base-path datasets --distributed --model-name IDiscERP

For users interested in comparing our DAC framework with the Metric3D training framework, the following command can be used:

python scripts/train_persp.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_r101.json --base-path datasets --distributed --model-name IDisc

The corresponding testing script can be found at scripts/test_persp.py.

Similar commands apply to outdoor model training. There are various options available depending on the dataset or architecture. Interested users can refer to the table below for basic usage or consult the provided launch.json for convenient use or debugging in VSCode. We also provide all the training configurations we’ve used in configs/train.

| Training Target | Training script | --config-file | --model-name |
|---|---|---|---|
| dac-indoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-indoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| metric3d-indoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-indoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |

Acknowledgements

We thank the authors of the following awesome codebases:

Please also consider citing them.

License

This software is released under the MIT license. You can view a license summary here.

Citation

If you find our work useful in your research please consider citing our publication:

@inproceedings{Guo2025DepthAnyCamera,
  title={Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera},
  author={Yuliang Guo and Sparsh Garg and S. Mahdi H. Miangoleh and Xinyu Huang and Liu Ren},
  booktitle={arXiv},
  year={2025}
}