
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo1*† · Sparsh Garg2† · S. Mahdi H. Miangoleh3 · Xinyu Huang1 · Liu Ren1

1Bosch Research North America    2Carnegie Mellon University    3Simon Fraser University    

 *corresponding author †equal technical contribution

Paper PDF Project Page

[teaser figure]

Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to handle any type of camera with varying FoVs effectively.

Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:

  1. Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming the prior metric-depth SoTA methods Metric3D-v2 and UniDepth.
  2. Geometry-based training framework adaptable to any network architecture, extendable to other 3D perception tasks.

Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the camera types used in new applications.
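At the heart of DAC is the idea of unifying all cameras in a canonical equirectangular (ERP) space, so that a model trained on perspective crops transfers to fisheye and 360 inputs. Below is a minimal sketch of the core geometric step, resampling a perspective image onto an ERP patch given its intrinsics; the function name and simplifications (no lens distortion, no depth or mask handling) are ours, not the repository's API.

```python
import numpy as np
import cv2

def persp_to_erp_patch(img, K, erp_h, erp_w, fov_lat, fov_lon):
    """Resample a perspective image onto an equirectangular (ERP) patch.

    A geometry-only sketch; assumes FoVs below 180 degrees and ignores
    distortion. The actual DAC pipeline (under dac/) is the reference.
    """
    # Latitude/longitude grid of the ERP patch, centered at (0, 0), in radians
    lat = (np.arange(erp_h) + 0.5) / erp_h * fov_lat - fov_lat / 2
    lon = (np.arange(erp_w) + 0.5) / erp_w * fov_lon - fov_lon / 2
    lon, lat = np.meshgrid(lon, lat)

    # Unit rays in camera coordinates (x right, y down, z forward)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.maximum(np.cos(lat) * np.cos(lon), 1e-6)  # keep rays in front

    # Project each ray through the perspective intrinsics K
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u = (fx * x / z + cx).astype(np.float32)
    v = (fy * y / z + cy).astype(np.float32)

    # Sample source pixels; rays falling outside the image map to black
    return cv2.remap(img, u, v, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```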

Visualization

ScanNet++ fisheye

The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. The AbsRel error maps against ground truth highlight the superior performance of DAC.

[animated visualization]

Matterport3D single-view reconstruction

Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.

[animated visualization]

Additional visual results and comparisons with the prior SoTA can be found on the Project Page.

Performance

Depth Any Camera performs significantly better than the previous SoTA metric depth estimation models Metric3D-v2 and UniDepth in zero-shot generalization to large-FoV camera images, despite a significantly smaller training dataset and model size.

| Method | Training Data Size | Matterport3D (360) AbsRel / $\delta_1$ | Pano3D-GV2 (360) AbsRel / $\delta_1$ | ScanNet++ (fisheye) AbsRel / $\delta_1$ | KITTI360 (fisheye) AbsRel / $\delta_1$ |
|---|---|---|---|---|---|
| UniDepth-VitL | 3M | 0.7648 / 0.2576 | 0.7892 / 0.2469 | 0.4971 / 0.3638 | 0.2939 / 0.4810 |
| Metric3D-v2-VitL | 16M | 0.2924 / 0.4381 | 0.3070 / 0.4040 | 0.2229 / 0.5360 | 0.1997 / 0.7159 |
| Ours-Resnet101 | 670K-indoor / 130K-outdoor | **0.156** / **0.7727** | **0.1387** / **0.8115** | *0.1323* / *0.8517* | *0.1559* / *0.7858* |
| Ours-SwinL | 670K-indoor / 130K-outdoor | *0.1789* / *0.7231* | *0.1836* / *0.7287* | **0.1282** / **0.8544** | **0.1487** / **0.8222** |

We highlight the best and second-best results in bold and italic, respectively (lower AbsRel $\downarrow$ and higher $\delta_1$ $\uparrow$ are better).
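For reference, AbsRel is the mean absolute relative depth error, and $\delta_1$ is the fraction of pixels whose predicted-to-true depth ratio lies within 1.25. A minimal NumPy sketch of both metrics follows; the valid-depth range is an assumption, and the repository's evaluation code is authoritative.

```python
import numpy as np

def eval_depth(pred, gt, max_depth=10.0):
    """Compute AbsRel and delta_1 over valid ground-truth pixels.

    max_depth is an assumed cutoff; datasets may use different ranges.
    """
    valid = (gt > 0) & (gt < max_depth)
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)                  # AbsRel (lower is better)
    delta1 = np.mean(np.maximum(pred / gt, gt / pred) < 1.25)  # delta_1 (higher is better)
    return abs_rel, delta1
```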

Catalog

  • Release of pre-trained DepthAnyCamera (DAC) models trained on moderately sized datasets.
  • Demo code for easy setup and usage.
  • Testing and evaluation pipeline for zero-shot metric depth estimation on perspective, fisheye, and 360-degree datasets.
  • Complete DepthAnyCamera (DAC) training pipeline using mixed perspective camera data.
  • Complete data preparation and curation scripts.
  • [TBD] Foundation-level model trained on a large-scale, diverse dataset mixture, encompassing perspective, fisheye, and 360-degree camera data.

Usage

Installation

Clone the Repository

git clone https://github.com/yuliangguo/depth_any_camera
cd depth_any_camera

Docker Installation

This repository can be run from within Docker, as long as the NVIDIA Container Toolkit is properly configured. For Ubuntu installation steps, refer to this guide.

# Build the container
docker build -t dac:latest .
# Enter the container
docker run --gpus all --network host -v $(pwd):/depth_any_camera --rm -it dac /bin/bash 

# Once inside the container, source post-entry-hooks.sh to finish the install.
source post-entry-hooks.sh

Conda Installation

Alternatively, this repository can be run from within Conda alone.

conda create -n dac python=3.9 -y
conda activate dac
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

pip install -r requirements.txt
export PYTHONPATH="$PWD:$PYTHONPATH"
cd dac/models/ops/
pip install -e .
cd ../../../
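A quick sanity check that the environment is set up correctly (the expected version reflects the pinned install above):

```python
# Quick environment check; run inside the activated `dac` environment.
import torch

print(torch.__version__)           # expect 1.13.1+cu116 per the pinned install
print(torch.cuda.is_available())   # should be True for GPU training/inference
```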

Data Preparation

Our current training set is slim compared to those of prior foundation models. DAC is trained on a combination of three labeled datasets (670K images) for the indoor model and a combination of two datasets (130K images) for the outdoor model. Two 360 datasets and two fisheye datasets are used for zero-shot testing.

[data overview figure]

Please refer to DATA.md for detailed dataset preparation. Make sure the relative paths of the datasets are set correctly before proceeding to the testing and training sections.
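A small script like the following can verify the layout before launching anything. The folder names here are guesses inferred from the config filenames (e.g., hm3d+taskonomy+hypersim) and the demo sample; DATA.md remains the authoritative reference.

```python
# Check that dataset folders exist under the --base-path used by the scripts.
# Folder names are assumptions inferred from config filenames; see DATA.md.
from pathlib import Path

base = Path("datasets")
for name in ["hm3d", "taskonomy", "hypersim", "scannetpp"]:
    print(f"{name:12s}", "ok" if (base / name).is_dir() else "MISSING")
```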

Pre-trained models

We provide two indoor models and two outdoor models, with ResNet101 and Swin-Transformer-Large (SwinL) backbones. In addition, we provide two weaker baseline models for comparison. The download links can be found in the following table. We suggest downloading both the model configs and model weights into checkpoints/ so that our scripts can be run directly.

| Model Name | Training Datasets | Model Configs | Weights |
|---|---|---|---|
| dac-indoor-resnet101 (ours) | indoor mix 670k | huggingface | huggingface |
| dac-indoor-swinL (ours) | indoor mix 670k | huggingface | huggingface |
| dac-outdoor-resnet101 (ours) | outdoor mix 130k | huggingface | huggingface |
| dac-outdoor-swinL (ours) | outdoor mix 130k | huggingface | huggingface |
| idisc-metric3d-indoor-resnet101 (weak baseline 1) | indoor mix 670k | huggingface | huggingface |
| cnndepth-metric3d-indoor-resnet101 (weak baseline 2) | indoor mix 670k | huggingface | huggingface |
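If you prefer scripting the download, something like the following works with huggingface_hub. The repo_id below is a placeholder; use the repositories linked in the table above (the filenames match those used by the demo and test commands).

```python
# Sketch: fetch a model config + weights into checkpoints/ via huggingface_hub.
# The repo_id is a placeholder; use the repos linked in the table above.
from huggingface_hub import hf_hub_download

for filename in ["dac_swinl_indoor.json", "dac_swinl_indoor.pt"]:
    path = hf_hub_download(repo_id="<org>/<dac-model-repo>",
                           filename=filename,
                           local_dir="checkpoints")
    print("saved to", path)
```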

Demo

We provide ready-to-run demo scripts in the demo folder. demo/demo_dac_indoor.py demonstrates how to perform inference on various types of camera data, including ScanNet++ (fisheye), Matterport3D (360), and NYU (perspective), using a single metric depth model trained on perspective images. The code generates point cloud files (*.ply) and visualization results as shown below:

[demo output figure]

demo/demo_dac_outdoor.py similarly demonstrates how a single outdoor model handles different types of camera data, including KITTI (perspective) and KITTI360 (fisheye).

We also provide a demo script for processing a single sample; you can follow this example command:

python demo/demo_dac_single.py --config-file checkpoints/dac_swinl_indoor.json --model-file checkpoints/dac_swinl_indoor.pt --sample-file demo/input/scannetpp_sample.json --out-dir demo/output
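The generated .ply files can be inspected with any point-cloud viewer, for example with Open3D (assumed to be installed separately; the output filename below is illustrative):

```python
# Inspect a demo point cloud with Open3D (not part of requirements.txt).
# The filename is illustrative; check demo/output for the actual outputs.
import open3d as o3d

pcd = o3d.io.read_point_cloud("demo/output/scannetpp_sample.ply")
print(pcd)                                  # prints the number of points
o3d.visualization.draw_geometries([pcd])    # opens an interactive viewer
```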

Testing

With the provided pretrained models saved in checkpoints/, the following command tests and evaluates on a given dataset, e.g., ScanNet++:

python scripts/test_dac.py --model-file checkpoints/dac_swinl_indoor.pt --model-name IDiscERP --config-file configs/test/dac_swinl_indoor_test_scannetpp.json --base-path datasets --vis

Config files for testing all the reported datasets are included in configs/test. Interested users can also refer to the provided launch.json to run or debug the testing scripts conveniently in VSCode. The following table lays out the most relevant combinations.

| Testing dataset | Testing script | --model-file | --config-file | --model-name |
|---|---|---|---|---|
| Matterport | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| Gibson-V2 | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| ScanNet++ | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| NYU | scripts/test_dac.py | checkpoints/dac-indoor-resnet101.pt | relative path | IDiscERP |
| KITTI360 | scripts/test_dac.py | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
| KITTI | scripts/test_dac.py | checkpoints/dac-outdoor-resnet101.pt | relative path | IDisc |
| ... | scripts/test_persp.py | checkpoints/idisc-... | ... | IDisc |
| ... | scripts/test_persp.py | checkpoints/cnndepth-... | ... | CNNDepth |

Note: IDiscERP is our modified version of the IDisc model, incorporating isolated image and positional-encoding features. It has been observed to improve results when training on smaller datasets, particularly through better depth-scale equivariance. However, these modifications are not essential for large-scale training. CNNDepth refers to the CNN portion of the IDisc model; it serves as a network baseline but consistently underperforms the other models.

The ResNet101 models and configuration files can be replaced with the corresponding Swin-L versions. Ensure that the --model-name parameter matches the type of trained model. For users interested in comparing our DAC framework with the Metric3D training framework, we have provided pre-trained weak baselines along with their testing scripts, as detailed in the last two rows of the table.
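To sweep all indoor test configs with one model, a small driver script can loop over the config files. This sketch assumes the SwinL indoor model and the config naming pattern from the example command above.

```python
# Sketch: evaluate the SwinL indoor model on every matching test config.
# Assumes the config naming pattern shown in the example command above.
import glob
import subprocess

for cfg in sorted(glob.glob("configs/test/dac_swinl_indoor_test_*.json")):
    subprocess.run([
        "python", "scripts/test_dac.py",
        "--model-file", "checkpoints/dac_swinl_indoor.pt",
        "--model-name", "IDiscERP",
        "--config-file", cfg,
        "--base-path", "datasets",
    ], check=True)
```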

Training

To train metric depth estimation models under the DepthAnyCamera (DAC) framework, you can run the default code for indoor training datasets as follows:

python scripts/train_dac.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_r101.json --base-path datasets --distributed --model-name IDiscERP

If you wish to train with a larger backbone, use the following command:

python scripts/train_dac_large.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_dac_swinl_s2.json --base-path datasets --distributed --model-name IDiscERP

For users interested in comparing our DAC framework with the Metric3D training framework, the following command can be used:

python scripts/train_persp.py --config-file configs/train/hm3d+taskonomy+hypersim/hm3d+taskonomy+hypersim_r101.json --base-path datasets --distributed --model-name IDisc

The corresponding testing script can be found at scripts/test_persp.py.

Similar commands apply to outdoor model training. There are various options available depending on the dataset or architecture. Interested users can refer to the table below for basic usage or consult the provided launch.json for convenient use or debugging in VSCode. We also provide all the training configurations we’ve used in configs/train.

| Training Target | Training script | --config-file | --model-name |
|---|---|---|---|
| dac-indoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-indoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-resnet101 | scripts/train_dac.py | relative path | IDiscERP or IDisc or CNNDepth |
| dac-outdoor-swinl | scripts/train_dac_large.py | relative path | IDiscERP or IDisc or CNNDepth |
| metric3d-indoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-indoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-resnet101 | scripts/train_persp.py | relative path | IDisc or CNNDepth |
| metric3d-outdoor-swinl | scripts/train_persp.py | relative path | IDisc or CNNDepth |

Acknowledgements

We thank the authors of the following awesome codebases:

Please also consider citing them.

License

This software is released under the MIT license. You can view a license summary here.

Citation

If you find our work useful in your research please consider citing our publication:

@inproceedings{Guo2025DepthAnyCamera,
  title={Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera},
  author={Yuliang Guo and Sparsh Garg and S. Mahdi H. Miangoleh and Xinyu Huang and Liu Ren},
  booktitle={arXiv},
  year={2025}
}