Jinsun Park*, Yongseop Jeong*, Kyungdon Joo, Donghyeon Cho, and In So Kweon [* Equal Contribution]
- Nov. 2024: The official implementation is released.
We propose an adaptive cost volume fusion algorithm, dubbed MMDNet, for multi-modal depth estimation in dynamic environments. Our method leverages measurements from multi-modal sensors to exploit their complementary characteristics, generating depth cues from each modality in the form of adaptive cost volumes using deep neural networks. The proposed adaptive cost volume takes into account sensor configurations and computational costs, addressing the imbalanced and redundant depth basis problem of conventional cost volumes. We further extend its role to a generalized depth representation and introduce a geometry-aware cost fusion algorithm. This unified and geometrically consistent depth representation facilitates accurate and efficient multi-modal sensor fusion, which is crucial for robustness in changing environments. To validate the proposed framework, we introduce a new Multi-Modal Depth in Changing Environments (MMDCE) dataset, collected using our own vehicular system equipped with RGB, NIR, and LiDAR sensors. Experimental results demonstrate that our method is robust, accurate, and reliable in diverse environmental conditions.
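As a conceptual illustration only (not the actual MMDNet code; the real network is learned end-to-end and additionally enforces geometric consistency across views), the fusion step can be thought of as a confidence-weighted combination of per-modality cost volumes defined over a shared depth basis, followed by a soft-argmin readout:

```python
import torch

def fuse_cost_volumes(volumes, confidences, depth_hypotheses):
    # volumes: list of M cost volumes, each (B, D, H, W), over a shared depth basis of size D.
    # confidences: list of M per-pixel confidence maps, each (B, 1, H, W).
    weights = torch.softmax(torch.cat(confidences, dim=1), dim=1)   # (B, M, H, W)
    stacked = torch.stack(volumes, dim=1)                           # (B, M, D, H, W)
    fused = (weights.unsqueeze(2) * stacked).sum(dim=1)             # (B, D, H, W)
    prob = torch.softmax(-fused, dim=1)                             # low cost -> high probability
    depth = (prob * depth_hypotheses.view(1, -1, 1, 1)).sum(dim=1)  # soft-argmin readout
    return depth                                                    # (B, H, W)
```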
Our paper is available on IEEE Xplore.
@article{park2022adaptive,
title={Adaptive Cost Volume Fusion Network for Multi-Modal Depth Estimation in Changing Environments},
author={Park, Jinsun and Jeong, Yongseop and Joo, Kyungdon and Cho, Donghyeon and Kweon, In So},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={2},
pages={5095--5102},
year={2022},
publisher={IEEE}
}
Our released implementation is tested on:
- Ubuntu 22.04
- Python 3.12 (Anaconda 24.5)
- PyTorch 2.3.1 / torchvision 0.18.1
- NVIDIA CUDA 12.1 (pytorch-cuda 12.1)
- NVIDIA RTX A6000 (48GB) x 2 / NVIDIA RTX 4060 Ti (16GB) x 2
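For reference, one way to set up a matching Anaconda environment (the environment name is arbitrary; adjust the CUDA variant to your system):

$ conda create -n mmdnet python=3.12
$ conda activate mmdnet
$ conda install pytorch==2.3.1 torchvision==0.18.1 pytorch-cuda=12.1 -c pytorch -c nvidia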
Prepare the KITTI Multi-Modal Depth (KITTI MMD) dataset using the KITTI Depth Completion (KITTIDC) and KITTI Raw (KITTIRAW) datasets.
Stereo RGB images, stereo grayscale images, poses, and calibration files from the KITTIRAW dataset are copied into the KITTIDC dataset.
$ cd utils
$ python prepare_KITTIMMD.py --path_dc KITTIDC_ROOT --path_raw KITTIRAW_ROOT
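Internally, the preparation amounts to copying per-drive files from KITTIRAW into the corresponding KITTIDC drive directories, roughly as sketched below (a simplified illustration with hypothetical helper names; see prepare_KITTIMMD.py for the actual logic):

```python
import shutil
from pathlib import Path

def copy_raw_into_dc(path_raw: Path, path_dc: Path, split: str, drive: str):
    # e.g., drive = '2011_09_26_drive_0001_sync'; its date prefix is the raw subfolder.
    date = drive[:10]                       # '2011_09_26'
    dst = path_dc / split / drive
    # Calibration files live next to the drive folders in KITTIRAW.
    for name in ('calib_cam_to_cam.txt', 'calib_imu_to_velo.txt', 'calib_velo_to_cam.txt'):
        shutil.copy(path_raw / date / name, dst / name)
    # Stereo grayscale (image_00/01), stereo RGB (image_02/03), and OXTS poses.
    for sub in ('image_00', 'image_01', 'image_02', 'image_03', 'oxts'):
        shutil.copytree(path_raw / date / drive / sub, dst / sub, dirs_exist_ok=True)
```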
Once the script finishes, you will obtain the following directory structure:
KITTIMMD_ROOT
├── devkit
│   └── ...
├── train
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── calib_cam_to_cam.txt
│   │   ├── calib_imu_to_velo.txt
│   │   ├── calib_velo_to_cam.txt
│   │   ├── image_00
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.png
│   │   │       └── ...
│   │   ├── image_01
│   │   │   └── ...
│   │   ├── image_02
│   │   │   └── ...
│   │   ├── image_03
│   │   │   └── ...
│   │   ├── oxts
│   │   │   ├── dataformat.txt
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.txt
│   │   │       └── ...
│   │   └── proj_depth
│   │       ├── groundtruth
│   │       │   ├── image_00
│   │       │   │   ├── 0000000005.png
│   │       │   │   └── ...
│   │       │   ├── image_01
│   │       │   │   └── ...
│   │       │   ├── image_02
│   │       │   │   └── ...
│   │       │   └── image_03
│   │       │       └── ...
│   │       └── velodyne_raw
│   │           ├── image_00
│   │           │   ├── 0000000005.png
│   │           │   └── ...
│   │           ├── image_01
│   │           │   └── ...
│   │           ├── image_02
│   │           │   └── ...
│   │           └── image_03
│   │               └── ...
│   └── ...
├── val
│   ├── 2011_09_26_drive_0002_sync
│   │   ├── calib_cam_to_cam.txt
│   │   ├── calib_imu_to_velo.txt
│   │   ├── calib_velo_to_cam.txt
│   │   ├── image_00
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.png
│   │   │       └── ...
│   │   ├── image_01
│   │   │   └── ...
│   │   ├── image_02
│   │   │   └── ...
│   │   ├── image_03
│   │   │   └── ...
│   │   ├── oxts
│   │   │   ├── dataformat.txt
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.txt
│   │   │       └── ...
│   │   └── proj_depth
│   │       ├── groundtruth
│   │       │   ├── image_00
│   │       │   │   ├── 0000000005.png
│   │       │   │   └── ...
│   │       │   ├── image_01
│   │       │   │   └── ...
│   │       │   ├── image_02
│   │       │   │   └── ...
│   │       │   └── image_03
│   │       │       └── ...
│   │       └── velodyne_raw
│   │           ├── image_00
│   │           │   ├── 0000000005.png
│   │           │   └── ...
│   │           ├── image_01
│   │           │   └── ...
│   │           ├── image_02
│   │           │   └── ...
│   │           └── image_03
│   │               └── ...
│   └── ...
└── depth_selection
    ├── test_depth_completion_anonymous
    │   └── ...
    ├── test_depth_prediction_anonymous
    │   └── ...
    ├── val_multi_modal
    │   ├── 2011_09_26_drive_0002_sync
    │   │   ├── calib_cam_to_cam.txt
    │   │   ├── calib_imu_to_velo.txt
    │   │   ├── calib_velo_to_cam.txt
    │   │   ├── image_00
    │   │   │   └── data
    │   │   │       ├── 0000000005.png
    │   │   │       └── ...
    │   │   ├── image_01
    │   │   │   └── ...
    │   │   ├── image_02
    │   │   │   └── ...
    │   │   ├── image_03
    │   │   │   └── ...
    │   │   └── proj_depth
    │   │       ├── groundtruth
    │   │       │   ├── image_00
    │   │       │   │   ├── 0000000005.png
    │   │       │   │   └── ...
    │   │       │   ├── image_01
    │   │       │   │   └── ...
    │   │       │   ├── image_02
    │   │       │   │   └── ...
    │   │       │   └── image_03
    │   │       │       └── ...
    │   │       └── velodyne_raw
    │   │           ├── image_00
    │   │           │   ├── 0000000005.png
    │   │           │   └── ...
    │   │           ├── image_01
    │   │           │   └── ...
    │   │           ├── image_02
    │   │           │   └── ...
    │   │           └── image_03
    │   │               └── ...
    │   └── ...
    └── val_selection_cropped
        └── ...
Note that the root directory for KITTI MMD (KITTIMMD_ROOT) will be the same as that for KITTI Depth Completion (KITTIDC_ROOT).
After the preparation, you should generate a JSON file containing paths to each sample.
$ cd MMDNET_ROOT/utils
$ python generate_json_KITTIMMD.py --path_kittimmd KITTIMMD_ROOT
This command will generate MMDNET_ROOT/list_data/kitti_mmd.json, which contains 32,917 samples for training, 3,426 samples for validation, and 1,000 samples for testing.
The official dataset split file kitti_mmd.json is already included in this repository.
Note that various input arguments are supported in generate_json_KITTIMMD.py. For example, if you want to create a JSON file with fewer samples for prototyping, you can use the following command:
$ python generate_json_KITTIMMD.py --path_kittimmd KITTIMMD_ROOT --name kitti_mmd_tiny.json --num_train 32 --num_val 16 --num_test 8
The resulting MMDNET_ROOT/list_data/kitti_mmd_tiny.json will contain 32 samples for training, 16 samples for validation, and 8 samples for testing.
Please refer to generate_json_KITTIMMD.py for more details.
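To sanity-check a generated split file, you can load it and count the samples per split (this assumes the JSON maps split names to lists of sample entries; adjust the path to your generated file):

```python
import json

with open('../list_data/kitti_mmd.json') as f:
    split = json.load(f)
for name in ('train', 'val', 'test'):
    print(name, len(split[name]))  # expected: 32917 / 3426 / 1000 for kitti_mmd.json
```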
Download the Multi-Modal Depth in Changing Environments (MMDCE) dataset from the following link: Google Drive
After extracting the dataset to MMDCE_ROOT, you will obtain a data structure as follows:
MMDCE_ROOT
├── day
│   ├── train
│   │   ├── 2020-10-10-16-24-32
│   │   │   ├── calib.npy
│   │   │   ├── info.txt
│   │   │   ├── dep_ir1
│   │   │   │   ├── 1602314674513103008.png
│   │   │   │   └── ...
│   │   │   ├── dep_ir2
│   │   │   │   └── ...
│   │   │   ├── dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── ir1
│   │   │   │   └── ...
│   │   │   ├── ir2
│   │   │   │   └── ...
│   │   │   ├── rgb1
│   │   │   │   └── ...
│   │   │   └── rgb2
│   │   │       └── ...
│   │   └── ...
│   ├── val
│   │   ├── 2020-11-07-17-18-38
│   │   │   ├── calib.npy
│   │   │   ├── info.txt
│   │   │   ├── dep_ir1
│   │   │   │   ├── 1604737119707223295.png
│   │   │   │   └── ...
│   │   │   ├── dep_ir2
│   │   │   │   └── ...
│   │   │   ├── dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── ir1
│   │   │   │   └── ...
│   │   │   ├── ir2
│   │   │   │   └── ...
│   │   │   ├── rgb1
│   │   │   │   └── ...
│   │   │   └── rgb2
│   │   │       └── ...
│   │   └── ...
│   └── test
│       ├── 2020-10-02-17-34-35
│       │   ├── calib.npy
│       │   ├── info.txt
│       │   ├── dep_ir1
│       │   │   ├── 1601627722067240953.png
│       │   │   └── ...
│       │   ├── dep_ir2
│       │   │   └── ...
│       │   ├── dep_rgb1
│       │   │   └── ...
│       │   ├── dep_rgb2
│       │   │   └── ...
│       │   ├── gt_dep_ir1
│       │   │   └── ...
│       │   ├── gt_dep_ir1_filtered
│       │   │   └── ...
│       │   ├── gt_dep_ir2
│       │   │   └── ...
│       │   ├── gt_dep_rgb1
│       │   │   └── ...
│       │   ├── gt_dep_rgb1_filtered
│       │   │   └── ...
│       │   ├── gt_dep_rgb2
│       │   │   └── ...
│       │   ├── ir1
│       │   │   └── ...
│       │   ├── ir2
│       │   │   └── ...
│       │   ├── rgb1
│       │   │   └── ...
│       │   └── rgb2
│       │       └── ...
│       └── ...
└── night
    ├── train
    │   └── 2020-10-11-00-43-29
    │       ├── calib.npy
    │       ├── info.txt
    │       ├── dep_ir1
    │       │   ├── 1602344610899789572.png
    │       │   └── ...
    │       ├── dep_ir2
    │       │   └── ...
    │       ├── dep_rgb1
    │       │   └── ...
    │       ├── dep_rgb2
    │       │   └── ...
    │       ├── gt_dep_ir1
    │       │   └── ...
    │       ├── gt_dep_ir1_filtered
    │       │   └── ...
    │       ├── gt_dep_ir2
    │       │   └── ...
    │       ├── gt_dep_rgb1
    │       │   └── ...
    │       ├── gt_dep_rgb1_filtered
    │       │   └── ...
    │       ├── gt_dep_rgb2
    │       │   └── ...
    │       ├── ir1
    │       │   └── ...
    │       ├── ir2
    │       │   └── ...
    │       ├── rgb1
    │       │   └── ...
    │       └── rgb2
    │           └── ...
    └── test
        └── 2020-11-06-17-45-16
            ├── calib.npy
            ├── info.txt
            ├── dep_ir1
            │   ├── 1604652317523400370.png
            │   └── ...
            ├── dep_ir2
            │   └── ...
            ├── dep_rgb1
            │   └── ...
            ├── dep_rgb2
            │   └── ...
            ├── gt_dep_ir1
            │   └── ...
            ├── gt_dep_ir1_filtered
            │   └── ...
            ├── gt_dep_ir2
            │   └── ...
            ├── gt_dep_rgb1
            │   └── ...
            ├── gt_dep_rgb1_filtered
            │   └── ...
            ├── gt_dep_rgb2
            │   └── ...
            ├── ir1
            │   └── ...
            ├── ir2
            │   └── ...
            ├── rgb1
            │   └── ...
            └── rgb2
                └── ...
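Each sequence directory contains a calib.npy with the calibration of the vehicular sensor system. Assuming it stores a pickled Python object (hence allow_pickle=True), it can be inspected as follows:

```python
import numpy as np

# Illustrative only; verify the loaded type and contents on your copy of the dataset.
calib = np.load('MMDCE_ROOT/day/train/2020-10-10-16-24-32/calib.npy', allow_pickle=True)
obj = calib.item() if calib.ndim == 0 else calib  # unwrap a pickled object if needed
print(type(obj))
```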
After the preparation, you should generate a JSON file containing paths to each sample.
# For the daytime split
$ cd MMDNET_ROOT/utils
$ python generate_json_MMDCE.py --path_mmd MMDCE_ROOT/day --name mmdce_day.json
# For the nighttime split
$ cd MMDCE_ROOT/night
$ ln -s test val
$ cd MMDNET_ROOT/utils
$ python generate_json_MMDCE.py --path_mmd MMDCE_ROOT/night --name mmdce_night.json
The first command will generate the MMDCE daytime (MMDCE Day) dataset JSON file in MMDNET_ROOT/list_data/mmdce_day.json, which contains 4,344 samples for training, 656 samples for validation, and 876 samples for testing.
The second command will generate the MMDCE nighttime (MMDCE Night) dataset JSON file in MMDNET_ROOT/list_data/mmdce_night.json, which contains 601 samples for training and 151 samples for testing. Note that for MMDCE Night, the test set also serves as the validation set (hence the symlink created above).
The official dataset split files mmdce_day.json and mmdce_night.json are already included in this repository.
$ cd MMDNET_ROOT/src
# An example command for KITTI MMD dataset training
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1 --epochs 30 --batch_size 16 --lr 0.001 --decay 20,25,30 --gamma 1.0,0.2,0.04 --top_crop 100 --save_best min --save NAME_SAVE
# An example command for MMDCE Day dataset training
$ python main.py --path_mmdce MMDCE_ROOT/day --dataset MMDCE --list_data ../list_data/mmdce_day.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1+1.0*L2 --epochs 30 --batch_size 8 --lr 0.001 --decay 15,20,25 --gamma 1.0,0.2,0.04 --save_best min --save NAME_SAVE
# An example command for MMDCE Night dataset training
$ python main.py --path_mmdce MMDCE_ROOT/night --dataset MMDCE --list_data ../list_data/mmdce_night.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1+1.0*L2 --epochs 30 --batch_size 8 --lr 0.0002 --decay 15,20,25 --gamma 1.0,0.2,0.04 --save_best min --save NAME_SAVE --pretrain ../checkpoints/mmdceday_best.pt
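The --loss argument specifies a weighted sum of loss terms, e.g., 1.0*L1+1.0*L2. A specification of this form is typically parsed into weight-name pairs, as in the sketch below (illustrative only; the actual parsing is defined in the source under src):

```python
def parse_loss_spec(spec: str):
    # '1.0*L1+1.0*L2' -> [(1.0, 'L1'), (1.0, 'L2')]
    terms = []
    for term in spec.split('+'):
        weight, name = term.split('*')
        terms.append((float(weight), name))
    return terms

print(parse_loss_spec('1.0*L1+1.0*L2'))  # [(1.0, 'L1'), (1.0, 'L2')]
```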
Please refer to config.py for more options.
During training, TensorBoard logs are saved under the experiments directory. To run TensorBoard:
$ cd MMDNET_ROOT/experiments
$ tensorboard --logdir=. --bind_all --port NUM_PORT
Then you can access TensorBoard at http://YOUR_SERVER_IP:NUM_PORT
$ cd MMDNET_ROOT/src
# An example command for KITTI MMD dataset testing
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save mmdnet_kittimmd_test --test_only --pretrain PATH_TO_CHECKPOINT
# An example command for KITTI MMD dataset testing and saving prediction images
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save mmdnet_kittimmd_test --test_only --pretrain PATH_TO_CHECKPOINT --save_image --save_result_only
# An example command for MMDCE Day dataset testing
$ python main.py --path_mmdce MMDCE_ROOT/day --dataset MMDCE --list_data ../list_data/mmdce_day.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save NAME_SAVE --test_only --pretrain PATH_TO_CHECKPOINT
# An example command for MMDCE Night dataset testing
$ python main.py --path_mmdce MMDCE_ROOT/night --dataset MMDCE --list_data ../list_data/mmdce_night.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save NAME_SAVE --test_only --pretrain PATH_TO_CHECKPOINT
To save depth and disparity prediction images, use the --save_image and --save_result_only arguments together.
To obtain real depth or disparity values from the prediction images, apply the following conversion: value = double(image) / 256.0
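For example, in Python (this assumes KITTI-style 16-bit PNGs, which the division by 256 implies, with a value of 0 marking invalid pixels):

```python
import numpy as np
from PIL import Image

# Read a 16-bit prediction PNG and recover real-valued depth (or disparity).
pred = np.array(Image.open('0000000005.png'), dtype=np.float64) / 256.0
valid = pred > 0.0  # zero-valued pixels carry no prediction
```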
We release our pre-trained models on the KITTI MMD, MMDCE Day, and MMDCE Night datasets.
Please note that the results obtained with the released models slightly differ from those reported in the paper due to code updates.
| Type | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
|---|---|---|---|---|
| KITTI MMD Test Set (Paper) | 673.34 | 202.56 | 1.69 | 0.80 |
| KITTI MMD Test Set (Released) | 675.27 | 198.22 | 1.66 | 0.78 |
| MMDCE Day Test Set (Paper) | 1226.2 | 610.4 | 6.9 | 3.8 |
| MMDCE Day Test Set (Released) | 1210.3 | 595.0 | 6.7 | 3.7 |
| MMDCE Night Test Set (Paper) | 1371.3 | 663.6 | 8.2 | 4.8 |
| MMDCE Night Test Set (Released) | 1323.1 | 662.6 | 9.0 | 5.3 |
We also release our prediction results on the KITTI MMD, MMDCE Day, and MMDCE Night datasets.
- We cleaned and updated our original implementation for this release.