Jinsun Park*, Yongseop Jeong*, Kyungdon Joo, Donghyeon Cho, and In So Kweon [* Equal Contribution]
- Nov. 2024: The official implementation is released.
We propose an adaptive cost volume fusion algorithm, dubbed MMDNet, for multi-modal depth estimation in dynamic environments. Our method leverages measurements from multi-modal sensors to exploit their complementary characteristics, generating depth cues from each modality in the form of adaptive cost volumes using deep neural networks. The proposed adaptive cost volume takes into account sensor configurations and computational costs, addressing the imbalanced and redundant depth basis problem of conventional cost volumes. We further extend its role to a generalized depth representation and introduce a geometry-aware cost fusion algorithm. This unified and geometrically consistent depth representation facilitates accurate and efficient multi-modal sensor fusion, which is crucial for robustness in changing environments. To validate the proposed framework, we introduce a new Multi-Modal Depth in Changing Environments (MMDCE) dataset, collected using our own vehicular system equipped with RGB, NIR, and LiDAR sensors. Experimental results demonstrate that our method is robust, accurate, and reliable in diverse environmental conditions.
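As a conceptual illustration only (not the actual MMDNet code; the real network is learned end-to-end and additionally enforces geometric consistency across views), the fusion step can be thought of as a confidence-weighted combination of per-modality cost volumes defined over a shared depth basis, followed by a soft-argmin readout:

```python
import torch

def fuse_cost_volumes(volumes, confidences, depth_hypotheses):
    # volumes: list of M cost volumes, each (B, D, H, W), over a shared depth basis of size D.
    # confidences: list of M per-pixel confidence maps, each (B, 1, H, W).
    weights = torch.softmax(torch.cat(confidences, dim=1), dim=1)   # (B, M, H, W)
    stacked = torch.stack(volumes, dim=1)                           # (B, M, D, H, W)
    fused = (weights.unsqueeze(2) * stacked).sum(dim=1)             # (B, D, H, W)
    prob = torch.softmax(-fused, dim=1)                             # low cost -> high probability
    depth = (prob * depth_hypotheses.view(1, -1, 1, 1)).sum(dim=1)  # soft-argmin readout
    return depth                                                    # (B, H, W)
```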
Our paper is available on IEEE Xplore.
@article{park2022adaptive,
title={Adaptive Cost Volume Fusion Network for Multi-Modal Depth Estimation in Changing Environments},
author={Park, Jinsun and Jeong, Yongseop and Joo, Kyungdon and Cho, Donghyeon and Kweon, In So},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={2},
pages={5095--5102},
year={2022},
publisher={IEEE}
}
Our released implementation is tested on:
- Ubuntu 22.04
- Python 3.12 (Anaconda 24.5)
- PyTorch 2.3.1 / torchvision 0.18.1
- NVIDIA CUDA 12.1 (pytorch-cuda 12.1)
- NVIDIA RTX A6000 (48GB) x 2 / NVIDIA RTX 4060 Ti (16GB) x 2
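For reference, one way to set up a matching Anaconda environment (the environment name is arbitrary; adjust the CUDA variant to your system):

$ conda create -n mmdnet python=3.12
$ conda activate mmdnet
$ conda install pytorch==2.3.1 torchvision==0.18.1 pytorch-cuda=12.1 -c pytorch -c nvidia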
Prepare the KITTI Multi-Modal Depth (KITTI MMD) dataset using the KITTI Depth Completion (KITTIDC) and KITTI Raw (KITTIRAW) datasets.
Stereo RGB images, stereo grayscale images, poses, and calibration files from the KITTIRAW dataset are copied into the KITTIDC dataset.
$ cd utils
$ python prepare_KITTIMMD.py --path_dc KITTIDC_ROOT --path_raw KITTIRAW_ROOT
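Internally, the preparation amounts to copying per-drive files from KITTIRAW into the corresponding KITTIDC drive directories, roughly as sketched below (a simplified illustration with hypothetical helper names; see prepare_KITTIMMD.py for the actual logic):

```python
import shutil
from pathlib import Path

def copy_raw_into_dc(path_raw: Path, path_dc: Path, split: str, drive: str):
    # e.g., drive = '2011_09_26_drive_0001_sync'; its date prefix is the raw subfolder.
    date = drive[:10]                       # '2011_09_26'
    dst = path_dc / split / drive
    # Calibration files live next to the drive folders in KITTIRAW.
    for name in ('calib_cam_to_cam.txt', 'calib_imu_to_velo.txt', 'calib_velo_to_cam.txt'):
        shutil.copy(path_raw / date / name, dst / name)
    # Stereo grayscale (image_00/01), stereo RGB (image_02/03), and OXTS poses.
    for sub in ('image_00', 'image_01', 'image_02', 'image_03', 'oxts'):
        shutil.copytree(path_raw / date / drive / sub, dst / sub, dirs_exist_ok=True)
```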
Once the script finishes, you will obtain the following directory structure:
KITTIMMD_ROOT
├── devkit
│   └── ...
├── train
│   ├── 2011_09_26_drive_0001_sync
│   │   ├── calib_cam_to_cam.txt
│   │   ├── calib_imu_to_velo.txt
│   │   ├── calib_velo_to_cam.txt
│   │   ├── image_00
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.png
│   │   │       └── ...
│   │   ├── image_01
│   │   │   └── ...
│   │   ├── image_02
│   │   │   └── ...
│   │   ├── image_03
│   │   │   └── ...
│   │   ├── oxts
│   │   │   ├── dataformat.txt
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.txt
│   │   │       └── ...
│   │   └── proj_depth
│   │       ├── groundtruth
│   │       │   ├── image_00
│   │       │   │   ├── 0000000005.png
│   │       │   │   └── ...
│   │       │   ├── image_01
│   │       │   │   └── ...
│   │       │   ├── image_02
│   │       │   │   └── ...
│   │       │   └── image_03
│   │       │       └── ...
│   │       └── velodyne_raw
│   │           ├── image_00
│   │           │   ├── 0000000005.png
│   │           │   └── ...
│   │           ├── image_01
│   │           │   └── ...
│   │           ├── image_02
│   │           │   └── ...
│   │           └── image_03
│   │               └── ...
│   └── ...
├── val
│   ├── 2011_09_26_drive_0002_sync
│   │   ├── calib_cam_to_cam.txt
│   │   ├── calib_imu_to_velo.txt
│   │   ├── calib_velo_to_cam.txt
│   │   ├── image_00
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.png
│   │   │       └── ...
│   │   ├── image_01
│   │   │   └── ...
│   │   ├── image_02
│   │   │   └── ...
│   │   ├── image_03
│   │   │   └── ...
│   │   ├── oxts
│   │   │   ├── dataformat.txt
│   │   │   ├── timestamps.txt
│   │   │   └── data
│   │   │       ├── 0000000000.txt
│   │   │       └── ...
│   │   └── proj_depth
│   │       ├── groundtruth
│   │       │   ├── image_00
│   │       │   │   ├── 0000000005.png
│   │       │   │   └── ...
│   │       │   ├── image_01
│   │       │   │   └── ...
│   │       │   ├── image_02
│   │       │   │   └── ...
│   │       │   └── image_03
│   │       │       └── ...
│   │       └── velodyne_raw
│   │           ├── image_00
│   │           │   ├── 0000000005.png
│   │           │   └── ...
│   │           ├── image_01
│   │           │   └── ...
│   │           ├── image_02
│   │           │   └── ...
│   │           └── image_03
│   │               └── ...
│   └── ...
└── depth_selection
    ├── test_depth_completion_anonymous
    │   └── ...
    ├── test_depth_prediction_anonymous
    │   └── ...
    ├── val_multi_modal
    │   ├── 2011_09_26_drive_0002_sync
    │   │   ├── calib_cam_to_cam.txt
    │   │   ├── calib_imu_to_velo.txt
    │   │   ├── calib_velo_to_cam.txt
    │   │   ├── image_00
    │   │   │   └── data
    │   │   │       ├── 0000000005.png
    │   │   │       └── ...
    │   │   ├── image_01
    │   │   │   └── ...
    │   │   ├── image_02
    │   │   │   └── ...
    │   │   ├── image_03
    │   │   │   └── ...
    │   │   └── proj_depth
    │   │       ├── groundtruth
    │   │       │   ├── image_00
    │   │       │   │   ├── 0000000005.png
    │   │       │   │   └── ...
    │   │       │   ├── image_01
    │   │       │   │   └── ...
    │   │       │   ├── image_02
    │   │       │   │   └── ...
    │   │       │   └── image_03
    │   │       │       └── ...
    │   │       └── velodyne_raw
    │   │           ├── image_00
    │   │           │   ├── 0000000005.png
    │   │           │   └── ...
    │   │           ├── image_01
    │   │           │   └── ...
    │   │           ├── image_02
    │   │           │   └── ...
    │   │           └── image_03
    │   │               └── ...
    │   └── ...
    └── val_selection_cropped
        └── ...
Note that the root directory for KITTI MMD (KITTIMMD_ROOT) will be the same as that for KITTI Depth Completion (KITTIDC_ROOT).
After the preparation, you should generate a JSON file containing paths to each sample.
$ cd MMDNET_ROOT/utils
$ python generate_json_KITTIMMD.py --path_kittimmd KITTIMMD_ROOT
This command will generate MMDNET_ROOT/list_data/kitti_mmd.json, which contains 32,917 samples for training, 3,426 samples for validation, and 1,000 samples for testing.
The official dataset split file kitti_mmd.json is already included in this repository.
Note that various input arguments are supported in generate_json_KITTIMMD.py. For example, if you want to create a JSON file with fewer samples for prototyping, you can use the following command:
$ python generate_json_KITTIMMD.py --path_kittimmd KITTIMMD_ROOT --name kitti_mmd_tiny.json --num_train 32 --num_val 16 --num_test 8
The resulting MMDNET_ROOT/list_data/kitti_mmd_tiny.json will contain 32 samples for training, 16 samples for validation, and 8 samples for testing.
Please refer to generate_json_KITTIMMD.py for more details.
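To sanity-check a generated split file, you can load it and count the samples per split (this assumes the JSON maps split names to lists of sample entries; adjust the path to your generated file):

```python
import json

with open('../list_data/kitti_mmd.json') as f:
    split = json.load(f)
for name in ('train', 'val', 'test'):
    print(name, len(split[name]))  # expected: 32917 / 3426 / 1000 for kitti_mmd.json
```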
Download the Multi-Modal Depth in Changing Environments (MMDCE) dataset from the following link: Google Drive
After extracting the dataset to MMDCE_ROOT, you will obtain a data structure as follows:
MMDCE_ROOT
├── day
│   ├── train
│   │   ├── 2020-10-10-16-24-32
│   │   │   ├── calib.npy
│   │   │   ├── info.txt
│   │   │   ├── dep_ir1
│   │   │   │   ├── 1602314674513103008.png
│   │   │   │   └── ...
│   │   │   ├── dep_ir2
│   │   │   │   └── ...
│   │   │   ├── dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── ir1
│   │   │   │   └── ...
│   │   │   ├── ir2
│   │   │   │   └── ...
│   │   │   ├── rgb1
│   │   │   │   └── ...
│   │   │   └── rgb2
│   │   │       └── ...
│   │   └── ...
│   ├── val
│   │   ├── 2020-11-07-17-18-38
│   │   │   ├── calib.npy
│   │   │   ├── info.txt
│   │   │   ├── dep_ir1
│   │   │   │   ├── 1604737119707223295.png
│   │   │   │   └── ...
│   │   │   ├── dep_ir2
│   │   │   │   └── ...
│   │   │   ├── dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_ir2
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb1_filtered
│   │   │   │   └── ...
│   │   │   ├── gt_dep_rgb2
│   │   │   │   └── ...
│   │   │   ├── ir1
│   │   │   │   └── ...
│   │   │   ├── ir2
│   │   │   │   └── ...
│   │   │   ├── rgb1
│   │   │   │   └── ...
│   │   │   └── rgb2
│   │   │       └── ...
│   │   └── ...
│   └── test
│       ├── 2020-10-02-17-34-35
│       │   ├── calib.npy
│       │   ├── info.txt
│       │   ├── dep_ir1
│       │   │   ├── 1601627722067240953.png
│       │   │   └── ...
│       │   ├── dep_ir2
│       │   │   └── ...
│       │   ├── dep_rgb1
│       │   │   └── ...
│       │   ├── dep_rgb2
│       │   │   └── ...
│       │   ├── gt_dep_ir1
│       │   │   └── ...
│       │   ├── gt_dep_ir1_filtered
│       │   │   └── ...
│       │   ├── gt_dep_ir2
│       │   │   └── ...
│       │   ├── gt_dep_rgb1
│       │   │   └── ...
│       │   ├── gt_dep_rgb1_filtered
│       │   │   └── ...
│       │   ├── gt_dep_rgb2
│       │   │   └── ...
│       │   ├── ir1
│       │   │   └── ...
│       │   ├── ir2
│       │   │   └── ...
│       │   ├── rgb1
│       │   │   └── ...
│       │   └── rgb2
│       │       └── ...
│       └── ...
└── night
    ├── train
    │   └── 2020-10-11-00-43-29
    │       ├── calib.npy
    │       ├── info.txt
    │       ├── dep_ir1
    │       │   ├── 1602344610899789572.png
    │       │   └── ...
    │       ├── dep_ir2
    │       │   └── ...
    │       ├── dep_rgb1
    │       │   └── ...
    │       ├── dep_rgb2
    │       │   └── ...
    │       ├── gt_dep_ir1
    │       │   └── ...
    │       ├── gt_dep_ir1_filtered
    │       │   └── ...
    │       ├── gt_dep_ir2
    │       │   └── ...
    │       ├── gt_dep_rgb1
    │       │   └── ...
    │       ├── gt_dep_rgb1_filtered
    │       │   └── ...
    │       ├── gt_dep_rgb2
    │       │   └── ...
    │       ├── ir1
    │       │   └── ...
    │       ├── ir2
    │       │   └── ...
    │       ├── rgb1
    │       │   └── ...
    │       └── rgb2
    │           └── ...
    └── test
        └── 2020-11-06-17-45-16
            ├── calib.npy
            ├── info.txt
            ├── dep_ir1
            │   ├── 1604652317523400370.png
            │   └── ...
            ├── dep_ir2
            │   └── ...
            ├── dep_rgb1
            │   └── ...
            ├── dep_rgb2
            │   └── ...
            ├── gt_dep_ir1
            │   └── ...
            ├── gt_dep_ir1_filtered
            │   └── ...
            ├── gt_dep_ir2
            │   └── ...
            ├── gt_dep_rgb1
            │   └── ...
            ├── gt_dep_rgb1_filtered
            │   └── ...
            ├── gt_dep_rgb2
            │   └── ...
            ├── ir1
            │   └── ...
            ├── ir2
            │   └── ...
            ├── rgb1
            │   └── ...
            └── rgb2
                └── ...
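Each sequence directory contains a calib.npy with the calibration of the vehicular sensor system. Assuming it stores a pickled Python object (hence allow_pickle=True), it can be inspected as follows:

```python
import numpy as np

# Illustrative only; verify the loaded type and contents on your copy of the dataset.
calib = np.load('MMDCE_ROOT/day/train/2020-10-10-16-24-32/calib.npy', allow_pickle=True)
obj = calib.item() if calib.ndim == 0 else calib  # unwrap a pickled object if needed
print(type(obj))
```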
After the preparation, you should generate a JSON file containing paths to each sample.
# For the daytime split
$ cd MMDNET_ROOT/utils
$ python generate_json_MMDCE.py --path_mmd MMDCE_ROOT/day --name mmdce_day.json
# For the nighttime split
$ cd MMDCE_ROOT/night
$ ln -s test val
$ cd MMDNET_ROOT/utils
$ python generate_json_MMDCE.py --path_mmd MMDCE_ROOT/night --name mmdce_night.json
The first command will generate the MMDCE daytime (MMDCE Day) dataset JSON file in MMDNET_ROOT/list_data/mmdce_day.json, which contains 4,344 samples for training, 656 samples for validation, and 876 samples for testing.
The second command will generate the MMDCE nighttime (MMDCE Night) dataset JSON file in MMDNET_ROOT/list_data/mmdce_night.json, which contains 601 samples for training and 151 samples for testing. Note that for MMDCE Night, the test set also serves as the validation set (hence the symlink created above).
The official dataset split files mmdce_day.json and mmdce_night.json are already included in this repository.
$ cd MMDNET_ROOT/src
# An example command for KITTI MMD dataset training
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1 --epochs 30 --batch_size 16 --lr 0.001 --decay 20,25,30 --gamma 1.0,0.2,0.04 --top_crop 100 --save_best min --save NAME_SAVE
# An example command for MMDCE Day dataset training
$ python main.py --path_mmdce MMDCE_ROOT/day --dataset MMDCE --list_data ../list_data/mmdce_day.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1+1.0*L2 --epochs 30 --batch_size 8 --lr 0.001 --decay 15,20,25 --gamma 1.0,0.2,0.04 --save_best min --save NAME_SAVE
# An example command for MMDCE Night dataset training
$ python main.py --path_mmdce MMDCE_ROOT/night --dataset MMDCE --list_data ../list_data/mmdce_night.json --method RGB-IR-LIDAR --gpus 0,1 --port 29500 --patch_width 1216 --patch_height 240 --loss 1.0*L1+1.0*L2 --epochs 30 --batch_size 8 --lr 0.0002 --decay 15,20,25 --gamma 1.0,0.2,0.04 --save_best min --save NAME_SAVE --pretrain ../checkpoints/mmdceday_best.pt
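The --loss argument specifies a weighted sum of loss terms, e.g., 1.0*L1+1.0*L2. A specification of this form is typically parsed into weight-name pairs, as in the sketch below (illustrative only; the actual parsing is defined in the source under src):

```python
def parse_loss_spec(spec: str):
    # '1.0*L1+1.0*L2' -> [(1.0, 'L1'), (1.0, 'L2')]
    terms = []
    for term in spec.split('+'):
        weight, name = term.split('*')
        terms.append((float(weight), name))
    return terms

print(parse_loss_spec('1.0*L1+1.0*L2'))  # [(1.0, 'L1'), (1.0, 'L2')]
```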
Please refer to config.py for more options.
During training, TensorBoard logs are saved under the experiments directory. To run TensorBoard:
$ cd MMDNET_ROOT/experiments
$ tensorboard --logdir=. --bind_all --port NUM_PORT
Then you can access TensorBoard at http://YOUR_SERVER_IP:NUM_PORT
$ cd MMDNET_ROOT/src
# An example command for KITTI MMD dataset testing
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save mmdnet_kittimmd_test --test_only --pretrain PATH_TO_CHECKPOINT
# An example command for KITTI MMD dataset testing and saving prediction images
$ python main.py --path_kittimmd KITTIMMD_ROOT --dataset KITTIMMD --list_data ../list_data/kitti_mmd.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save mmdnet_kittimmd_test --test_only --pretrain PATH_TO_CHECKPOINT --save_image --save_result_only
# An example command for MMDCE Day dataset testing
$ python main.py --path_mmdce MMDCE_ROOT/day --dataset MMDCE --list_data ../list_data/mmdce_day.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save NAME_SAVE --test_only --pretrain PATH_TO_CHECKPOINT
# An example command for MMDCE Night dataset testing
$ python main.py --path_mmdce MMDCE_ROOT/night --dataset MMDCE --list_data ../list_data/mmdce_night.json --method RGB-IR-LIDAR --gpus 0 --port 29500 --save NAME_SAVE --test_only --pretrain PATH_TO_CHECKPOINT
To save depth and disparity prediction images, use the --save_image and --save_result_only arguments together.
To obtain real depth or disparity values from the prediction images, apply the following conversion: value = double(image) / 256.0
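For example, in Python (this assumes KITTI-style 16-bit PNGs, which the division by 256 implies, with a value of 0 marking invalid pixels):

```python
import numpy as np
from PIL import Image

# Read a 16-bit prediction PNG and recover real-valued depth (or disparity).
pred = np.array(Image.open('0000000005.png'), dtype=np.float64) / 256.0
valid = pred > 0.0  # zero-valued pixels carry no prediction
```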
We release our pre-trained models on the KITTI MMD, MMDCE Day, and MMDCE Night datasets.
Please note that the results obtained with the released models slightly differ from those reported in the paper due to code updates.
| Type | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
|---|---|---|---|---|
| KITTI MMD Test Set (Paper) | 673.34 | 202.56 | 1.69 | 0.80 |
| KITTI MMD Test Set (Released) | 675.27 | 198.22 | 1.66 | 0.78 |
| MMDCE Day Test Set (Paper) | 1226.2 | 610.4 | 6.9 | 3.8 |
| MMDCE Day Test Set (Released) | 1210.3 | 595.0 | 6.7 | 3.7 |
| MMDCE Night Test Set (Paper) | 1371.3 | 663.6 | 8.2 | 4.8 |
| MMDCE Night Test Set (Released) | 1323.1 | 662.6 | 9.0 | 5.3 |
We also release our prediction results on the KITTI MMD, MMDCE Day, and MMDCE Night datasets.
- We cleaned and updated our original implementation for this release.