ICCV, 2025
Tianyi Zhao
·
Boyang Liu
·
Yanglei Gao
·
Yiming Sun
·
Maoxun Yuan
·
Xingxing Wei
This repository is the official PyTorch implementation of the paper Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning.
If you have any questions, please feel free to open an issue or contact me by email: [email protected]. Any discussion is welcome!
Paper Links: ICCV 2025
Please leave a STAR ⭐ if you like this project!
- Update on 2025/11/29: The full code and the guidance README file have been released.
- 🔥 Update on 2025/06/26: This work has been accepted by the top conference ICCV 2025!
- Update on 2025/06/22: Released the M2D-LIF project repository.
If you find our work helpful for your research, please consider citing the following BibTeX entry.
@InProceedings{Zhao_2025_ICCV,
author= {Zhao, Tianyi and Liu, Boyang and Gao, Yanglei and Sun, Yiming and Yuan, Maoxun and Wei, Xingxing},
title= {Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month= {October},
year= {2025},
pages= {6364-6373}
}

Multi-Modal Object Detection (MMOD), due to its stronger adaptability to complex environments, has been widely applied in practice. Extensive research is dedicated to RGB-IR object detection, primarily focusing on how to integrate complementary features from the RGB and IR modalities. However, these methods neglect the mono-modality insufficient learning problem, which arises from the decreased feature extraction capability in multi-modal joint learning. This leads to a prevalent but unreasonable phenomenon, Fusion Degradation, which hinders the performance improvement of the MMOD model. Motivated by this, in this paper we introduce linear probing evaluation to multi-modal detectors and rethink the multi-modal object detection task from the mono-modality learning perspective. Therefore, we construct a novel framework called M2D-LIF, which consists of the Mono-Modality Distillation (M2D) method and the Local Illumination-aware Fusion (LIF) module. The M2D-LIF framework facilitates sufficient learning of each mono-modality during multi-modal joint training and explores a lightweight yet effective feature fusion manner to achieve superior object detection performance. Extensive experiments conducted on three MMOD datasets demonstrate that our M2D-LIF effectively mitigates the Fusion Degradation phenomenon and outperforms previous SOTA detectors.
git clone https://github.com/Zhao-Tian-yi/M2D-LIF.git
cd M2D-LIF
conda env create -f environment.yaml
conda activate M2D-LIF
- FLIR
- LLVIP
- DroneVehicle
LLVIP_Mul/
├── images/
│ ├── train/
│ └── val/
├── images_ir/
│ ├── train/
│ └── val/
└── labels/
├── train/
└── val/
> Note: Corresponding files in each folder share the same base filename. All label files have been converted to the YOLO format for compatibility with the Ultralytics framework. Detailed dataset format descriptions can be found in the [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) documentation.
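As a quick sanity check before training or evaluation, you can verify that every RGB image has a matching IR image. This is a minimal sketch assuming the `LLVIP_Mul/` layout shown above; `check_pairing` is a hypothetical helper, not part of this repository:

```python
import os

def check_pairing(root, split="train"):
    """Compare base filenames in images/<split> (RGB) and images_ir/<split> (IR).

    Returns two sorted lists: base names missing an IR counterpart,
    and base names missing an RGB counterpart.
    """
    rgb_dir = os.path.join(root, "images", split)
    ir_dir = os.path.join(root, "images_ir", split)
    rgb = {os.path.splitext(f)[0] for f in os.listdir(rgb_dir)}
    ir = {os.path.splitext(f)[0] for f in os.listdir(ir_dir)}
    return sorted(rgb - ir), sorted(ir - rgb)
```

If both returned lists are empty, the RGB-IR pairing is consistent for that split.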
Model weights are released at: https://pan.baidu.com/s/1GKDkfhJrKeskrnDNRzmFXw?pwd=vmvr
- After setting up the environment and downloading the dataset, download the checkpoints we supply.
- Change the dataset PATH in the dataset yaml file in `./data`.
- Change the PATHs in the `val.py` or `val_obb.py` file, including the model and data paths. For example:
```python
# val.py
from ultralytics import YOLO
from ultralytics.utils import DEFAULT_CFG

if __name__ == '__main__':
    model = YOLO(r"./your_ckpt_path/LLVIP/best_checkpoint.pt")
    data = r"./data/LLVIP.yaml"
    batch = 1
    device = 0
    imgsz = 640
    DEFAULT_CFG.save_dir = "./runs/v8m/val"
    model.val(data=data, batch=batch, imgsz=imgsz, device=device, save=True, rect=True)
```
```python
# val_obb.py
from ultralytics.models.yolo.obb import OBBValidator

if __name__ == '__main__':
    data = r"./data/DroneVehicle.yaml"
    args = dict(model="/your_ckpt_path/DroneVehicle.pt",
                data=data,
                device=0,
                imgsz=640, batch=1, save=True, rect=True)
    validator = OBBValidator(args=args)
    validator(model=args["model"])
```
- Run the evaluation: `python val.py` or `python val_obb.py`
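The dataset yaml files under `./data` follow the standard Ultralytics data-config format. The exact fields used by this repository may differ (in particular, how the IR image folder is declared is repo-specific), but a hedged sketch of what `LLVIP.yaml` might contain is:

```yaml
# Hypothetical sketch of an Ultralytics-style data config for LLVIP.
# Field names follow the standard YOLO data yaml; check the yaml files
# shipped in ./data for the actual multi-modal keys.
path: /path/to/LLVIP_Mul   # dataset root (RGB in images/, IR in images_ir/)
train: images/train        # training images, relative to path
val: images/val            # validation images, relative to path
names:
  0: person                # LLVIP is a single-class pedestrian dataset
```

In most cases only `path` needs to be edited to point at your local dataset root.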
Email: [email protected].


