Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning

ICCV, 2025
Tianyi Zhao · Boyang Liu · Yanglei Gao · Yiming Sun · Maoxun Yuan · Xingxing Wei

arXiv PDF

This repository is the official PyTorch implementation of the paper Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning.

If you have any questions, please feel free to open an issue or contact me by email: [email protected]. Any discussion is welcome!

Paper Link: ICCV 2025

Please leave a STAR ⭐ if you like this project!

🔥News

  • Update on 2025/11/29: The full code and the guidance README have been released.
  • 🔥 Update on 2025/6/26: This work has been accepted by the top conference ICCV 2025!
  • Update on 2025/06/22: Released the M2D-LIF project repository.

❤️Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@InProceedings{Zhao_2025_ICCV,
    author    = {Zhao, Tianyi and Liu, Boyang and Gao, Yanglei and Sun, Yiming and Yuan, Maoxun and Wei, Xingxing},
    title     = {Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {6364-6373}
}

📑 Abstract

Multi-Modal Object Detection (MMOD), due to its stronger adaptability to various complex environments, has been widely applied in various applications. Extensive research is dedicated to the RGB-IR object detection, primarily focusing on how to integrate complementary features from RGB-IR modalities. However, they neglect the mono-modality insufficient learning problem, which arises from decreased feature extraction capability in multi-modal joint learning. This leads to a prevalent but unreasonable phenomenon, Fusion Degradation, which hinders the performance improvement of the MMOD model. Motivated by this, in this paper, we introduce linear probing evaluation to the multi-modal detectors and rethink the multi-modal object detection task from the mono-modality learning perspective. Therefore, we construct a novel framework called M2D-LIF, which consists of the Mono-Modality Distillation (M2D) method and the Local Illumination-aware Fusion (LIF) module. The M2D-LIF framework facilitates the sufficient learning of mono-modality during multi-modal joint training and explores a lightweight yet effective feature fusion manner to achieve superior object detection performance. Extensive experiments conducted on three MMOD datasets demonstrate that our M2D-LIF effectively mitigates the Fusion Degradation phenomenon and outperforms the previous SOTA detectors.


🔨Environment Installation

git clone https://github.com/Zhao-Tian-yi/M2D-LIF.git
cd M2D-LIF
conda env create -f environment.yaml
conda activate M2D-LIF

Datasets

  • FLIR
  • LLVIP
  • DroneVehicle

Dataset Organization Format

LLVIP_Mul/
├── images/
│   ├── train/
│   └── val/
├── images_ir/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/

Note: Corresponding files in each folder share the same base filename. All label files have been converted to the YOLO format for compatibility with the Ultralytics framework. Detailed dataset format descriptions can be found in the [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) documentation.
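Since training and evaluation both assume matching base filenames across the three folders, it can help to verify the pairing before running anything. A minimal sketch (the `check_pairing` helper and the `.jpg`/`.txt` extensions are assumptions; adjust them to your data):

```python
# Sanity-check the dataset layout above: images/, images_ir/ and labels/
# should hold files with identical base filenames for a given split.
from pathlib import Path

def check_pairing(root: str, split: str = "train") -> set[str]:
    """Return base filenames that are missing from at least one folder."""
    base = Path(root)
    stems = {}
    # Assumed extensions: .jpg for both image folders, .txt for YOLO labels.
    for folder, ext in (("images", ".jpg"), ("images_ir", ".jpg"), ("labels", ".txt")):
        stems[folder] = {p.stem for p in (base / folder / split).glob(f"*{ext}")}
    all_stems = set().union(*stems.values())
    # A stem is unpaired if any of the three folders lacks it.
    return {s for s in all_stems if any(s not in v for v in stems.values())}
```

An empty return value means every RGB image has a matching IR image and label file.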

Weights

Model weights released at: https://pan.baidu.com/s/1GKDkfhJrKeskrnDNRzmFXw?pwd=vmvr

Evaluation

  1. After setting up the environment and downloading the dataset, download the checkpoint we provide.
  2. Change the dataset PATH in the corresponding dataset yaml file in ./data.
  3. Change the PATHs in the val.py or val_obb.py file, including the model and the data. For example:
# val.py
if __name__ == '__main__':
    model = YOLO(r"./your_ckpt_path/LLVIP/best_checkpoint.pt")
    data = r"./data/LLVIP.yaml"
    batch = 1
    device = 0
    imgsz = 640

    DEFAULT_CFG.save_dir = f"./runs/v8m/val"
    model.val(data=data, batch=batch, imgsz=imgsz, device=device, save=True, rect=True)
    
# val_obb.py
from ultralytics.models.yolo.obb import OBBValidator
if __name__ == '__main__':
    data = r"./data/DroneVehicle.yaml"

    args = dict(model="/your_ckpt_path/DroneVehicle.pt",
                data=data,
                device=0,
                imgsz=640, batch=1, save=True, rect=True)
    validator = OBBValidator(args=args)
    validator(model=args["model"])
  4. Run the evaluation:

    python val.py  # or val_obb.py for DroneVehicle
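For intuition on what `model.val` reports: detection metrics such as mAP match predictions to ground-truth boxes by Intersection-over-Union. A minimal axis-aligned IoU sketch (not the Ultralytics implementation; boxes given as `(x1, y1, x2, y2)` corners):

```python
def box_iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction typically counts as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold (e.g. 0.5); rotated boxes, as in DroneVehicle's OBB setting, need a rotated-IoU variant instead.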

Experiments Results

1. DroneVehicle


2. FLIR and LLVIP


Contacts

Email: [email protected].
