Dingjie Song, Sicheng Lai, Shunian Chen, Lichao Sun, Benyou Wang*
🏆 Accepted to ICML 2025 Workshop DIG-BUG
📅 Date of Acceptance: June 2025
🎤 Oral Presentation
The rapid progress of multimodal large language models (MLLMs) has led to superior performance on various multimodal benchmarks. However, data contamination during training creates challenges for performance evaluation and comparison. While numerous methods exist for detecting dataset contamination in large language models (LLMs), they are less effective for MLLMs because of their multiple modalities and multiple training phases. We therefore introduce MM-Detect, a multimodal data contamination detection framework. In addition, we employ a heuristic method to discern whether the contamination originates from the pre-training phase of the underlying LLMs.
git clone https://github.com/FreedomIntelligence/MM-Detect.git
cd MM-Detect
conda create -n MM-Detect python=3.10
conda activate MM-Detect
pip install torch==2.1.2
pip install -r requirements.txt
pip install googletrans==3.1.0a0
pip install httpx==0.27.2
Ensure that Java is installed on your system so the Stanford POS Tagger can run:
sudo apt update
sudo apt install openjdk-11-jdk
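To confirm the Java runtime is visible before running the tagger, a quick sanity check like the following can help (a standalone sketch, not part of the repository):

```python
import shutil
import subprocess

def java_available() -> bool:
    """Return True if a Java runtime is on PATH and runs successfully."""
    if shutil.which("java") is None:
        return False
    # `java -version` exits with code 0 when a runtime is installed.
    return subprocess.run(["java", "-version"], capture_output=True).returncode == 0
```

If this returns False, revisit the JDK installation step above.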
Our codebase supports the following models on ScienceQA, MMStar, COCO-Caption, Nocaps and Vintage:
White-box Models:
- LLaVA-1.5
- VILA1.5
- Qwen-VL-Chat
- idefics2
- Phi-3-vision-instruct
- Yi-VL
- InternVL2
- DeepSeek-VL2

Grey-box Models:
- fuyu

Black-box Models:
- GPT-4o
- Gemini-1.5-Pro
- Claude-3.5-Sonnet
🔐 Important: When detecting contamination in black-box models, be sure to add your API key at line 26 in mm_detect/mllms/gpt.py:
api_key='your-api-key'
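Rather than hardcoding the key in the source, you may prefer to read it from an environment variable. A minimal sketch (the variable name `OPENAI_API_KEY` is illustrative; use whatever your setup exports):

```python
import os

# Read the key from the environment instead of committing it to the source.
# Falls back to the placeholder if the variable is not set.
api_key = os.environ.get("OPENAI_API_KEY", "your-api-key")
```

This keeps secrets out of version control while leaving the rest of the code unchanged.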
🌱 To save intermediate results and enable the resume function, add your output_dir at line 77 in multimodal_methods/option_order_sensitivity_test.py and at line 104 in multimodal_methods/slot_guessing_for_perturbation_caption.py:
results_file = "output_dir/results.json"
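A resume function of this kind typically skips samples already recorded in the results file. The sketch below is illustrative only (the record format with an "id" field is an assumption, not the repository's actual schema):

```python
import json
import os

def load_done_ids(path):
    """Return the set of sample ids already saved, or an empty set if no file exists."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {record["id"] for record in json.load(f)}

def save_results(path, records):
    """Write the accumulated result records, creating the output directory if needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
```

On restart, samples whose ids appear in `load_done_ids(results_file)` can simply be skipped.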
📌 To run contamination detection for MLLMs, follow the test scripts in the scripts/tests/mllms folder. For instance, use the following command to run the Option Order Sensitivity Test on ScienceQA with GPT-4o:
bash scripts/mllms/option_order_sensitivity_test/test_ScienceQA.sh -m gpt-4o
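The core idea of the Option Order Sensitivity Test is that a model which memorized a benchmark tends to lose accuracy when the answer options are reordered. A minimal sketch of that idea follows; the `model` callable and the dataset tuple format are illustrative, not the repository's actual interface:

```python
import random

def shuffle_options(options, answer_idx, rng):
    """Shuffle the option list and track where the correct answer moves."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(answer_idx)

def sensitivity(model, dataset, seed=0):
    """Accuracy drop after shuffling options.

    A large drop suggests the model memorized the original option order,
    which hints at benchmark contamination.
    """
    rng = random.Random(seed)
    correct_orig = correct_shuf = 0
    for question, options, answer_idx in dataset:
        if model(question, options) == answer_idx:
            correct_orig += 1
        shuffled, new_idx = shuffle_options(options, answer_idx, rng)
        if model(question, shuffled) == new_idx:
            correct_shuf += 1
    n = len(dataset)
    return correct_orig / n - correct_shuf / n
```

A model that answers by content (e.g. always finds the correct option text) shows zero drop, while one that learned "the answer is option A" degrades once the options are permuted.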
We support the following LLMs on MMStar:
- LLaMA2
- Qwen
- Internlm2
- Mistral
- Phi-3-instruct
- Yi
- DeepSeek-MoE-Chat
📌 For instance, use the following command to run detection with Qwen-7B:
bash scripts/llms/detect_pretrain/test_MMStar.sh -m Qwen/Qwen-7B
⭐ If you find our implementation and paper helpful, please consider citing our work and starring the repository ⭐:
@misc{song2024textimagesleakedsystematic,
title={Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination},
author={Dingjie Song and Sicheng Lai and Shunian Chen and Lichao Sun and Benyou Wang},
year={2024},
eprint={2411.03823},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.03823},
}
If you encounter the following error when using googletrans:
AttributeError: module 'httpcore' has no attribute 'SyncHTTPTransport'
please refer to the solution provided on this Stack Overflow page for further guidance.