Multimodal large language models (MLLMs) have advanced rapidly and now achieve strong performance on a wide range of multimodal benchmarks. However, data contamination during training makes performance evaluation and comparison unreliable. While numerous methods exist for detecting dataset contamination in large language models (LLMs), they are less effective for MLLMs because of their multiple modalities and multiple training phases. We therefore introduce MM-Detect, a multimodal data contamination detection framework. In addition, we employ a heuristic method to determine whether the contamination originates from the pre-training phase of the underlying LLMs.
```bash
git clone https://github.com/FreedomIntelligence/MM-Detect.git
cd MM-Detect
conda create -n MM-Detect python=3.11.8
conda activate MM-Detect
pip install torch==2.1.2
pip install -r requirements.txt
```
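Optionally, a quick sanity check that the environment is set up as intended (this snippet is ours, assuming the commands above ran inside the activated conda environment):

```python
# Quick environment check: confirms the pinned PyTorch version installed correctly.
import torch

print(torch.__version__)          # expected: 2.1.2, matching the pinned install above
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
```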
Our codebase supports the following models on ScienceQA, MMStar, COCO-Caption, Nocaps and Vintage:
- White-box Models:
  - LLaVA-1.5
  - VILA1.5
  - Qwen-VL-Chat
  - idefics2
  - Phi-3-vision-instruct
  - Yi-VL
  - InternVL2
- Grey-box Models:
  - fuyu
- Black-box Models:
  - GPT-4o
  - Gemini-1.5-Pro
  - Claude-3.5-Sonnet
🔐 Important: When detecting contamination of black-box models, make sure to add your API key at Line 26 in `mm_detect/mllms/gpt.py`:

```python
api_key='your-api-key'
```
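If you prefer not to hard-code the key, here is a small sketch of loading it from an environment variable; the variable name `OPENAI_API_KEY` is our assumption, and the repository itself expects the literal assignment shown above:

```python
# Sketch: read the API key from an environment variable instead of hard-coding it.
# OPENAI_API_KEY is an assumed variable name, not something the repository defines.
import os

api_key = os.environ.get("OPENAI_API_KEY", "your-api-key")
```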
📌 To run contamination detection for MLLMs, you can use the test scripts in the scripts/tests/mllms folder. For instance, use the following command to run the Option Order Sensitivity Test on ScienceQA with GPT-4o:

```bash
bash scripts/mllms/option_order_sensitivity_test/test_ScienceQA.sh -m gpt-4o
```
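For intuition, below is a minimal, self-contained sketch of what an option-order sensitivity check does; it is not the repository's implementation, and `ask_model` is a hypothetical placeholder for a real MLLM call. The idea: shuffle the answer options of each multiple-choice question and compare accuracy before and after; a sharp drop suggests the model may have memorized the original option order.

```python
# Illustrative sketch only -- not the repository's implementation.
import random

def ask_model(question: str, options: list[str]) -> int:
    """Hypothetical placeholder for a real MLLM call; returns the chosen option index."""
    raise NotImplementedError

def option_order_sensitivity(samples, seed=0):
    """samples: iterable of (question, options, gold_index) triples."""
    rng = random.Random(seed)
    samples = list(samples)
    correct_original = correct_shuffled = 0
    for question, options, gold_index in samples:
        # 1) Ask with the original option order.
        if ask_model(question, options) == gold_index:
            correct_original += 1
        # 2) Shuffle the options and track where the gold answer moved.
        perm = list(range(len(options)))
        rng.shuffle(perm)
        shuffled = [options[i] for i in perm]
        new_gold_index = perm.index(gold_index)
        if ask_model(question, shuffled) == new_gold_index:
            correct_shuffled += 1
    n = len(samples)
    # A large gap between the two accuracies hints at memorized option order.
    return correct_original / n, correct_shuffled / n
```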
We support the following LLMs on MMStar:

- LLMs:
  - LLaMA2
  - Qwen
  - Internlm2
  - Mistral
  - Phi-3-instruct
  - Yi
📌 For instance, use the following command to run the detection with Qwen-7B:

```bash
bash scripts/llms/detect_pretrain/test_MMStar.sh -m Qwen/Qwen-7B
```
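As a rough illustration of the idea behind this LLM pre-training check (a sketch under our own assumptions, not the paper's exact procedure; `ask_llm_text_only` is a hypothetical placeholder): if a text-only LLM answers image-dependent benchmark questions well above chance without ever seeing the images, the question and answer text may have leaked into its pre-training corpus.

```python
# Illustrative sketch only -- not the paper's exact heuristic.
def ask_llm_text_only(question: str, options: list[str]) -> int:
    """Hypothetical placeholder for a text-only LLM call (e.g. Qwen-7B);
    returns the index of the option the model picks."""
    raise NotImplementedError

def text_only_accuracy(samples) -> float:
    """samples: iterable of (question, options, gold_index) triples."""
    samples = list(samples)
    correct = sum(ask_llm_text_only(q, opts) == gold for q, opts, gold in samples)
    return correct / len(samples)

# Compare the result with random-guess accuracy (1 / number of options):
# a large gap on image-dependent questions hints at text leakage.
```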
⭐ If you find our implementation and paper helpful, please consider citing our work ⭐:
```bibtex
@misc{song2024textimagesleakedsystematic,
      title={Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination},
      author={Dingjie Song and Sicheng Lai and Shunian Chen and Lichao Sun and Benyou Wang},
      year={2024},
      eprint={2411.03823},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.03823},
}
```