
🕵️ MM-Detect: The First Multimodal Data Contamination Detection Framework

🤗 Paper | 📖 arXiv

Overview

Multimodal large language models (MLLMs) have advanced rapidly and now achieve strong performance on a wide range of multimodal benchmarks. However, data contamination during training undermines fair performance evaluation and comparison. While numerous methods exist for detecting dataset contamination in large language models (LLMs), they are less effective for MLLMs, which involve multiple modalities and multiple training phases. We therefore introduce MM-Detect, a multimodal data contamination detection framework. In addition, we employ a heuristic method to discern whether the contamination originates from the pre-training phase of the underlying LLMs.

(Figure: overview of the MM-Detect framework)

🤖 Environment Setup

git clone https://github.com/FreedomIntelligence/MM-Detect.git
conda create -n MM-Detect python=3.11.8
conda activate MM-Detect
cd MM-Detect
pip install torch==2.1.2
pip install -r requirements.txt
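
💡 As a quick sanity check that the pinned PyTorch build ended up in the new environment (assuming the install completed without errors), you can run the following inside the MM-Detect environment:

import torch
print(torch.__version__)  # expect 2.1.2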

🚀 Run MM-Detect

Our codebase supports the following models on ScienceQA, MMStar, COCO-Caption, Nocaps and Vintage:

  • White-box Models:

    • LLaVA-1.5
    • VILA1.5
    • Qwen-VL-Chat
    • idefics2
    • Phi-3-vision-instruct
    • Yi-VL
    • InternVL2
  • Grey-box Models:

    • fuyu
  • Black-box Models:

    • GPT-4o
    • Gemini-1.5-Pro
    • Claude-3.5-Sonnet

🔐 Important: When detecting contamination in black-box models, be sure to add your API key at Line 26 in mm_detect/mllms/gpt.py:

api_key='your-api-key'
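
💡 If you prefer not to hard-code the key, one option is to read it from an environment variable instead; the variable name OPENAI_API_KEY below is just a common convention, not something the repository requires:

import os

# Assumes you have exported OPENAI_API_KEY in your shell beforehand.
api_key = os.environ.get("OPENAI_API_KEY", "your-api-key")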

📌 To run contamination detection for MLLMs, follow the test scripts under the scripts/mllms folder. For instance, use the following command to run the Option Order Sensitivity Test on ScienceQA with GPT-4o:

bash scripts/mllms/option_order_sensitivity_test/test_ScienceQA.sh -m gpt-4o
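
💡 For intuition, an option order sensitivity test compares a model's answers on the original multiple-choice items against the same items with the options shuffled; a model that has memorized a benchmark tends to change its answers or lose accuracy once the options move. The sketch below only illustrates that idea (the item format and the query_model wrapper are hypothetical); it is not the repository's implementation:

import random

def shuffle_options(question):
    # question: {"stem": str, "options": [str, ...], "answer_idx": int}  (hypothetical format)
    order = list(range(len(question["options"])))
    random.shuffle(order)
    return {
        "stem": question["stem"],
        "options": [question["options"][i] for i in order],
        "answer_idx": order.index(question["answer_idx"]),  # where the gold option moved to
    }

def sensitivity(questions, query_model):
    # query_model(question) -> predicted option index (hypothetical model wrapper)
    orig_correct = sum(query_model(q) == q["answer_idx"] for q in questions)
    shuffled = [shuffle_options(q) for q in questions]
    shuf_correct = sum(query_model(q) == q["answer_idx"] for q in shuffled)
    # A large accuracy drop after shuffling is one signal of possible contamination.
    return (orig_correct - shuf_correct) / max(len(questions), 1)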

🔍 Discern the Source of Contamination

We support the following LLMs on MMStar:

  • LLMs:
    • LLaMA2
    • Qwen
    • Internlm2
    • Mistral
    • Phi-3-instruct
    • Yi

📌 For instance, use the following command to run detection with Qwen-7B:

bash scripts/llms/detect_pretrain/test_MMStar.sh -m Qwen/Qwen-7B
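
💡 For intuition, one simple way to probe whether an LLM's text backbone may have memorized a benchmark is to compare the likelihood it assigns to an original benchmark question against a paraphrase of the same question; consistently higher likelihood on the exact original wording is a weak signal of memorization. The snippet below is a toy illustration of that general idea using Hugging Face transformers, not the heuristic implemented in MM-Detect:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_nll(model, tokenizer, text):
    # Average negative log-likelihood the model assigns to `text`.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

model_id = "Qwen/Qwen-7B"  # any causal LM from the supported list
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

original = "..."    # an MMStar question, verbatim
paraphrase = "..."  # the same question, reworded
gap = avg_nll(model, tokenizer, paraphrase) - avg_nll(model, tokenizer, original)
# A consistently large positive gap across many items hints that the original
# wording may have been seen during pre-training.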

Citation

⭐ If you find our implementation and paper helpful, please consider citing our work ⭐:

@misc{song2024textimagesleakedsystematic,
  title={Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination},
  author={Dingjie Song and Sicheng Lai and Shunian Chen and Lichao Sun and Benyou Wang},
  year={2024},
  eprint={2411.03823},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.03823},
}

Acknowledgement