This repository complements the paper Large Multimodal Models Evaluation: A Survey and organizes benchmarks and resources for evaluating large multimodal models across understanding (general and specialized), generation, and community platforms. It serves as a hub where researchers can find key datasets, papers, and code.
We will continuously maintain and update this repo to ensure long-term value for the community.
Paper: SCIS
Project Page: AIBench / LMM Evaluation Survey
We welcome pull requests (PRs)! If you contribute five or more valid benchmarks with relevant details, you will be acknowledged in the Acknowledgment section of the paper's next update.
Come and join us!
If you find our work useful, please give this repo a star. Thank you!
If you find our work useful, please cite our paper as:
@article{zhang2025large,
author = {Zhang, Zicheng and Wang, Junying and Wen, Farong and Guo, Yijin and Zhao, Xiangyu and Fang, Xinyu and Ding, Shengyuan and Jia, Ziheng and Xiao, Jiahao and Shen, Ye and Zheng, Yushuo and Zhu, Xiaorong and Wu, Yalun and Jiao, Ziheng and Sun, Wei and Chen, Zijian and Zhang, Kaiwei and Fu, Kang and Cao, Yuqin and Hu, Ming and Zhou, Yue and Zhou, Xuemei and Cao, Juntai and Zhou, Wei and Cao, Jinyu and Li, Ronghui and Zhou, Donghao and Tian, Yuan and Zhu, Xiangyang and Li, Chunyi and Wu, Haoning and Liu, Xiaohong and He, Junjun and Zhou, Yu and Liu, Hui and Zhang, Lin and Wang, Zesheng and Duan, Huiyu and Zhou, Yingjie and Min, Xiongkuo and Jia, Qi and Zhou, Dongzhan and Zhang, Wenlong and Cao, Jiezhang and Yang, Xue and Yu, Junzhi and Zhang, Songyang and Duan, Haodong and Zhai, Guangtao},
title = {Large Multimodal Models Evaluation: A Survey},
journal = {SCIENCE CHINA Information Sciences},
year = {2025},
volume = {},
pages = {},
url = {https://www.sciengine.com/SCIS/doi/10.1007/s11432-025-4676-4},
doi = {10.1007/s11432-025-4676-4}
}

- Large Multimodal Models Evaluation: A Survey
| Benchmark | Paper | Project Page |
|---|---|---|
| Self-RAG | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | GitHub |
| AMEM | A-MEM: Agentic Memory for LLM Agents | GitHub |
