Mathvision bug fixes , Reproduce Qwen2.5VL results #660

RadhaGulhane13 · 2025-05-01T21:54:04Z

Description:
This PR fixes a bug in MathVision evaluation and includes improvements to reproduce the MathVision evaluation for Qwen2.5-VL.

The current MathVision evaluation codebase breaks due to the following issues:

1. The value for "until" defaults to "\n\n", so generation stops at the first blank line and the response is cut off before the final answer.
2. The mathvision_doc_to_text(doc) function needs to accept lmms_eval_specific_kwargs as an argument.
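A minimal sketch of the signature change described for mathvision_doc_to_text: the function takes lmms_eval_specific_kwargs as an optional second argument. The name doc_to_text_sketch, the "pre_prompt"/"post_prompt" keys, and the "question"/"options" fields are assumptions made for illustration, not the code in this PR.

```python
def doc_to_text_sketch(doc, lmms_eval_specific_kwargs=None):
    """Build the prompt for one document (hypothetical stand-in, not the PR's code)."""
    # Treat a missing kwargs dict as empty so the old one-argument call still works.
    kwargs = lmms_eval_specific_kwargs or {}
    pre_prompt = kwargs.get("pre_prompt", "")    # assumed key name
    post_prompt = kwargs.get("post_prompt", "")  # assumed key name

    question = doc["question"]                   # assumed field name
    options = doc.get("options") or []           # assumed field name
    if options:
        letters = "ABCDEFGHIJ"
        choices = "\n".join(
            f"({letters[i]}) {opt}" for i, opt in enumerate(options)
        )
        question = f"{question}\nChoices:\n{choices}"

    return f"{pre_prompt}{question}{post_prompt}"


plain = doc_to_text_sketch({"question": "What is 2+3?", "options": []})
prompted = doc_to_text_sketch(
    {"question": "What is 2+3?", "options": ["4", "5"]},
    {"post_prompt": "\nAnswer:"},
)
```

Defaulting the argument to None keeps callers that pass only doc working, while callers that pass task-specific kwargs no longer raise a TypeError.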

Reproducing MathVision Results on Qwen2.5-3B-Instruct:

The current implementation produces a MathVision Standard Eval score of 0.46, which is incorrect. The score is this low because the answer parser only sees a truncated response (see the first issue above).
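The effect of stopping at "\n\n" can be illustrated with a small standalone sketch. The helpers truncate_at_stop and extract_boxed and the sample response are hypothetical; they mimic stop-sequence truncation and a simple \boxed{...} answer parser, and are not the evaluation code itself.

```python
import re


def truncate_at_stop(text, until):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in until:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} (no nested braces), or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


# A made-up multi-paragraph response whose answer comes after a blank line.
response = (
    "First, compute the area of the triangle.\n\n"
    "The area is 6, so the answer is \\boxed{6}."
)

truncated = truncate_at_stop(response, ["\n\n"])

print(extract_boxed(truncated))  # None: the answer was cut off
print(extract_boxed(response))   # 6
```

Because models often put their reasoning and their final answer in separate paragraphs, a "\n\n" stop sequence tends to discard the answer, and nearly every sample is then scored as wrong.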

With the changes introduced in this PR and using a max_token_len of 2048, the official result of 21.68 for the Qwen2.5-3B-Instruct model can be reproduced.

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---:|---|---|---:|---|---|
| mathvision_test | Yaml | none | 0 | mathvision_standard_eval | ↑ | 21.68 | ± | N/A |

RadhaGulhane13 added 2 commits April 29, 2025 04:29

Mathvision eval bug fix

07376be

Fix prompt to reduce parsing errors in response

92cd5de
