MathVision bug fixes, reproduce Qwen2.5-VL results #660


Open
wants to merge 2 commits into main

Conversation


@RadhaGulhane13 RadhaGulhane13 commented May 1, 2025

Description:
This PR fixes a bug in MathVision evaluation and includes improvements to reproduce the MathVision evaluation for Qwen2.5-VL.

The current MathVision evaluation codebase breaks due to the following issues:

  • The value for "until" is incorrectly set to "\n\n" by default, which causes the response to be truncated incorrectly.
    Reference

  • The mathvision_doc_to_text(doc) function needs to accept lmms_eval_specific_kwargs as an argument.
    Reference
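The second fix above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the field names (`question`) and kwarg keys (`pre_prompt`, `post_prompt`) are assumptions following the usual lmms-eval convention for `doc_to_text` helpers.

```python
# Hypothetical sketch of the fixed prompt builder; the real function lives
# in the lmms-eval MathVision task utils. Field and key names are assumed.

def mathvision_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    # Accept the extra kwargs that the harness passes to doc_to_text
    # functions; fall back to an empty dict when none are provided.
    lmms_eval_specific_kwargs = lmms_eval_specific_kwargs or {}
    pre_prompt = lmms_eval_specific_kwargs.get("pre_prompt", "")
    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "")
    return f"{pre_prompt}{doc['question']}{post_prompt}"
```

Without the extra parameter, the harness's call with `lmms_eval_specific_kwargs` raises a `TypeError`, which is the breakage described above.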

Reproducing MathVision Results on Qwen2.5-3B-Instruct:

The current implementation produces a MathVision standard-eval score of 0.46, which is incorrect; the score is deflated because only a truncated response is parsed (see the first bug above).

With the changes introduced in this PR and `max_token_len` set to 2048, the official score of 21.68 for the Qwen2.5-3B-Instruct model can be reproduced:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mathvision_test | Yaml | none | 0 | mathvision_standard_eval | 21.68 | ± N/A |
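The two generation settings involved (the stop sequence and the token budget) would live in the task's `generation_kwargs`. The fragment below is illustrative only; the exact key names follow the common lm-eval-harness task-YAML convention and are assumptions, not the PR's literal diff.

```yaml
# Illustrative generation settings; exact keys depend on the task schema.
generation_kwargs:
  max_new_tokens: 2048   # generous budget so long worked answers are not cut off
  until: []              # do not stop on "\n\n"; let the model finish its response
```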
