This is the evaluation command: torchrun --nproc-per-node=1 run.py --data MMMU_DEV_VAL --model llava_onevision_qwen2_0.5b_ov --verbose

The resulting metrics are:

split                          validation            dev
Overall                        0.3522222222222222    0.31333333333333335
Accounting                     0.43333333333333335   0.0
Agriculture                    0.36666666666666664   0.2
Architecture_and_Engineering   0.23333333333333334   0.2
Art                            0.3333333333333333    0.0
Art_Theory                     0.43333333333333335   0.6

The evaluation used gpt3.5. The result differs considerably from the 0.31 reported in the LLaVA-OneVision paper. What is causing this?
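Hi @linxid, we recommend using the official OpenAI API for evaluation. Could you confirm whether you are using the official OpenAI API? In our own tests, the result does not differ significantly from 0.31 (see the figure attached to this reply).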
It looks like there is still some difference. Where does that difference come from?
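To make the size of the gap concrete, here is a minimal sketch that compares the Overall scores reported in this issue against the 0.31 quoted from the paper. The dictionary layout and the helper name `gap_to_reference` are illustrative assumptions, not part of the evaluation toolkit, and the thread does not say which split the paper's figure refers to.

```python
# Overall scores copied from the table reported in this issue.
reported_overall = {
    "validation": 0.3522222222222222,
    "dev": 0.31333333333333335,
}

# Value quoted in the issue as the LLaVA-OneVision paper result.
PAPER_OVERALL = 0.31


def gap_to_reference(scores, reference):
    """Return the signed difference (score - reference) for each split.

    Hypothetical helper for illustration only.
    """
    return {split: round(value - reference, 4) for split, value in scores.items()}


overall_gap = gap_to_reference(reported_overall, PAPER_OVERALL)

for split, gap in overall_gap.items():
    print(f"{split}: {gap:+.4f} relative to {PAPER_OVERALL}")
```

With these numbers the validation split sits roughly 0.042 above the quoted figure, while the dev split is within about 0.003 of it. This only quantifies the discrepancy; it does not identify its cause.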