Commit 3fd8f22

alfassy and elronbandel authored
Vision bench update (#1704)
* Added LMMS_eval template, exact_match_mm metric, RGB image augmentor.
* Revert "Added LMMS_eval template, exact_match_mm metric, RGB image augmentor."
  This reverts commit fbf63e1.
* up to date benchmark results
* Fix ruff

Signed-off-by: elronbandel <[email protected]>

---------

Signed-off-by: elronbandel <[email protected]>
Co-authored-by: Elron Bandel <[email protected]>
1 parent ba416fc commit 3fd8f22

3 files changed: 16 additions and 17 deletions

examples/evaluate_llama_vision_benchmark.py

Lines changed: 3 additions & 3 deletions
@@ -41,9 +41,9 @@
 """
 | subset | score | score_name | num_of_instances |
 |:---------|---------:|:----------------|-------------------:|
-| ALL | 0.553122 | subsets_mean | 120 |
-| doc_vqa | 0.666774 | anls | 30 |
-| info_vqa | 0.51238 | anls | 30 |
+| ALL | 0.570827 | subsets_mean | 120 |
+| doc_vqa | 0.704262 | anls | 30 |
+| info_vqa | 0.545713 | anls | 30 |
 | chart_qa | 0.266667 | relaxed_overall | 30 |
 | ai2d | 0.766667 | exact_match_mm | 30 |
 """
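
The `ALL` row in the table above is labeled `subsets_mean`. A minimal sketch of how that figure can be cross-checked, assuming `subsets_mean` is the unweighted arithmetic mean of the per-subset scores (the tables are consistent with this, but the commit does not state the definition; with 30 instances per subset, a weighted mean would give the same value). The helper name `subsets_mean` below is hypothetical and is not taken from the library.

```python
from statistics import fmean

# Per-subset scores copied from the updated table in
# examples/evaluate_llama_vision_benchmark.py.
updated_scores = {
    "doc_vqa": 0.704262,
    "info_vqa": 0.545713,
    "chart_qa": 0.266667,
    "ai2d": 0.766667,
}


def subsets_mean(scores):
    """Unweighted mean of per-subset scores (assumed definition)."""
    return fmean(scores.values())


# Rounded to the six decimals the table displays.
print(round(subsets_mean(updated_scores), 6))  # 0.570827
```

The same check applied to the pre-change rows (0.666774, 0.51238, 0.266667, 0.766667) reproduces the old `ALL` value of 0.553122, so only the `doc_vqa` and `info_vqa` updates moved the aggregate in this file.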

examples/evaluate_vision_default_benchmark.py

Lines changed: 4 additions & 4 deletions
@@ -40,10 +40,10 @@
 Using LLAMA-VISION-11B
 | subset | score | score_name | num_of_instances |
 |:---------|---------:|:----------------|-------------------:|
-| ALL | 0.384752 | subsets_mean | 150 |
-| doc_vqa | 0.717027 | anls | 30 |
-| info_vqa | 0.485069 | anls | 30 |
+| ALL | 0.432395 | subsets_mean | 150 |
+| doc_vqa | 0.802078 | anls | 30 |
+| info_vqa | 0.506233 | anls | 30 |
 | chart_qa | 0.266667 | relaxed_overall | 30 |
 | ai2d | 0.1 | exact_match_mm | 30 |
-| websrc | 0.355 | websrc_squad_f1 | 30 |
+| websrc | 0.487 | websrc_squad_f1 | 30 |
 """

examples/evaluate_vision_full_benchmark.py

Lines changed: 9 additions & 10 deletions
@@ -35,18 +35,17 @@
 print("Subsets scores:")
 print(results.subsets_scores.summary)
 """
-Using LLAMA-VISION-11B
 | subset | score | score_name | num_of_instances |
 |:-------------------------------|---------:|:----------------|-------------------:|
-| ALL | 0.462355 | subsets_mean | 4608 |
-| doc_vqa_default | 0.7814 | anls | 512 |
-| info_vqa_default | 0.562389 | anls | 512 |
-| chart_qa_default | 0.197266 | relaxed_overall | 512 |
+| ALL | 0.482178 | subsets_mean | 4608 |
+| doc_vqa_default | 0.817023 | anls | 512 |
+| info_vqa_default | 0.530652 | anls | 512 |
+| chart_qa_default | 0.199219 | relaxed_overall | 512 |
 | ai2d_default | 0.126953 | exact_match_mm | 512 |
-| websrc_default | 0.371 | websrc_squad_f1 | 512 |
-| doc_vqa_llama_vision_template | 0.653235 | anls | 512 |
-| info_vqa_llama_vision_template | 0.508014 | anls | 512 |
-| chart_qa_llama_vision_template | 0.197266 | relaxed_overall | 512 |
-| ai2d_llama_vision_template | 0.763672 | exact_match_mm | 512 |
+| websrc_default | 0.465 | websrc_squad_f1 | 512 |
+| doc_vqa_llama_vision_template | 0.715098 | anls | 512 |
+| info_vqa_llama_vision_template | 0.514954 | anls | 512 |
+| chart_qa_llama_vision_template | 0.201172 | relaxed_overall | 512 |
+| ai2d_llama_vision_template | 0.769531 | exact_match_mm | 512 |

 """
