We attempted to reproduce the paper's results on the Video-MME dataset but could not achieve comparable performance. As shown in the figure, there is a significant gap in accuracy. Releasing the evaluation code for this dataset would greatly benefit the open-source community.
