Hi,
I am trying to use the Creation MM benchmark to evaluate the Qwen-2.5 7B VLM. Since I don't have access to an OpenAI API key, I've modified the code to use the Hyperbolic API, with Meta LLaMA as the judge model.
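For context, Hyperbolic exposes an OpenAI-compatible endpoint, so the modification essentially amounts to pointing the client at a different base URL. A minimal sketch of the idea (not my exact diff; the HYPERBOLIC_API_KEY variable name and base URL are illustrative assumptions):

import os
from openai import OpenAI

# Point the standard OpenAI client at Hyperbolic's OpenAI-compatible endpoint
# (base URL and env-var name are assumptions, adjust to your setup)
client = OpenAI(
    base_url='https://api.hyperbolic.xyz/v1',
    api_key=os.environ['HYPERBOLIC_API_KEY'],
)

# Simple chat-completion call against the judge model
resp = client.chat.completions.create(
    model='meta-llama/Llama-3.3-70B-Instruct',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(resp.choices[0].message.content)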
For initial testing, I ran the workflow on only 5 data items from the dataset. As shown in the attached image, I am successfully retrieving results for the objective judge, but I encounter an error for the subjective judge. Could you please explain the difference between these two judges and why I might be seeing this error?
I also ran the following standalone script to verify access to the judge model, and that script works as expected.
from vlmeval.api import OpenAIWrapper

# Instantiate the wrapper with the Hyperbolic-hosted judge model
# (use_hyperbolic is my local modification, see note below)
model = OpenAIWrapper('meta-llama/Llama-3.3-70B-Instruct', verbose=True, use_hyperbolic=True)

# Send a single text message and print the return code, answer, and raw response
msgs = [dict(type='text', value='Hello!')]
code, answer, resp = model.generate_inner(msgs)
print(code, answer, resp)
(the code is slightly modified to accommodate the Hyperbolic API)
Let me know if you need any more context.