Error with Subjective Judge in Creation MM Benchmark Using Custom Judge Model #1

@sankalp0412

Description

Hi,

I am trying to use the Creation MM benchmark to evaluate the Qwen-2.5 7B VLM. Since I don't have an OpenAI API key, I've modified the code to use the Hyperbolic API with Meta's Llama as the judge model.

For initial testing, I ran the workflow on only 5 data items from the dataset. As shown in the attached image, I am successfully retrieving results for the objective judge, but I encounter an error for the subjective judge. Could you please explain the difference between these two judges and why I might be seeing this error?

I also ran the following standalone script to verify access to the judge model, and that script works as expected.


```python
from vlmeval.api import OpenAIWrapper

# use_hyperbolic is my custom flag for routing requests to the Hyperbolic API
model = OpenAIWrapper('meta-llama/Llama-3.3-70B-Instruct', verbose=True, use_hyperbolic=True)
msgs = [dict(type='text', value='Hello!')]
code, answer, resp = model.generate_inner(msgs)
print(code, answer, resp)
```

(OpenAIWrapper is slightly modified to accommodate the Hyperbolic API.)
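For context, my modification only changes where the request is sent: the wrapper still builds a standard OpenAI-compatible `/chat/completions` payload, which Hyperbolic accepts through its OpenAI-compatible interface. A minimal sketch of that payload (the helper name `build_chat_payload` and the default values are mine, not from VLMEvalKit):

```python
# Hedged sketch: the OpenAI-compatible chat payload an OpenAIWrapper-style
# client would POST to a Hyperbolic endpoint. Helper name and defaults are
# my own assumptions, not VLMEvalKit code.
def build_chat_payload(model, prompt, temperature=0.0, max_tokens=512):
    """Build a minimal OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("meta-llama/Llama-3.3-70B-Instruct", "Hello!")
print(payload["model"])
```

Since the standalone script above gets a valid response from this same endpoint, the payload format itself seems fine, which is why the subjective-judge failure is puzzling.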

[Screenshot: objective judge results are returned successfully; the subjective judge raises an error.]

Let me know if you need more context.
