BinaryContent is naively parsed when include_input is used in LLMJudge #2089

@plutasnyy

Description

When BinaryContent is used as input to the LLM and later included in the LLMJudge prompt via include_input=True, it is not passed to the judge model as an image. Instead, its raw bytes are rendered as a sequence of characters, so a single image exceeds the 200k-token context window.

Expected:

  • It should be passed as an image

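To make the failure mode concrete, here is a minimal, library-free sketch of the difference. `naive_render` and `image_part` are hypothetical names for illustration, not pydantic-evals APIs: the first shows what happens today (bytes stringified into the prompt), the second shows the kind of structured content part the expectation implies.

```python
import base64

def naive_render(data: bytes) -> str:
    # What the bug does: the raw bytes get stringified into the judge prompt,
    # with non-printable bytes escaped as \xNN sequences (several characters each).
    return str(data)

def image_part(data: bytes, media_type: str) -> dict:
    # What the issue expects: a structured image content part (base64-encoded),
    # so the judge model receives an actual image instead of escaped byte soup.
    return {
        "type": "image",
        "media_type": media_type,
        "data": base64.b64encode(data).decode("ascii"),
    }

# A fake PNG payload: magic number plus arbitrary binary data.
png_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(256)) * 10

print(len(png_bytes))                  # raw size of the "image"
print(len(naive_render(png_bytes)))    # escaped text is several times larger
part = image_part(png_bytes, "image/png")
print(part["type"], part["media_type"])
```

On a real PNG the escaped-text representation balloons into hundreds of thousands of characters, which is consistent with the reported 200k-token overflow from a single image.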

Example Code

from pydantic_ai import BinaryContent
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

hallucination_dataset = Dataset(
    cases=[
        Case(
            name="test_case1",
            inputs=BinaryContent(
                data=(DATASET_DIR / "my_file.png").read_bytes(), media_type="image/png"
            ),
        )
    ],
    evaluators=[
        LLMJudge(
            rubric=JUDGE_PROMPT,
            include_input=True,
            model='anthropic:claude-3-7-sonnet-latest',
        )
    ],
)


report = hallucination_dataset.evaluate_sync(run_prediction)
report.print(include_input=False, include_output=True, include_durations=True)
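One plausible direction for a fix, sketched here without reference to pydantic-evals internals (`to_judge_parts` and the dict shapes are hypothetical): when serializing the case input for the judge prompt, dispatch on the content type instead of falling back to stringification.

```python
from dataclasses import dataclass

@dataclass
class BinaryContent:
    # Stand-in for pydantic_ai.BinaryContent, for a self-contained example.
    data: bytes
    media_type: str

def to_judge_parts(value: object) -> list[dict]:
    # Hypothetical serializer: binary inputs become structured parts that a
    # model API can treat as images; everything else is stringified as today.
    if isinstance(value, BinaryContent):
        return [{"type": "binary", "media_type": value.media_type, "data": value.data}]
    return [{"type": "text", "text": str(value)}]

parts = to_judge_parts(BinaryContent(b"\x89PNG\r\n\x1a\n", "image/png"))
print(parts[0]["type"])  # binary
print(to_judge_parts("plain input")[0]["type"])  # text
```

The key point is only the dispatch: whatever the actual internal representation is, `BinaryContent` needs a branch that preserves it as media rather than routing it through `str()`.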

Python, Pydantic AI & LLM client version

0.3.4

Metadata

Labels

bug (Something isn't working)
