
Enable use of only specific attributes of output, etc., on LLMJudge #1433


Draft · wants to merge 1 commit into base: main
Conversation

@dmontagu (Contributor) commented Apr 10, 2025

@sofi444 pointed out to me earlier today that, in the interest of using fewer tokens, you may want to apply an LLMJudge to only part of the output rather than the whole thing, especially when you use multiple LLMJudges that each judge a different part of the output.

This PR adds a new argument to LLMJudge that makes it possible to specify attribute-access paths (which I'm currently calling "accessors"; these also work as expected with mappings), so that you can provide only a certain subset of the output as the argument to the judge_output function.

Note that as implemented, this change also allows you to make use of the full evaluator context, including metadata, expected_output, metrics, and attributes, rather than just the output. (You can access specific (nested) attributes of these as well.)
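For illustration, here is a rough sketch of how this could look, assuming the new argument ends up being called accessors and accepts dotted paths (both the name and the path syntax are placeholders from this PR's description, not a finalized API):

```python
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

# Hypothetical usage: judge only a nested piece of the output / evaluator context
# rather than the whole output. The `accessors` argument is the one proposed in
# this PR; its name and syntax may change.
dataset = Dataset(
    cases=[Case(inputs='What is the capital of France?', metadata={'source': 'geo-quiz'})],
    evaluators=[
        # Only the `answer` attribute of the output is sent to this judge.
        LLMJudge(rubric='The answer is factually correct.', accessors=['output.answer']),
        # A different judge looks at a piece of the metadata instead.
        LLMJudge(rubric='The response is appropriate for the given source.', accessors=['metadata.source']),
    ],
)
```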

I'll also note that while you might imagine wanting to "filter" the output in more interesting ways before passing it to the LLMJudge, in that case you could just move the relevant "filtering" into the task function itself (and change the returned output), so I don't think this is too restrictive.
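For comparison, the filter-in-the-task alternative might look like this (the agent and the returned shape are illustrative, not from the PR):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

async def answer_only_task(question: str) -> str:
    # Do any "interesting" filtering here and return only what should be judged.
    result = await agent.run(question)
    return result.output  # or some extracted/filtered piece of it
```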

@alexmojaki I'm hoping you can help me clean up and improve the terminology, and the attribute access logic/syntax/behavior 🙏.

This needs a better docstring, and docs of any kind (like much of the rest of evals... 😕), but hopefully the intent of the change is clear.


Docs Preview

commit: f98ef11
Preview URL: https://fa27cb02-pydantic-ai-previews.pydantic.workers.dev

@sofi444 commented Apr 15, 2025

Thanks @dmontagu !!

I think this would be very helpful in some cases. For example, think of the 'evaluation task' as a function that calls agent.run and, instead of returning the agent's final response, returns result.all_messages() (or similar) so you can look at the step-by-step process the agent performed.

The name is not super clear to me, but I don't mind it too much (I'm thinking of something like 'filters'?).

And yes, it is possible to add the filtering step to the evaluation task function, but this way we can use different accessors for different LLMJudges (see the sketch below).
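A minimal sketch of that use case (identifiers are illustrative, not from the PR):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

async def trace_task(question: str):
    result = await agent.run(question)
    # Return the full message history instead of just the final response,
    # so that different LLMJudges (with different accessors) can inspect
    # different parts of the agent's step-by-step process.
    return result.all_messages()
```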

@alexmojaki (Contributor) commented:

> instead of returning the agent's final response, returns result.all_messages() (or similar) so you can look at the step-by-step process the agent performed

I don't think the solution would be to change the return value of the task. Rather, set result.all_messages() as an eval attribute, or get the messages from the span tree. The output should be the output.
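A minimal sketch of that approach, assuming a helper along the lines of set_eval_attribute exists in pydantic_evals (the exact name and import path here are an assumption):

```python
from pydantic_ai import Agent
from pydantic_evals.dataset import set_eval_attribute  # assumed helper/location

agent = Agent('openai:gpt-4o')

async def task(question: str) -> str:
    result = await agent.run(question)
    # Record the message history as an eval attribute for judges/reports to use...
    set_eval_attribute('messages', result.all_messages())
    # ...while the task's return value stays the actual output.
    return result.output
```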

This PR might confuse the model in the sense that we wrap everything from the accessors in <Output> in the prompt.

Suggesting input as an accessor is weird given the include_input flag.

@alexmojaki (Contributor) commented:

Conclusion from discussion with @dmontagu:

  • default the accessors to inputs and output, and remove include_input
  • for each accessor, decide whether it should be passed to the inputs or the outputs of the judge
  • if neither makes sense (e.g. name, expected_output, duration), raise an error; we can relax that later if we want (see the sketch below)
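A rough sketch of how that routing might behave (names and structure are illustrative only, not the agreed API):

```python
# Each accessor is routed to either the judge's "inputs" or "outputs" section;
# anything else (e.g. name, expected_output, duration) is rejected for now.
JUDGE_SECTION_BY_ROOT = {
    'inputs': 'inputs',
    'output': 'outputs',
}

def route_accessor(accessor: str) -> str:
    root = accessor.split('.', 1)[0]
    if root not in JUDGE_SECTION_BY_ROOT:
        raise ValueError(f'Accessor {accessor!r} cannot currently be passed to the judge')
    return JUDGE_SECTION_BY_ROOT[root]

assert route_accessor('output.answer') == 'outputs'
assert route_accessor('inputs.question') == 'inputs'
```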

@DouweM marked this pull request as draft on April 30, 2025 at 19:28