
Enable use of only specific attributes of output, etc., on LLMJudge #1433


Draft · wants to merge 1 commit into base: main
Conversation

@dmontagu (Contributor) commented Apr 10, 2025

@sofi444 pointed out to me earlier today that, in the interest of using fewer tokens, you may want to apply an LLMJudge to only part of the output rather than the whole thing, especially when you use multiple LLMJudges that each judge a different part of the output.

This PR adds a new argument to LLMJudge that makes it possible to specify attribute-access paths (which I'm currently calling "accessors"; these also work as expected with mappings), so that you can provide only a certain subset of the output as the argument to the judge_output function.

Note that as implemented, this change also allows you to make use of the full evaluator context, including metadata, expected_output, metrics, and attributes, rather than just the output. (You can access specific (nested) attributes of these as well.)
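For illustration, here is a rough sketch of how this could look, assuming the new argument ends up being called accessors and accepts dotted paths (both the name and the path syntax are placeholders from this PR's description, not a finalized API):

```python
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

# Hypothetical usage: judge only a nested piece of the output / evaluator context
# rather than the whole output. The `accessors` argument is the one proposed in
# this PR; its name and syntax may change.
dataset = Dataset(
    cases=[Case(inputs='What is the capital of France?', metadata={'source': 'geo-quiz'})],
    evaluators=[
        # Only the `answer` attribute of the output is sent to this judge.
        LLMJudge(rubric='The answer is factually correct.', accessors=['output.answer']),
        # A different judge looks at a piece of the metadata instead.
        LLMJudge(rubric='The response is appropriate for the given source.', accessors=['metadata.source']),
    ],
)
```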

I'll also note that while you might imagine wanting to "filter" the output in more interesting ways before passing it to the LLMJudge, in that case you could just move the relevant "filtering" into the task function itself (and change the returned output), so I don't think this is too restrictive.
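For comparison, the filter-in-the-task alternative might look like this (the agent and the returned shape are illustrative, not from the PR):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

async def answer_only_task(question: str) -> str:
    # Do any "interesting" filtering here and return only what should be judged.
    result = await agent.run(question)
    return result.output  # or some extracted/filtered piece of it
```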

@alexmojaki I'm hoping you can help me clean up and improve the terminology, and the attribute access logic/syntax/behavior 🙏.

This needs a better docstring, and docs of any kind (like much of the rest of evals... 😕), but hopefully the intent of the change is clear.


Docs Preview

commit: f98ef11
Preview URL: https://fa27cb02-pydantic-ai-previews.pydantic.workers.dev

@sofi444 commented Apr 15, 2025

Thanks @dmontagu !!

I think this would be very helpful in some cases. For example, think of the 'evaluation task' as a function that calls agent.run and, instead of returning the agent's final response, returns result.all_messages() (or similar) so you can look at the step-by-step process the agent performed.

The name is not super clear to me, but I don't mind it too much (I'm thinking of something like 'filters'?).

And yes, it is possible to add the filtering step to the evaluation task function, but this way we can use different accessors for different LLMJudges (see the sketch below).
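A minimal sketch of that use case (identifiers are illustrative, not from the PR):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

async def trace_task(question: str):
    result = await agent.run(question)
    # Return the full message history instead of just the final response,
    # so that different LLMJudges (with different accessors) can inspect
    # different parts of the agent's step-by-step process.
    return result.all_messages()
```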

@alexmojaki (Contributor) commented:

> instead of returning the agent's final response, returns result.all_messages() (or similar) so you can look at the step-by-step process the agent performed

I don't think the solution would be to change the return value of the task. Rather, set result.all_messages() as an eval attribute, or get the messages from the span tree. The output should be the output.
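A minimal sketch of that approach, assuming a helper along the lines of set_eval_attribute exists in pydantic_evals (the exact name and import path here are an assumption):

```python
from pydantic_ai import Agent
from pydantic_evals.dataset import set_eval_attribute  # assumed helper/location

agent = Agent('openai:gpt-4o')

async def task(question: str) -> str:
    result = await agent.run(question)
    # Record the message history as an eval attribute for judges/reports to use...
    set_eval_attribute('messages', result.all_messages())
    # ...while the task's return value stays the actual output.
    return result.output
```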

This PR might confuse the model in the sense that we wrap everything from the accessors in <Output> in the prompt.

Suggesting input as an accessor is weird given the include_input flag.

@alexmojaki (Contributor) commented:

Conclusion from discussion with @dmontagu:

  • default the accessors to inputs and output, and remove include_input
  • for each accessor, decide whether it should be passed to the inputs or the outputs of the judge
  • if neither makes sense (e.g. name, expected_output, duration), raise an error; we can relax that later if we want (see the sketch below)
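A rough sketch of how that routing might behave (names and structure are illustrative only, not the agreed API):

```python
# Each accessor is routed to either the judge's "inputs" or "outputs" section;
# anything else (e.g. name, expected_output, duration) is rejected for now.
JUDGE_SECTION_BY_ROOT = {
    'inputs': 'inputs',
    'output': 'outputs',
}

def route_accessor(accessor: str) -> str:
    root = accessor.split('.', 1)[0]
    if root not in JUDGE_SECTION_BY_ROOT:
        raise ValueError(f'Accessor {accessor!r} cannot currently be passed to the judge')
    return JUDGE_SECTION_BY_ROOT[root]

assert route_accessor('output.answer') == 'outputs'
assert route_accessor('inputs.question') == 'inputs'
```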

@DouweM marked this pull request as draft on April 30, 2025 at 19:28