Enable use of only specific attributes of output, etc., on LLMJudge #1433
@sofi444 pointed out to me earlier today that, in the interest of using fewer tokens, you may want to apply an LLMJudge to only a part of the output, rather than the whole output, especially when you might use multiple LLMJudges with each judging different parts of the output.
This PR adds a new argument to LLMJudge that lets you specify attribute-access paths (which I'm currently calling "accessors"; these also work as expected with mappings), so that only a chosen subset of the output is passed as the argument to the `judge_output` function.

Note that, as implemented, this change also lets you use the full evaluator context, including metadata, expected_output, metrics, and attributes, rather than just the output. (You can access specific nested attributes of these as well.)
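To make the idea concrete, here is a minimal sketch of how an accessor path could be resolved against a context object. It is not the code in this PR: `ContextSketch`, `Answer`, `resolve_accessor`, and the dotted-string path syntax are all hypothetical names and assumptions chosen for illustration.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextSketch:
    """Hypothetical stand-in for the evaluator context (not the real class)."""

    output: Any
    expected_output: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Answer:
    """Hypothetical structured task output."""

    summary: str
    citations: list[str]


def resolve_accessor(root: Any, path: str) -> Any:
    """Walk a dotted path: key lookup on mappings, getattr on everything else."""
    current = root
    for part in path.split("."):
        if isinstance(current, Mapping):
            current = current[part]
        else:
            current = getattr(current, part)
    return current


ctx = ContextSketch(
    output=Answer(summary="Paris is the capital of France.", citations=["atlas"]),
    metadata={"difficulty": {"level": "easy"}},
)

# Only the summary would be sent to the judge, not the whole Answer.
judge_input = resolve_accessor(ctx, "output.summary")

# The same path syntax reaches into nested mappings on other context fields.
level = resolve_accessor(ctx, "metadata.difficulty.level")
```

In this sketch a missing key raises `KeyError` and a missing attribute raises `AttributeError`; whether the real implementation should raise, skip, or fall back to the whole output is one of the open behavior questions.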
I'll also note that while you might want to "filter" the output in more interesting ways before passing it to the LLMJudge, in that case you could just move the relevant filtering into the task function itself (and change the returned output), so I think this is not too restrictive.
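As a rough illustration of that alternative, the filtering can live in the task function so the evaluator only ever sees the reduced output. The names `FullReport`, `build_report`, and `task` are hypothetical placeholders, not part of this PR.

```python
from dataclasses import dataclass


@dataclass
class FullReport:
    """Hypothetical full output, most of which the judge does not need."""

    summary: str
    raw_search_results: list[str]


def build_report(question: str) -> FullReport:
    # Placeholder for the real work (model calls, retrieval, and so on).
    return FullReport(
        summary=f"Short answer to: {question}",
        raw_search_results=["doc-1", "doc-2"],
    )


def task(question: str) -> str:
    # Filter here: return only the part that should be judged.
    report = build_report(question)
    return report.summary
```

The trade-off is that every evaluator attached to the task then sees the same filtered output, whereas accessors let each LLMJudge pick its own slice.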
@alexmojaki I'm hoping you can help me clean up and improve the terminology, and the attribute access logic/syntax/behavior 🙏.
This needs a better docstring, and docs of any kind (like much of the rest of evals... 😕), but hopefully the intent of the change is clear.