Add Pydantic AI tool-call schema validator #240

@hidai25

Why

"Wrong tool args" is one of the top regression modes for tool-calling agents: the right tool gets picked but with a bad payload, and unless you validate the arguments, the diff just looks like an output change.

Pydantic AI already declares typed tool schemas. We can use those schemas at diff time to flag "the tool was called with arguments that don't validate" as a first-class regression class, separate from TOOLS_CHANGED.
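For illustration, a minimal sketch of the validation step, assuming the captured arguments arrive as a plain dict and the tool's schema is available as a Pydantic v2 model. `GetWeatherArgs` and `validate_tool_args` are hypothetical names, not part of evalview or Pydantic AI; how the adapter actually exposes the schema is left open here.

```python
from pydantic import BaseModel, ValidationError


class GetWeatherArgs(BaseModel):
    """Hypothetical argument schema for a `get_weather` tool."""

    city: str
    days: int = 1


def validate_tool_args(schema: type[BaseModel], args: dict) -> list[str]:
    """Return readable validation errors; an empty list means the args validate."""
    try:
        schema.model_validate(args)
    except ValidationError as exc:
        # Each error dict carries the field location and a message.
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
    return []


# Valid payload: no errors reported.
ok = validate_tool_args(GetWeatherArgs, {"city": "Oslo", "days": 3})

# Invalid payload: `city` is missing and `days` is not an integer.
bad = validate_tool_args(GetWeatherArgs, {"days": "soon"})
```

A non-empty result is what the evaluator could turn into the new regression class, independent of whether the set of tools called has changed.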

What

Extend the Pydantic AI adapter (evalview/adapters/pydantic_ai_adapter.py) and the tool-call evaluator to surface schema-validation failures as a distinct reason code.

Acceptance criteria

  • Tool-call evaluator validates captured args against the Pydantic schema when available
  • New ReasonCode (or extension of an existing one) for "tool args failed schema validation"
  • Severity ranking documented (where does it sit vs. TOOLS_CHANGED and REGRESSION?)
  • Test in tests/evaluators/ covering: valid args pass, invalid args flagged
  • Docs updated in the evaluators section
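As a starting point for the reason-code and severity criteria, here is a rough sketch. The enum below is a stand-in for the real `ReasonCode` in evalview/core/types.py, whose actual members and values may differ; `TOOL_ARGS_INVALID`, `SEVERITY`, and `most_severe` are hypothetical names, and the ordering shown is only one possible answer to the open severity question, not a decision.

```python
from enum import Enum
from typing import Optional


class ReasonCode(str, Enum):
    # Stand-in for the real enum; member values here are assumed.
    TOOLS_CHANGED = "tools_changed"
    REGRESSION = "regression"
    TOOL_ARGS_INVALID = "tool_args_invalid"  # hypothetical new code


# One possible ranking (higher means more severe). Where the new code
# should actually sit is what the acceptance criteria ask to document.
SEVERITY = {
    ReasonCode.TOOLS_CHANGED: 1,
    ReasonCode.TOOL_ARGS_INVALID: 2,
    ReasonCode.REGRESSION: 3,
}


def most_severe(codes: list[ReasonCode]) -> Optional[ReasonCode]:
    """Pick the highest-ranked reason code, or None when there are none."""
    return max(codes, key=SEVERITY.__getitem__, default=None)


top = most_severe([ReasonCode.TOOLS_CHANGED, ReasonCode.TOOL_ARGS_INVALID])
```

Keeping the ranking in one explicit table makes the documented severity order and the code easy to compare.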

Hints

  • evalview/evaluators/ has the orchestrator and per-eval modules.
  • evalview/core/types.py is where ReasonCode lives.
  • Keep it Pydantic-AI-specific for now; we can generalize to a ToolSchemaProvider ABC in a later PR if other adapters want to opt in.
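For context on the later generalization only (out of scope for this issue), a `ToolSchemaProvider` ABC might look roughly like the sketch below. The method name `get_tool_schema`, its return type, and the `StaticSchemaProvider` helper are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ToolSchemaProvider(ABC):
    """Hypothetical interface an adapter could implement to opt in."""

    @abstractmethod
    def get_tool_schema(self, tool_name: str) -> Optional[dict[str, Any]]:
        """Return the JSON schema for a tool's arguments, or None if unknown."""


class StaticSchemaProvider(ToolSchemaProvider):
    """Toy provider backed by a fixed mapping, useful in tests."""

    def __init__(self, schemas: dict[str, dict[str, Any]]) -> None:
        self._schemas = dict(schemas)

    def get_tool_schema(self, tool_name: str) -> Optional[dict[str, Any]]:
        return self._schemas.get(tool_name)


provider = StaticSchemaProvider({"get_weather": {"type": "object"}})
known = provider.get_tool_schema("get_weather")
unknown = provider.get_tool_schema("send_email")
```

Returning `None` for unknown tools would let the evaluator skip validation when no schema is available, matching the "when available" wording in the acceptance criteria.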

Size

~2-3 hours.

Metadata

Labels: enhancement, help wanted
