ConfidentInstrumentationSettings with pydantic-ai: tools_called, expected_tools, and actual_output are all None when using OpenAIResponsesModel #2508

@anuar12

When using the pydantic-ai integration with ConfidentInstrumentationSettings, the resulting test case has actual_output, tools_called, and expected_tools all set to None. actual_output and tools_called are not captured from the agent execution trace, and expected_tools is not carried over from the golden, even though the golden sets it. As a result, ToolCorrectnessMetric crashes with:

deepeval.errors.MissingTestCaseParamsError: 'tools_called' and 'expected_tools' cannot be None for the 'Tool Correctness' metric

Metrics that don't require tools (e.g., AnswerRelevancyMetric) also fail to score, because actual_output is None.
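The ToolCorrectnessMetric failure, and presumably the AnswerRelevancyMetric one, comes from a required-parameter check: a metric declares which test-case fields it needs and refuses to score when any of them is None. The following is a hypothetical, stdlib-only illustration of that kind of check, not deepeval's actual implementation. `CapturedTestCase` and `check_required` are made-up names, and the exception class only mimics `deepeval.errors.MissingTestCaseParamsError`.

```python
from dataclasses import dataclass
from typing import Optional


class MissingTestCaseParamsError(Exception):
    """Hypothetical stand-in for deepeval.errors.MissingTestCaseParamsError."""


@dataclass
class CapturedTestCase:
    """Hypothetical, simplified view of the test case the integration builds."""
    input: str
    actual_output: Optional[str] = None
    tools_called: Optional[list] = None
    expected_tools: Optional[list] = None


def check_required(test_case: CapturedTestCase, required: list[str], metric_name: str) -> None:
    """Raise if any field the metric needs is None (illustrative only)."""
    missing = [name for name in required if getattr(test_case, name) is None]
    if missing:
        quoted = " and ".join(f"'{name}'" for name in missing)
        raise MissingTestCaseParamsError(
            f"{quoted} cannot be None for the '{metric_name}' metric"
        )


# What this report observes: only `input` ends up populated.
captured = CapturedTestCase(input="What does NDA stand for?")

try:
    check_required(captured, ["tools_called", "expected_tools"], "Tool Correctness")
except MissingTestCaseParamsError as err:
    # prints: 'tools_called' and 'expected_tools' cannot be None for the 'Tool Correctness' metric
    print(err)
```

A test case with all three fields set (even to empty lists) passes the same check. That is what the integration would need to produce for both metrics to score.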

Versions:
deepeval==3.8.6
pydantic-ai==1.31.0
Python 3.12

Minimal reproduction:

import asyncio
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai.instrumentator import ConfidentInstrumentationSettings
from deepeval.metrics import ToolCorrectnessMetric, AnswerRelevancyMetric
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.test_case import ToolCall

# ToolCorrectness needs tools_called/expected_tools; AnswerRelevancy needs actual_output.
metrics = [ToolCorrectnessMetric(), AnswerRelevancyMetric()]

# Instrument the agent so its runs are traced and scored by deepeval.
agent = Agent(
    "gpt-4.1-mini",
    instructions="You are a helpful assistant.",
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=metrics,
    ),
)

dataset = EvaluationDataset(
    goldens=[
        Golden(
            input="What does NDA stand for?",
            # Set on the golden, but still None on the resulting test case.
            expected_tools=[ToolCall(name="some_tool")],
        ),
    ]
)

async def run_agent(input_text: str):
    result = await agent.run(input_text)
    # Final answer; expected to become the test case's actual_output.
    return result.output

# Run the agent once per golden and hand each run to the evaluation.
for golden in dataset.evals_iterator(metrics=metrics):
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)
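For comparison, here is a sketch of what the integration would need to collect from a single run: the final output for actual_output, the names of the tools the model requested for tools_called, and expected_tools taken from the golden. It assumes pydantic-ai's message shapes (`ModelResponse.parts` containing `ToolCallPart` objects with a `tool_name`, as returned by `result.all_messages()`). However, it uses hypothetical stdlib stand-ins for those classes so it runs without pydantic-ai or an API key, and it omits the tool-return request that a real run would contain. `extract_tool_names` and `expected_fields` are illustrative names, not part of deepeval or pydantic-ai.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins shaped like classes in pydantic_ai.messages,
# so this sketch runs without pydantic-ai installed or an API key.
@dataclass
class UserPromptPart:
    content: str


@dataclass
class ToolCallPart:
    tool_name: str
    args: Optional[dict] = None


@dataclass
class TextPart:
    content: str


@dataclass
class ModelRequest:
    parts: list = field(default_factory=list)


@dataclass
class ModelResponse:
    parts: list = field(default_factory=list)


def extract_tool_names(messages: list) -> list[str]:
    """Names of every tool the model asked to call, in call order."""
    names: list[str] = []
    for message in messages:
        if isinstance(message, ModelResponse):
            names.extend(
                part.tool_name for part in message.parts if isinstance(part, ToolCallPart)
            )
    return names


def expected_fields(messages: list, output: str, golden_tools: list[str]) -> dict:
    """Hypothetical helper: the values this issue expects on the test case.

    In deepeval, tool names would be wrapped as ToolCall(name=...).
    """
    return {
        "actual_output": output,
        "tools_called": extract_tool_names(messages),
        "expected_tools": list(golden_tools),
    }


# A simplified fake run in which the model calls `some_tool` once, then answers.
history = [
    ModelRequest(parts=[UserPromptPart("What does NDA stand for?")]),
    ModelResponse(parts=[ToolCallPart("some_tool", {"query": "NDA"})]),
    ModelResponse(parts=[TextPart("NDA stands for non-disclosure agreement.")]),
]
captured_fields = expected_fields(
    history, "NDA stands for non-disclosure agreement.", ["some_tool"]
)
print(captured_fields)
```

The agent in the reproduction registers no tools, so a correct capture would presumably give an empty tools_called list rather than None. That would let ToolCorrectnessMetric run and score the missing `some_tool` call instead of crashing.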
