ConfidentInstrumentationSettings with pydantic-ai: tools_called, expected_tools, and actual_output are all None when using OpenAIResponsesModel

When using the [pydantic-ai integration](https://deepeval.com/integrations/frameworks/pydanticai) with `ConfidentInstrumentationSettings`, the integration does not capture `actual_output`, `tools_called`, or `expected_tools` from the agent execution trace. All three are `None` in the resulting test case, which causes `ToolCorrectnessMetric` to fail with:

`deepeval.errors.MissingTestCaseParamsError: 'tools_called' and 'expected_tools' cannot be None for the 'Tool Correctness' metric`

Metrics that don't require tools (e.g., `AnswerRelevancyMetric`) are affected too: `actual_output` is `None`, so scoring fails.
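To make the symptom concrete, here is a minimal sketch of the kind of check I would expect to fail. It does not use deepeval's classes: `StandInTestCase` and `missing_params` are hypothetical names made up for illustration, and the field names are simply the three parameters mentioned above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class StandInTestCase:
    """Hypothetical stand-in for the test case built from the trace (not deepeval's class)."""
    input: str
    actual_output: Optional[str] = None
    tools_called: Optional[list] = None
    expected_tools: Optional[list] = None


def missing_params(test_case: Any, required: Sequence[str]) -> List[str]:
    """Return the names of required fields that are None (or absent) on the test case."""
    return [name for name in required if getattr(test_case, name, None) is None]


# Mirrors what I observe: only the input is populated.
case = StandInTestCase(input="What does NDA stand for?")
required = ("actual_output", "tools_called", "expected_tools")
print(missing_params(case, required))
```

With a correctly populated test case, the same helper would return an empty list, which is the behavior I expected from the integration.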

**Versions:**

- `deepeval==3.8.6`
- `pydantic-ai==1.31.0`
- `Python 3.12`

**Minimal reproduction:**

```python
import asyncio
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai.instrumentator import ConfidentInstrumentationSettings
from deepeval.metrics import ToolCorrectnessMetric, AnswerRelevancyMetric
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.test_case import ToolCall

metrics = [ToolCorrectnessMetric(), AnswerRelevancyMetric()]

agent = Agent(
    "gpt-4.1-mini",
    instructions="You are a helpful assistant.",
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=metrics,
    ),
)

dataset = EvaluationDataset(
    goldens=[
        Golden(
            input="What does NDA stand for?",
            expected_tools=[ToolCall(name="some_tool")],
        ),
    ]
)

async def run_agent(input_text: str):
    result = await agent.run(input_text)
    return result.output

for golden in dataset.evals_iterator(metrics=metrics):
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)
```
