Notes on structured output strategies #1615

Open
Tracked by #1628
dmontagu opened this issue Apr 28, 2025 · 4 comments

@dmontagu
Contributor

@DouweM is looking into this and asked me to write up some notes on the requirements our approach to supporting different output strategies will need to satisfy:

  • Until https://peps.python.org/pep-0747/ is supported by type checkers (specifically, pyright), we can't have output_type accept unions directly (i.e., something like Agent(..., output_type=A | B)).
    • I believe this is also related to why you get a pyright error for output_type=Annotated[A, ...].
    • Note that we could introduce a generic type as a workaround for this, whose only purpose is to accept type forms as the generic parameter so they can be passed without a type error. I.e., add a generic class called TypeValue and do output_type=TypeValue[A | Annotated[B, ...]] or similar. But this is kind of ugly and becomes unnecessary if/when PEP 747 is supported by type checkers, so 🤷‍♂ probably not worth it.
  • We need to support using a callable as the way to specify the output schema, and calling it with the model's output (see the first sketch after this list):
    • This is important because if you plan to call a function with the output of a tool call, and the function can error, you want the error to go straight back to the model as a retry rather than having to re-call the model yourself, etc.
    • The return type of the callable should be the return type of the agent run. (I.e., it should be compatible with the value of the second generic parameter to Agent.)
    • ToolOutput needs to support callables. (I already have an open draft PR for this, but it needs to be finished.)
    • New forms of output should also support callables. In particular, even if I'm using response_format or just parsing the content directly, I should be able to use that as the input to a callable and go back to the model if the call fails. I should not need to manually define a struct with the same fields as the callable, and the output of the callable should be the agent run's output type.
  • We need a way to provide a list of ToolOutput (for X = Tool, etc.) as a way to more explicitly control precisely which outputs the model can produce (see the second sketch after this list). In particular, this gives a way to use ToolOutput(type_=str) alongside other tool outputs, and a way to set the tool name for each. It also gives a way to use an anyOf JSON schema as the parameters schema for a tool call. (Some/most models may not support this today, but there's nothing wrong with it in principle.)
    • Note that it probably doesn't make sense to support a list of ResponseFormatOutput or ContentOutput strategies (or whatever we choose to call them), since there isn't a good mechanism to specify which one is being used. But I'd be open to it if we did come up with such a mechanism. (At least for content output, it's more obvious how this could be done, but I'm not sure it's necessary.)
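
A minimal sketch of the callable-output idea, assuming the draft work in #1463: the callable's signature defines the output schema, its return type is the run's output type, and raising ModelRetry sends the error back to the model. The names Location and build_location are made up for illustration, and the exact API is still in flux.

from pydantic import BaseModel

from pydantic_ai import Agent, ModelRetry


class Location(BaseModel):
    city: str
    country: str


def build_location(city: str, country: str) -> Location:
    # If the model's arguments are unusable, raising ModelRetry should send the
    # error straight back to the model rather than ending the run.
    if not city.strip():
        raise ModelRetry("city must not be empty, please try again")
    return Location(city=city, country=country)


# Hypothetical until callable outputs land (#1463): the run's output type is
# Location, i.e. the return type of the callable.
agent = Agent("openai:gpt-4o", output_type=build_location)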

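And a sketch of the list-of-ToolOutput idea from the last bullet; the ToolOutput marker, its name parameter, and the import location are provisional and may change:

from pydantic import BaseModel

from pydantic_ai import Agent, ToolOutput  # import location assumed


class Weather(BaseModel):
    temperature_c: float
    summary: str


# Hypothetical: one output tool per entry, each with an explicit tool name,
# and a plain-text option alongside the structured one.
agent = Agent(
    "openai:gpt-4o",
    output_type=[
        ToolOutput(type_=Weather, name="weather_report"),
        ToolOutput(type_=str, name="text_reply"),
    ],
)
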
Eventually, I think we need a better native agent handoff, which is in many ways like a callable ToolOutput where the call is the relevant agent.run method. Maybe allowing an agent as an output strategy is enough, I'm not sure. (Maybe it requires some care to get multi-agent streaming to work nicely across handoffs.) But I wouldn't get too caught up on this now unless it's obvious how it could/should work.

@nzlzcat

nzlzcat commented Apr 30, 2025

First of all, thanks for this effort!

A few comments partially related to the above, @dmontagu: I have serious concerns about the agent's output_type and the way it's wrapped in OutputSchema/OutputSchemaTool. Is this effort going to solve the following? 👇

If output_type is provided to the agent, and that type has already been returned by one of the tool calls, we should be directly completing the graph with that result, not creating a new request. I don't even think this should be optional or configurable; it should be the default behavior.

Currently all our code is making one final (cumulatively expensive) call just to wrap up and duplicate the result (see #127).

for call, output_tool in output_schema.find_tool(tool_calls):

I assume the behavior of the line above should be different: it should allow matching the tool result type against output_type. Maybe not for str, if you want to have some additional control... but at least for Pydantic types.


Alternatively, exposing the agent graph and allowing to manually iterate nodes would be very useful. This is currently not possible and makes building custom solutions around the pydantic_ai Agent very hard.


In the following example, the agent.run should be able to complete after a single model request.

from pydantic import BaseModel
from pydantic_ai import Agent


# Foo wasn't defined in the original comment; assuming two numeric fields:
class Foo(BaseModel):
    num1: int
    num2: int


class Response(BaseModel):
    response: str


my_agent = Agent(
    model="openai:gpt-4o",
    instrument=True,
    output_type=Response,
    system_prompt="Make the operation the user wants",
    model_settings={"temperature": 0.0},
)


@my_agent.tool_plain
async def sum_numbers(args: Foo) -> Response:
    result = args.num1 + args.num2
    # The tool already provides the result in the final output type
    return Response(response="The sum is: " + str(result))

@DouweM
Contributor

DouweM commented Apr 30, 2025

Alternatively, exposing the agent graph and allowing to manually iterate nodes would be very useful. This is currently not possible and makes building custom solutions around the pydantic_ai Agent very hard.

@nzlzcat Have you seen https://ai.pydantic.dev/agents/#iterating-over-an-agents-graph?
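
For reference, a rough sketch of what that looks like, based on the linked docs (model name and prompt are placeholders):

import asyncio

from pydantic_ai import Agent

agent = Agent("openai:gpt-4o")


async def main():
    nodes = []
    # agent.iter exposes the run's underlying graph so each node can be
    # inspected (or driven manually) as the run progresses.
    async with agent.iter("What is the capital of France?") as agent_run:
        async for node in agent_run:
            nodes.append(node)
    print(nodes)
    print(agent_run.result)


asyncio.run(main())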

In the following example, the agent.run should be able to complete after a single model request.

With the callable output_type feature being implemented in #1463, you'd be able to pass output_type=sum_numbers (or even a list of multiple functions), which the LLM would call as a final response, with the actual agent run output being the Response returned by that function. Would that work for your use case?
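
To make that concrete, a hypothetical sketch reusing the Foo/Response definitions and imports from the example above, and assuming the #1463 API lands roughly as described:

async def sum_numbers(args: Foo) -> Response:
    # Called once by the model as its final action; the returned Response is
    # the agent run's output, so no extra wrap-up request is needed.
    return Response(response=f"The sum is: {args.num1 + args.num2}")


my_agent = Agent(
    model="openai:gpt-4o",
    output_type=sum_numbers,  # hypothetical until #1463 lands
    system_prompt="Make the operation the user wants",
)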

@nzlzcat

nzlzcat commented May 2, 2025

@nzlzcat Have you seen https://ai.pydantic.dev/agents/#iterating-over-an-agents-graph?

Hadn't seen that particular option, thanks!

... Would that work for your use case?

Waiting for the changes in #1628. What is the reasoning behind solving it via output_type callables and not via tools? Do you envision 4 ways to declare tools?

  • tool
  • tool_plain
  • mcp_server_tool
  • output_tool <-- new?

Quote

Tools or functions are also used to define the schema(s) for structured responses, thus a model might have access to many tools, some of which call function tools while others end the run and produce a final output.

Do these 2 "types" have, or will they have, different decorators?

Or is that supposed to be handled via StructuredOutput? Will StructuredOutput(type_=Response) also avoid the additional wrap-up request?

@DouweM
Contributor

DouweM commented May 2, 2025

@nzlzcat output_type specifies what the model should do as its final action that completes the agent run, while tools (defined using the decorators, MCP, or tools argument on the agent) are things the model can use during its run to help it in its goal (specified by the prompts and output_type). So if we want its final action to be to call a function with some arguments, that is (from PydanticAI's perspective) an output constraint rather than "just" a tool the model has available to it.

Behind the scenes, we currently use the model's tool call functionality to enable it to return JSON matching some type (or, to be more precise, the type's JSON schema) by calling a special output tool named final_result. In the new PR we're also going to be using the model's native structured JSON output functionality when available, either based on the model implementation's heuristics for which approach to prefer, or through the user forcing a specific output mode using a marker class like ToolOutput(type_=...) or StructuredOutput(type_=...) (name likely to change).
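
For illustration, a simplified sketch of that tool-call mode; the dict below is conceptual rather than PydanticAI's actual internal representation, and the description text is assumed:

from pydantic import BaseModel


class Response(BaseModel):
    response: str


# Conceptually, the special output tool presented to the model uses the output
# type's JSON schema as its parameters schema; calling it ends the run.
final_result_tool = {
    "name": "final_result",
    "description": "The final response which ends this conversation",
    "parameters_json_schema": Response.model_json_schema(),
}
print(final_result_tool["parameters_json_schema"])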

Which output mode is used is not tied to whether your output type is a model or a function: in both cases the model will just generate JSON that's parsed by PydanticAI and passed to the model constructor or the custom function.

So in your scenario, output_type=sum_numbers, output_type=ToolOutput(type_=sum_numbers) and output_type=StructuredOutput(type_=sum_numbers) will all work, but the first of those lets the model implementation choose the output mode, while in the latter two you're forcing a specific mode and would get an error if the model doesn't support that particular mode.

There will be no new way of defining tools, as this is more about a new type of output (call a function) rather than a new thing the model can do during its run.
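
A quick sketch of those three spellings; the marker names and import locations are provisional per the discussion above and may change before release:

from pydantic import BaseModel

from pydantic_ai import Agent, StructuredOutput, ToolOutput  # imports assumed


class Response(BaseModel):
    response: str


def sum_numbers(num1: float, num2: float) -> Response:
    return Response(response=f"The sum is: {num1 + num2}")


# The model implementation picks whichever output mode it prefers/supports:
agent_auto = Agent("openai:gpt-4o", output_type=sum_numbers)

# Force the tool-call output mode:
agent_tool = Agent("openai:gpt-4o", output_type=ToolOutput(type_=sum_numbers))

# Force native structured JSON output (errors if the model doesn't support it):
agent_json = Agent("openai:gpt-4o", output_type=StructuredOutput(type_=sum_numbers))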
