Notes on structured output strategies #1615

Open
Tracked by #1628
dmontagu opened this issue Apr 28, 2025 · 4 comments

@dmontagu
Contributor

@DouweM is looking into this and asked me to write up some notes on the requirements our approach to supporting different output strategies will need to satisfy:

  • Until https://peps.python.org/pep-0747/ is supported by type checkers (specifically, pyright), we can't have output_type accept unions directly (i.e., something like Agent(..., output_type=A | B)).
    • I believe this is also related to why you get a pyright error for output_type=Annotated[A, ...].
    • Note that we could introduce a generic type as a workaround for this, whose only purpose is to accept type forms as the generic parameter so they can be passed without a type error. I.e., add a generic class called TypeValue and do output_type=TypeValue[A | Annotated[B, ...]] or similar. But this is kind of ugly and becomes unnecessary if/when PEP 747 is supported by type checkers, so 🤷‍♂ probably not worth it.
  • We need to support using a callable as the way to specify the output schema, and calling it with the model's output (see the first sketch after this list):
    • This is important because if you plan to call a function with the output of a tool call, and the function can error, you want the error to go straight back to the model as a retry rather than having to re-call the model yourself, etc.
    • The return type of the callable should be the return type of the agent run. (I.e., it should be compatible with the value of the second generic parameter to Agent.)
    • ToolOutput needs to support callables. (I already have an open draft PR for this, but it needs to be finished.)
    • New forms of output should also support callables. In particular, even if I'm using response_format or just parsing the content directly, I should be able to use that as the input to a callable and go back to the model if the call fails. I should not need to manually define a struct with the same fields as the callable, and the output of the callable should be the agent run's output type.
  • We need a way to provide a list of ToolOutput (for X = Tool, etc.) as a way to more explicitly control precisely which outputs the model can produce (see the second sketch after this list). In particular, this gives a way to use ToolOutput(type_=str) alongside other tool outputs, and a way to set the tool name for each. It also gives a way to use an anyOf JSON schema as the parameters schema for a tool call. (Some/most models may not support this today, but there's nothing wrong with it in principle.)
    • Note that it probably doesn't make sense to support a list of ResponseFormatOutput or ContentOutput strategies (or whatever we choose to call them), since there isn't a good mechanism to specify which one is being used. But I'd be open to it if we did come up with such a mechanism. (At least for content output, it's more obvious how this could be done, but I'm not sure it's necessary.)
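
A minimal sketch of the callable-output idea, assuming the draft work in #1463: the callable's signature defines the output schema, its return type is the run's output type, and raising ModelRetry sends the error back to the model. The names Location and build_location are made up for illustration, and the exact API is still in flux.

from pydantic import BaseModel

from pydantic_ai import Agent, ModelRetry


class Location(BaseModel):
    city: str
    country: str


def build_location(city: str, country: str) -> Location:
    # If the model's arguments are unusable, raising ModelRetry should send the
    # error straight back to the model rather than ending the run.
    if not city.strip():
        raise ModelRetry("city must not be empty, please try again")
    return Location(city=city, country=country)


# Hypothetical until callable outputs land (#1463): the run's output type is
# Location, i.e. the return type of the callable.
agent = Agent("openai:gpt-4o", output_type=build_location)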

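And a sketch of the list-of-ToolOutput idea from the last bullet; the ToolOutput marker, its name parameter, and the import location are provisional and may change:

from pydantic import BaseModel

from pydantic_ai import Agent, ToolOutput  # import location assumed


class Weather(BaseModel):
    temperature_c: float
    summary: str


# Hypothetical: one output tool per entry, each with an explicit tool name,
# and a plain-text option alongside the structured one.
agent = Agent(
    "openai:gpt-4o",
    output_type=[
        ToolOutput(type_=Weather, name="weather_report"),
        ToolOutput(type_=str, name="text_reply"),
    ],
)
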
Eventually, I think we need a better native agent handoff, which is in many ways like a callable ToolOutput where the call is the relevant agent.run method. Maybe allowing an agent as an output strategy is enough, I'm not sure. (Maybe it requires some care to get multi-agent streaming to work nicely across handoffs.) But I wouldn't get too caught up on this now unless it's obvious how it could/should work.

@nzlzcat

nzlzcat commented Apr 30, 2025

First of all, thanks for this effort!

A few comments partially related to the above, @dmontagu: I have serious concerns about the agent's output_type and the way it's wrapped in OutputSchema/OutputSchemaTool. Is this effort going to solve the following? 👇

If output_type is provided to the agent, and that type has already been returned by one of the tool calls, we should be directly completing the graph with that result, not creating a new request. I don't even think this should be optional or configurable; it should be the default behavior.

Currently all our code is making one final (cumulatively expensive) call just to wrap up and duplicate the result (see #127).

for call, output_tool in output_schema.find_tool(tool_calls):

I assume the behavior of the line above should be different: it should allow matching the tool result type against output_type. Maybe not for str, if you want to have some additional control... but at least for Pydantic types.


Alternatively, exposing the agent graph and allowing to manually iterate nodes would be very useful. This is currently not possible and makes building custom solutions around the pydantic_ai Agent very hard.


In the following example, the agent.run should be able to complete after a single model request.

from pydantic import BaseModel
from pydantic_ai import Agent


# Foo wasn't defined in the original comment; assuming two numeric fields:
class Foo(BaseModel):
    num1: int
    num2: int


class Response(BaseModel):
    response: str


my_agent = Agent(
    model="openai:gpt-4o",
    instrument=True,
    output_type=Response,
    system_prompt="Make the operation the user wants",
    model_settings={"temperature": 0.0},
)


@my_agent.tool_plain
async def sum_numbers(args: Foo) -> Response:
    result = args.num1 + args.num2
    # The tool already provides the result in the final output type
    return Response(response="The sum is: " + str(result))

@DouweM
Contributor

DouweM commented Apr 30, 2025

Alternatively, exposing the agent graph and allowing to manually iterate nodes would be very useful. This is currently not possible and makes building custom solutions around the pydantic_ai Agent very hard.

@nzlzcat Have you seen https://ai.pydantic.dev/agents/#iterating-over-an-agents-graph?
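
For reference, a rough sketch of what that looks like, based on the linked docs (model name and prompt are placeholders):

import asyncio

from pydantic_ai import Agent

agent = Agent("openai:gpt-4o")


async def main():
    nodes = []
    # agent.iter exposes the run's underlying graph so each node can be
    # inspected (or driven manually) as the run progresses.
    async with agent.iter("What is the capital of France?") as agent_run:
        async for node in agent_run:
            nodes.append(node)
    print(nodes)
    print(agent_run.result)


asyncio.run(main())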

In the following example, the agent.run should be able to complete after a single model request.

With the callable output_type feature being implemented in #1463, you'd be able to pass output_type=sum_numbers (or even a list of multiple functions), which the LLM would call as a final response, with the actual agent run output being the Response returned by that function. Would that work for your use case?
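
To make that concrete, a hypothetical sketch reusing the Foo/Response definitions and imports from the example above, and assuming the #1463 API lands roughly as described:

async def sum_numbers(args: Foo) -> Response:
    # Called once by the model as its final action; the returned Response is
    # the agent run's output, so no extra wrap-up request is needed.
    return Response(response=f"The sum is: {args.num1 + args.num2}")


my_agent = Agent(
    model="openai:gpt-4o",
    output_type=sum_numbers,  # hypothetical until #1463 lands
    system_prompt="Make the operation the user wants",
)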

@nzlzcat

nzlzcat commented May 2, 2025

@nzlzcat Have you seen https://ai.pydantic.dev/agents/#iterating-over-an-agents-graph?

Hadn't seen that particular option, thanks!

... Would that work for your use case?

Waiting for the changes in #1628. What is the reasoning behind solving it via output_type callables and not via tools? Do you envision 4 ways to declare tools?

  • tool
  • tool_plain
  • mcp_server_tool
  • output_tool <-- new?

Quote

Tools or functions are also used to define the schema(s) for structured responses, thus a model might have access to many tools, some of which call function tools while others end the run and produce a final output.

Do these 2 "types" have, or will they have, different decorators?

Or is that supposed to be handled via StructuredOutput? Will StructuredOutput(type_=Response) also avoid the additional wrap-up request?

@DouweM
Contributor

DouweM commented May 2, 2025

@nzlzcat output_type specifies what the model should do as its final action that completes the agent run, while tools (defined using the decorators, MCP, or tools argument on the agent) are things the model can use during its run to help it in its goal (specified by the prompts and output_type). So if we want its final action to be to call a function with some arguments, that is (from PydanticAI's perspective) an output constraint rather than "just" a tool the model has available to it.

Behind the scenes, we currently use the model's tool call functionality to enable it to return JSON matching some type (or, to be more precise, the type's JSON schema) by calling a special output tool named final_result. In the new PR we're also going to be using the model's native structured JSON output functionality when available, either based on the model implementation's heuristics for which approach to prefer, or through the user forcing a specific output mode using a marker class like ToolOutput(type_=...) or StructuredOutput(type_=...) (name likely to change).
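
For illustration, a simplified sketch of that tool-call mode; the dict below is conceptual rather than PydanticAI's actual internal representation, and the description text is assumed:

from pydantic import BaseModel


class Response(BaseModel):
    response: str


# Conceptually, the special output tool presented to the model uses the output
# type's JSON schema as its parameters schema; calling it ends the run.
final_result_tool = {
    "name": "final_result",
    "description": "The final response which ends this conversation",
    "parameters_json_schema": Response.model_json_schema(),
}
print(final_result_tool["parameters_json_schema"])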

Which output mode is used is not tied to whether your output type is a model or a function: in both cases the model will just generate JSON that's parsed by PydanticAI and passed to the model constructor or the custom function.

So in your scenario, output_type=sum_numbers, output_type=ToolOutput(type_=sum_numbers) and output_type=StructuredOutput(type_=sum_numbers) will all work, but the first of those lets the model implementation choose the output mode, while in the latter two you're forcing a specific mode and would get an error if the model doesn't support that particular mode.

There will be no new way of defining tools, as this is more about a new type of output (call a function) rather than a new thing the model can do during its run.
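
A quick sketch of those three spellings; the marker names and import locations are provisional per the discussion above and may change before release:

from pydantic import BaseModel

from pydantic_ai import Agent, StructuredOutput, ToolOutput  # imports assumed


class Response(BaseModel):
    response: str


def sum_numbers(num1: float, num2: float) -> Response:
    return Response(response=f"The sum is: {num1 + num2}")


# The model implementation picks whichever output mode it prefers/supports:
agent_auto = Agent("openai:gpt-4o", output_type=sum_numbers)

# Force the tool-call output mode:
agent_tool = Agent("openai:gpt-4o", output_type=ToolOutput(type_=sum_numbers))

# Force native structured JSON output (errors if the model doesn't support it):
agent_json = Agent("openai:gpt-4o", output_type=StructuredOutput(type_=sum_numbers))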
