Skip to content

Support structured and manual JSON output_type modes in addition to tool calls #1628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

DouweM
Copy link
Contributor

@DouweM DouweM commented May 2, 2025

Right now, this only implements response_format=json_schema structured output for OpenAI chat completions in non-streaming mode.

This test illustrates the behavior, including the fact that OpenAI supports tool calls alongside structured output (as noted in #582 (comment)):

async def test_openai_structured_output(allow_model_requests: None, openai_api_key: str):
    m = OpenAIModel('gpt-4o', provider=OpenAIProvider(api_key=openai_api_key))

    class CityLocation(BaseModel):
        city: str
        country: str

    agent = Agent(m, output_type=StructuredOutput(type_=CityLocation))

    @agent.tool_plain
    async def get_user_country() -> str:
        return 'Mexico'

    result = await agent.run('What is the largest city in the user country?')
    assert result.output == snapshot(CityLocation(city='Mexico City', country='Mexico'))

    assert result.all_messages() == snapshot(
        [
            ModelRequest(
                parts=[
                    UserPromptPart(
                        content='What is the largest city in the user country?',
                        timestamp=IsDatetime(),
                    )
                ]
            ),
            ModelResponse(
                parts=[ToolCallPart(tool_name='get_user_country', args='{}', tool_call_id=IsStr())],
                model_name='gpt-4o-2024-08-06',
                timestamp=IsDatetime(),
            ),
            ModelRequest(
                parts=[
                    ToolReturnPart(
                        tool_name='get_user_country',
                        content='Mexico',
                        tool_call_id=IsStr(),
                        timestamp=IsDatetime(),
                    )
                ]
            ),
            ModelResponse(
                parts=[StructuredOutputPart(content='{"city":"Mexico City","country":"Mexico"}')],
                model_name='gpt-4o-2024-08-06',
                timestamp=IsDatetime(),
            ),
        ]
    )

Copy link

github-actions bot commented May 2, 2025

Docs Preview

commit: 64a3ea6
Preview URL: https://3f6a4636-pydantic-ai-previews.pydantic.workers.dev

@@ -43,6 +43,7 @@
'RunContext',
# result
'ToolOutput',
'StructuredOutput',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not necessarily opposed to this choice of name, but I feel like the term "Structured Output" is often used to refer to the more general idea, which includes output produced via tool calls and output produced using response_format etc.

Comment on lines 428 to 431
elif structured_outputs:
# No events are emitted during the handling of structured outputs, so we don't need to yield anything
self._next_node = await self._handle_structured_outputs(ctx, structured_outputs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this code is assuming that you don't have both tool calls and structured outputs, but I would think that this is possible. Obviously you aren't going to have output tool calls, but you could have non-output tool calls right? My intuition is that we should rework it so that we process any tool calls and then handle the structured outputs.

If it's impossible to use tool calls while also using structured output then maybe this is fine but I don't see why that should be the case in principle. (But maybe it's the case in all existing APIs? Or maybe it isn't?)

try:
result_data = output_schema.validate(structured_output)
result_data = await _validate_output(result_data, ctx, None)
except _output.ToolRetryError as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunate exception name now

Copy link
Contributor

@dmontagu dmontagu May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use ModelRetry directly here? Maybe that's hard/annoying if things are set up to convert those into ToolRetryError (because it was assumed to be handled in a tool). I don't remember the details..

@@ -83,13 +90,18 @@ class OutputSchema(Generic[OutputDataT]):
Similar to `Tool` but for the final output of running an agent.
"""

# TODO: Since this is currently called "preferred", models that don't have structured output implemented yet ignore it and use tools (except for Mistral).
# We should likely raise an error if an unsupported mode is used, _and_ allow the model to pick its own preferred mode if none is forced.
preferred_mode: Literal['tool', 'structured'] | None # TODO: Add mode for manual JSON
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing needs to change now, but I'll note this may be a good place to specify how to handle unions — i.e., single tool with union schema (possibly wrapped into a single field for model compatibillity) vs. one tool per union case

tools: dict[str, OutputSchemaTool[OutputDataT]]
allow_text_output: bool
allow_text_output: bool # TODO: Verify structured output works correctly with string as a union member
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always found the "allow_text_output" stuff to be kind of janky. I am hopeful we can come up with a better design here that feels less hacked together; but maybe it's unavoidable.

But basically what I think is going on is that this is implicitly a third type of output — ContentText. And it is possible for a model to produce output that matches ContentText or output that matches ToolOutput in response to a single request. I think in principle it is possible for model responses to call tools even when using structured output, so I guess in principle it's possible to have ToolOutput and StructuredOutput as options in the response to a single request (I guess we aren't doing that today but in principle..).

But basically this just makes me sort of feel like we should maybe force agents to use a single approach to producing output — (unstructured) content, tool calls, or structured content. (Or maybe others later.) While this might theoretically lead to worse behavior in some use cases (in particular, where you might want text output or might want a tool call), I think it would end up being a lot easier to reason about and probably handled equally well in most cases. (I.e., if you want to optionally get a string back, or optionally use other output tools, we just create a tool for returning text.)

That said, while I do think that would be reasonable, it wouldn't surprise me if it just behaved worse in some cases (e.g., if models were pretty dumb about using a tool call to return text as its final output). 🤷‍♂️. But I would love to have a cleaner, and more general/unified output implementation with fewer edge cases to worry about.


@classmethod
def build(
cls: type[OutputSchema[T]],
output_type: type[T] | ToolOutput[T],
output_type: type[T] | ToolOutput[T] | StructuredOutput[T], # TODO: Support a list of output types/markers
Copy link
Contributor

@dmontagu dmontagu May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is a response to your # TODO comment:)

Not super important, but I'll note that I think we only need (aspire?) to support list[ToolOutput[T]] actually, not arbitrary lists of output markers.

Well, at least, I see some tangible value in supporting list[ToolOutput[T]], but I don't currently see a very compelling use case for supporting list[StructuredOutput[T]] or list[StructuredOutput[T] | ToolOutput[T]] or similar.

@@ -57,12 +58,12 @@ class ModelResponsePartsManager:
"""Maps a vendor's "part" ID (if provided) to the index in `_parts` where that part resides."""

def get_parts(self) -> list[ModelResponsePart]:
"""Return only model response parts that are complete (i.e., not ToolCallPartDelta's).
"""Return only model response parts that are complete (i.e., not ToolCallPartDelta's or StructuredOutputPartDelta's).
Copy link
Contributor

@dmontagu dmontagu May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to (or should) list StructuredOutputPartDeltas here.

The reason we have to do something special for ToolCallPartDeltas is because it's possible to have a ToolCallPartDelta that fundamentally can't be converted into a ToolCallPart. You can see the details for this in pydantic_ai.messages.ToolCallPartDelta._apply_to_delta, in particular, the fact that that function returns a ToolCallPart | ToolCallPartDelta shows that in some cases you can't get a Part from just tool call deltas.

However, the fact that a StructuredOutputPartDelta can always be converted into a StructuredOutputPart (which is related to the fact that StructuredOutputPartDelta.apply always returns a StructuredOutputPart) means that the parts manager should not hold StructuredOutputPartDeltas — just StructuredOutputParts. And here and below we can remove mentions of this, and related filtering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks like StructuredOutputPartDelta can't even be in pydantic_ai._parts_manager.ModelResponsePartsManager._parts, so this seems like either an unnecessary (possibly AI-generated?) logic update, or maybe something leftover from mid-implementation refactoring that didn't get cleaned up? Either way, I think you don't need to worry about filtering them out of the ._parts field.

@@ -91,6 +92,8 @@ def handle_text_delta(
"""
existing_text_part_and_index: tuple[TextPart, int] | None = None

# TODO: Parse out structured output or manual JSON, with a separate message?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you mean by this is that you want to support another form of structured output where it is read from just regular content, without using vendor API features. Is that right? If so, can you update the comment to be more clear? (I'm worried I'm misunderstanding and actually there's something that needs to be done here even for the approach implemented in this PR; regardless of which is the case, a clearer comment would help understand.)

@@ -387,7 +388,7 @@ async def run(
self,
user_prompt: str | Sequence[_messages.UserContent] | None = None,
*,
output_type: type[RunOutputDataT] | ToolOutput[RunOutputDataT] | None = None,
output_type: type[RunOutputDataT] | ToolOutput[RunOutputDataT] | StructuredOutput[RunOutputDataT] | None = None,
Copy link
Contributor

@dmontagu dmontagu May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should make an alias for this:

type Output[T] = type[T] | ToolOutput[T] | StructuredOutput[T]

you'd need to use typing_extensions.TypeAliasType for backwards compatibility with older python versions but the goal would be to have Output be a type alias with the same semantics implied by the above.

(Could use a different name than Output but you get the point.)

I'm not sure if it's better, but I think if we tack any more items onto this union, it might be more readable to replace the union with an alias.

@DouweM DouweM force-pushed the output-modes branch 2 times, most recently from f305aa5 to b892fc8 Compare May 15, 2025 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for returning response directly from tool Structured outputs as an alternative to Tool Calling
2 participants