Model classes were built for a specific API and model family, encoding not just the hard API spec but also various behaviors related to those specific models' capabilities and limitations, e.g. what JSON schemas or multi-modal input types are supported.
This breaks down when a model class is used with a different API or a different family of models that don't match all of those hard and soft assumptions:
- OpenAIModel is used with various ostensibly-OpenAI-compatible APIs and a wide variety of models.
- BedrockModel and GroqModel are used with a wide variety of models.
So far, the biggest class of issues (at least one filed a day) has been in JSON schema handling. OpenAIModel and GeminiModel both implement their own transformers (OpenAI, Gemini), but people using OpenAIModel and BedrockModel with other models have been running into API errors with particular models, even when others on the same provider work fine (suggesting this really is model-specific, and not something that an OpenAI-compatible API "should" handle consistently across all models). Resolving this currently requires manually defining a model subclass and applying one of the existing JSON schema transformers, sometimes with tweaks:

- OpenAIModel + Together.xyz + Qwen fails, but works with Llama (Erratic performance when using nested schemas #1659)
- OpenAIModel + OpenRouter + Gemini fails (Different behaviour with Gemini models using OpenAI+OpenRouter #1735)
- BedrockModel + Nova fails, but works with others (Amazon Nova (Bedrock) limitations with tool schema #1623)
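To illustrate the kind of transformer tweak involved, here's a standalone sketch (not pydantic-ai's actual WalkJsonSchema API; the function name and keyword set are hypothetical): recursively walk the generated schema and drop keywords a given model's endpoint rejects.

```python
from typing import Any


def strip_unsupported_keywords(schema: dict[str, Any], unsupported: set[str]) -> dict[str, Any]:
    """Recursively drop JSON schema keywords that a particular model's API rejects."""
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if key in unsupported:
            continue
        if isinstance(value, dict):
            cleaned[key] = strip_unsupported_keywords(value, unsupported)
        elif isinstance(value, list):
            cleaned[key] = [
                strip_unsupported_keywords(item, unsupported) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            cleaned[key] = value
    return cleaned


# e.g. the kind of per-model tweak the issues above call for (keywords illustrative)
tool_schema = {'type': 'object', 'properties': {'tag': {'type': 'string', 'pattern': '^[a-z]+$'}}}
print(strip_unsupported_keywords(tool_schema, {'pattern'}))
```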
We're going to run into something similar with Structured Output Modes:

- OpenAIModel + pre-4o doesn't support json_schema, only json_object (and tool calls and manual JSON)
- AnthropicModel doesn't support json_schema or json_object, only tool calls or manual JSON
- GeminiModel (and presumably BedrockModel + Gemini) doesn't support json_schema alongside tools
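For reference, the two native JSON modes map to different request payloads on the OpenAI API (abbreviated shapes below; 'tool' and 'manual_json' don't use response_format at all):

```python
# OpenAI Chat Completions `response_format` values for the two native JSON modes:
json_object_format = {'type': 'json_object'}  # older "JSON mode", no schema enforcement
json_schema_format = {
    'type': 'json_schema',
    'json_schema': {'name': 'result', 'strict': True, 'schema': {'type': 'object'}},
}
# 'tool' mode instead forces a tool call whose arguments carry the structured output,
# and 'manual_json' prompts the model to emit JSON that is parsed out of plain text.
```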
Built-in tools may also be different between providers used with the same model class (e.g. OpenAIModel + OpenRouter), or between models on the same provider (as some models may not support tool calls at all).

That's not the end of it, unfortunately, as we've already seen some other axes where different models may need different handling to get the most out of them:

- BedrockModel + some models don't support tool use: https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html
- OpenAIModel + PPInfra + Qwen doesn't support strict on tool definitions (Inferred strict=true may cause compatibility issues with OpenAI-compatible servers #1561)
- OpenAIModel + Grok doesn't support strict on tool definitions (strict mode on function call is currently not supported for grok models #1846)
- BedrockModel + Claude doesn't support tool choice: https://github.com/pydantic/pydantic-ai/blob/main/pydantic_ai_slim/pydantic_ai/models/bedrock.py#L305
- BedrockModel + Mistral (and others?) require the tool result to be passed as an object instead of a string (Support Tool Calling with Llama 3.3 on Bedrock #1649)
- With model classes that cover many models, like OpenAIModel and BedrockModel, not all models will support all multi-modal input types (video, audio, image, docs).
- Claude doesn't natively do parallel tool calls, and Anthropic recommends providing an explicit batch tool (New Common Tool: Batch #1769)
I think it's time to pull some model and model family-specific details out of the model classes, generalize them, and allow them to be tweaked on a model-by-model basis.
This'll be somewhat similar to ModelSettings, but instead of properties to be passed directly to the model API, these new properties will determine how PydanticAI builds its request payload to get the most out of each specific model and work around limitations.
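For instance, a request builder could consult a profile flag instead of hard-coding behavior per model class. A hypothetical sketch (function and parameter names are not from the codebase), in the spirit of the strict-tools issues above:

```python
from typing import Any


def build_tool_definition(name: str, parameters: dict[str, Any], *, strict_tools: bool) -> dict[str, Any]:
    """Hypothetical: only emit 'strict' for models whose profile says they accept it."""
    definition: dict[str, Any] = {'name': name, 'parameters': parameters}
    if strict_tools:
        definition['strict'] = True
    return definition
```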
There'd be global defaults, model class/family defaults layered on top of that, model-specific overrides provided by the model class file, and the ability for users to tweak the settings further, or even use the settings defined by one model class (e.g. GeminiModel's specification for 2.5 Pro) with another model class (like OpenAIModel + OpenRouter + Gemini).
Because we're basically describing how the model likes to be talked to, I'm leaning towards the name ModelProfile or ModelSpec or something similar -- but very open to other suggestions.
It'd look something like this:
```python
from dataclasses import dataclass, replace
from typing import Any, Literal

# WalkJsonSchema and the model/provider classes below are pydantic-ai's existing names


@dataclass
class ModelProfile:
    json_schema_transformer: Literal['openai', 'gemini'] | type[WalkJsonSchema]
    supported_output_modes: set[Literal['tool', 'json_schema', 'json_object', 'manual_json']]
    default_output_mode: Literal['tool', 'json_schema', 'json_object', 'manual_json']
    # definitely not all necessary right away, but to give you an idea
    built_in_tools: dict[str, dict[str, Any]]
    manual_json_prompt: str
    tool_use: bool
    strict_tools: bool
    tool_choice: bool
    tool_result_type: Literal['string', 'object']
    multi_modal_input_types: set[Literal['video', 'audio', 'image', 'docs']]
    offer_batch_tool: bool


# models/__init__.py
DEFAULT_PROFILE = ModelProfile(...)

# models/openai.py
DEFAULT_OPENAI_PROFILE = replace(DEFAULT_PROFILE, json_schema_transformer='openai', ...)
OPENAI_PROFILES = {}
OPENAI_PROFILES['gpt-4'] = replace(DEFAULT_OPENAI_PROFILE, supported_output_modes={'tool', 'json_object', 'manual_json'})
OPENAI_PROFILES['gpt-4o'] = replace(OPENAI_PROFILES['gpt-4'], supported_output_modes={'tool', 'json_schema', 'manual_json'})

# models/gemini.py
DEFAULT_GEMINI_PROFILE = replace(DEFAULT_PROFILE, json_schema_transformer='gemini', ...)
GEMINI_PROFILES = {}
GEMINI_PROFILES['gemini-2.0-flash-001'] = replace(DEFAULT_GEMINI_PROFILE)

# models/anthropic.py
DEFAULT_ANTHROPIC_PROFILE = replace(DEFAULT_PROFILE, ...)
ANTHROPIC_PROFILES = {}
ANTHROPIC_PROFILES['claude-3-5-sonnet-20240620'] = replace(DEFAULT_ANTHROPIC_PROFILE, ...)

# models/bedrock.py
DEFAULT_BEDROCK_PROFILE = replace(DEFAULT_PROFILE)
BEDROCK_PROFILES = {}
BEDROCK_PROFILES['us.anthropic.claude-3-5-sonnet-20240620'] = ANTHROPIC_PROFILES['claude-3-5-sonnet-20240620']  # or some cleverer way to read these automatically based on name

# my_agent.py
model = OpenAIModel(model_name='gpt-4o')

model = OpenAIModel(
    "google/gemini-2.0-flash-001",
    provider=OpenAIProvider(base_url="https://openrouter.ai/api/v1", ...),
    profile=GEMINI_PROFILES['gemini-2.0-flash-001'],
)

model = AnthropicModel(model_name='claude-3-5-sonnet-20240620')
model = BedrockModel(model_name='llama3.3', profile=replace(DEFAULT_PROFILE, json_schema_transformer='gemini'))

# could also work, if we merge in the defaults (or just set those on the dataclass/pydantic model?)
model = BedrockModel(model_name='llama3.3', profile=ModelProfile(json_schema_transformer='gemini'))
```
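For the last example to work, unset fields would need to inherit from the defaults. A minimal sketch of that merge, assuming every ModelProfile field is given a None default so that None means "unset" (the merge helper itself is hypothetical):

```python
from dataclasses import fields, replace


def merge_profile(default: ModelProfile, overrides: ModelProfile) -> ModelProfile:
    """Overlay only the fields explicitly set on `overrides` onto `default`."""
    explicitly_set = {
        f.name: getattr(overrides, f.name)
        for f in fields(overrides)
        if getattr(overrides, f.name) is not None  # assumes None means "unset"
    }
    return replace(default, **explicitly_set)
```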
I'd start by implementing this for json_schema_transformer as that's the main one causing issues today, but since we have the output modes in the pipeline, I'd rather implement this with a new class from the get-go rather than with a json_schema_transformer argument directly set on Model.

@dmontagu @Kludex Thoughts? :)
This looks good. I think the GEMINI_PROFILES etc. should not include duplicate values for each model name, and instead should just include one key for each distinct profile value, and under the hood we should use a function that selects the appropriate profile as a function of the model name.
I would imagine then that we could allow users to pass either an explicit ModelProfile or a Callable[[str], ModelProfile] into the profile argument of Model.
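That could look something like the following, building on the sketch above (the selector name and branching are illustrative):

```python
from dataclasses import replace
from typing import Callable

ProfileSelector = Callable[[str], ModelProfile]


def openai_model_profile(model_name: str) -> ModelProfile:
    """Map a model name to one of a small number of shared profile values."""
    if model_name.startswith('gpt-4o'):
        return replace(DEFAULT_OPENAI_PROFILE, supported_output_modes={'tool', 'json_schema', 'manual_json'})
    return DEFAULT_OPENAI_PROFILE


# Model's `profile` argument could then accept either form:
profile: ModelProfile | ProfileSelector = openai_model_profile
```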