fix(llm-sdk): structured output across providers + max_completion_tokens for o-series/gpt-5 (closes #93) #95
Merged
…ken params (#93)

Closes #93. Three related fixes for the LLM SDK so structured output requests (JSON schema / json_mode) work end-to-end and newer OpenAI reasoning models stop returning 400s.

1. Forward `responseFormat` through every adapter and provider. Added a unified `ResponseFormat` type (OpenAI shape) on `RequestLLMConfig`, `ChatRequest.config`, `DoGenerateParams`, and `GenerateTextParams`. Each adapter translates it to its provider's native field via per-provider sanitizers in `adapters/base.ts`:
   - OpenAI / Azure / xAI / Together / Fireworks / OpenRouter: `response_format`
   - OpenAI Responses API: `text.format`
   - Anthropic Claude 3.5+: `output_config.format` (schema sanitized: strips numeric/length constraints Anthropic rejects, forces `additionalProperties: false`)
   - Google Gemini: `responseJsonSchema` (schema sanitized: strips `oneOf`/`anyOf`/`$ref`/`pattern`)
   - Ollama 0.5+: `format` (string `"json"` or schema object)
2. Newer OpenAI models (`o1`/`o3`/`o4`/`gpt-5`) require `max_completion_tokens` instead of `max_tokens` and reject `temperature`. Added `isOpenAIReasoningModel()` and `buildOpenAITokenParams()` helpers; wired into both the legacy adapter (`adapters/openai.ts` complete + stream) and the modern provider (`providers/openai/provider.ts` doGenerate + doStream). The user-facing `maxTokens` field is unchanged; we rename to the correct provider field internally.
3. Capability gating. `supportsJsonMode` was set in the model registries but never read; now `generateText` and `streamText` warn when `responseFormat` is requested on a model that doesn't support it. Flipped Anthropic / xAI / Ollama from `false` to `true` since their structured output is now GA.

Demo: added `/chat/structured` route in `examples/fallback-demo` that exercises the OpenAI → Anthropic → Gemini fallback chain with a JSON schema, the exact scenario that broke in production.
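The per-provider translation in item 1 can be pictured roughly as below. This is a minimal sketch, not the code in `adapters/base.ts`: the `ProviderKind` tags and the `toNativeFormat` function are hypothetical names, the inner shape of the Anthropic payload is an assumption, and the schema sanitizers described above are left out entirely.

```typescript
// Sketch only: maps a unified, OpenAI-shaped response format to each
// provider's native request field. ProviderKind and toNativeFormat are
// hypothetical names; only the native field names come from the PR text.
type JsonSchema = Record<string, unknown>;

type ResponseFormat =
  | { type: "json_object" }
  | {
      type: "json_schema";
      json_schema: { name: string; schema: JsonSchema; strict?: boolean };
    };

type ProviderKind =
  | "openai-chat" // also covers Azure / xAI / Together / Fireworks / OpenRouter
  | "openai-responses"
  | "anthropic"
  | "gemini"
  | "ollama";

function toNativeFormat(kind: ProviderKind, rf: ResponseFormat): Record<string, unknown> {
  switch (kind) {
    case "openai-chat":
      // Chat Completions accepts the unified shape unchanged.
      return { response_format: rf };
    case "openai-responses":
      // Responses API: same information, flattened under text.format.
      return {
        text: {
          format:
            rf.type === "json_schema"
              ? { type: "json_schema", ...rf.json_schema }
              : { type: "json_object" },
        },
      };
    case "anthropic":
      // Assumption: only schema requests produce output_config.format, and
      // the inner { type, schema } shape is a guess for illustration.
      return rf.type === "json_schema"
        ? { output_config: { format: { type: "json_schema", schema: rf.json_schema.schema } } }
        : {};
    case "gemini":
      // Assumption: these keys are merged into the generation config.
      return rf.type === "json_schema"
        ? { responseMimeType: "application/json", responseJsonSchema: rf.json_schema.schema }
        : { responseMimeType: "application/json" };
    case "ollama":
      // Either the literal string "json" or the schema object itself.
      return { format: rf.type === "json_schema" ? rf.json_schema.schema : "json" };
    default:
      throw new Error(`unknown provider: ${String(kind)}`);
  }
}
```

A caller would spread the returned fragment into the provider request body, for example `{ model, messages, ...toNativeFormat("openai-chat", responseFormat) }`. A real implementation would run the Anthropic and Gemini schemas through their sanitizers before building the fragment.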
Verification:
- `pnpm --filter @yourgpt/llm-sdk typecheck` clean
- `pnpm --filter @yourgpt/llm-sdk build` clean
- Manual smoke via fallback-demo route pending live API keys

Note: overlaps with PR #92's gpt-5 handling (which routes reasoning models through the Responses API). Coordinating with @ankushchhabra on which approach lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
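The reasoning-model handling from item 2 of this commit message might look something like the following. The helper names mirror the ones the commit mentions, but the bodies, signatures, and the prefix-matching rule are assumptions made for illustration, not the merged implementation.

```typescript
// Sketch only. Assumption: reasoning models are recognised by model-id
// prefix (o1 / o3 / o4 / gpt-5); the real helper may use a different rule.
const REASONING_PREFIXES = ["o1", "o3", "o4", "gpt-5"];

function isOpenAIReasoningModel(model: string): boolean {
  const id = model.toLowerCase();
  // Match "o3", "o3-mini", "gpt-5.1", but not "gpt-4o".
  return REASONING_PREFIXES.some(
    (p) => id === p || id.startsWith(`${p}-`) || id.startsWith(`${p}.`),
  );
}

// Hypothetical input/output types for this example.
interface TokenParamInput {
  model: string;
  maxTokens?: number;
  temperature?: number;
}

interface OpenAITokenParams {
  max_tokens?: number;
  max_completion_tokens?: number;
  temperature?: number;
}

function buildOpenAITokenParams(input: TokenParamInput): OpenAITokenParams {
  const out: OpenAITokenParams = {};
  if (isOpenAIReasoningModel(input.model)) {
    // Reasoning models: rename the limit and drop temperature entirely.
    if (input.maxTokens !== undefined) out.max_completion_tokens = input.maxTokens;
    return out;
  }
  // Other models keep the legacy field names.
  if (input.maxTokens !== undefined) out.max_tokens = input.maxTokens;
  if (input.temperature !== undefined) out.temperature = input.temperature;
  return out;
}
```

The caller keeps passing `maxTokens` and `temperature` as before and spreads the result into the request body, so the rename stays invisible at the SDK surface.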
Anthropic's structured-output schema subset accepts `anyOf` but not `oneOf`. Previously we either passed `oneOf` through unchanged (which Anthropic rejects) or relied on the user not using it. Now we rename `oneOf` → `anyOf` recursively in `toAnthropicOutputConfig`. This matches the Vercel AI SDK's `sanitize-json-schema` behavior and is required for any user-supplied schema with discriminated unions to round-trip correctly through the Anthropic adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
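The recursive `oneOf` → `anyOf` rename described in this commit can be sketched as a small tree walk. This is an illustrative version under stated assumptions: `renameOneOfToAnyOf` and the `Json` type are hypothetical names, and the actual logic inside `toAnthropicOutputConfig` may differ.

```typescript
// Sketch only: walk a JSON Schema and rename every "oneOf" key to "anyOf".
// The function and type names are hypothetical.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function renameOneOfToAnyOf(node: Json): Json {
  if (Array.isArray(node)) {
    // Recurse into each array element (e.g. the union members).
    return node.map(renameOneOfToAnyOf);
  }
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      // Rename the key, then keep descending so nested unions are covered.
      out[key === "oneOf" ? "anyOf" : key] = renameOneOfToAnyOf(value);
    }
    return out;
  }
  // Primitives are returned unchanged.
  return node;
}
```

For discriminated unions the swap is harmless, because at most one branch can match a given value, so "exactly one" and "at least one" accept the same inputs. Two limitations of this simplified walk: it would also rename a user property that happens to be called `oneOf` inside `properties`, and it does not merge the lists when a node carries both `oneOf` and `anyOf`.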
- New page: docs/llm-sdk/structured-output covering ResponseFormat shape, per-provider native field translation, and gotchas (Anthropic schema sanitization, Gemini OpenAPI subset, xAI additionalProperties default, Vertex AI gap, reasoning model token semantics)
- Cross-references from generate-text and stream-text Parameters sections via Callout
- Updated meta.json page order

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #93. Three related fixes so structured-output requests (JSON schema / json mode) work end-to-end across every adapter and so newer OpenAI reasoning models stop returning 400s.
1. `responseFormat` forwarded everywhere. Unified `ResponseFormat` type (OpenAI shape) added to `RequestLLMConfig`, `ChatRequest.config`, `DoGenerateParams`, `GenerateTextParams`. Every adapter translates it to its provider's native field via per-provider sanitizers in `adapters/base.ts`.
2. `max_completion_tokens` for o-series + gpt-5. New `isOpenAIReasoningModel()` + `buildOpenAITokenParams()` helpers. The user-facing `maxTokens` field is unchanged; we rename to the correct provider field internally and drop `temperature` (these models reject it).
3. `supportsJsonMode` was set in registries but never read. Now `generateText` / `streamText` warn when `responseFormat` is requested on a model that doesn't support it. Flipped Anthropic / xAI / Ollama from `false` → `true` since their structured output is now GA.

Per-provider translation
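| Provider | Native field |
| --- | --- |
| OpenAI / Azure / xAI / Together / Fireworks / OpenRouter | `response_format` |
| OpenAI Responses API | `text.format` (different shape) |
| Anthropic Claude 3.5+ | `output_config.format` (schema sanitized: strips numeric/length constraints, forces `additionalProperties: false`) |
| Google Gemini | `responseJsonSchema` (schema sanitized: strips `oneOf`/`anyOf`/`$ref`/`pattern`) |
| Ollama 0.5+ | `format` (`"json"` or schema) |

Demo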
Added `/chat/structured` in `examples/fallback-demo` exercising the OpenAI → Anthropic → Gemini fallback chain with a JSON schema, the exact scenario that broke in production.

Overlap with PR #92
PR #92 (cc @ankushchhabra, delta4infotech-ai) addresses the same gpt-5 / o-series problem by routing those models through OpenAI's Responses API instead. Both approaches solve item 1 of #93 differently. Items 2 (`responseFormat` forwarding) and 3 (Anthropic `output_config`) are not in #92 and are only addressed here. Happy to drop the `max_completion_tokens` change from this PR if #92 lands first.

Test plan
- `pnpm --filter @yourgpt/llm-sdk typecheck` clean
- `pnpm --filter @yourgpt/llm-sdk build` clean
- `/chat/structured` route with live OpenAI / Anthropic / Google keys
- `response_format: { type: "json_schema" }` failure that triggered #93 ([Feature] Add response_format support, fix max_completion_tokens for gpt-5.x/o-series, and add Anthropic structured output in fallback chain)

Out of scope
🤖 Generated with Claude Code