fix: omit temperature and use max_completion_tokens for o-series models#51
Closed
clifton wants to merge 1 commit into
Closed
fix: omit temperature and use max_completion_tokens for o-series models#51clifton wants to merge 1 commit into
clifton wants to merge 1 commit into
Conversation
OpenAI o-series reasoning models (o1, o1-pro, o3, o3-mini, o4-mini) reject the `temperature` parameter with a 400 error and require `max_completion_tokens` instead of `max_tokens`, so every request to them failed: the shared OpenAI-compatible request struct serialized `temperature` unconditionally and only ever sent `max_tokens`. - Add `OpenAIModel::is_reasoning_model()`, matching all o-series variants (Custom identifiers are matched by prefix, so e.g. `o3-pro` is detected too). - Make `temperature` optional on the shared request struct (`skip_serializing_if`) and add an optional `max_completion_tokens` field. - Centralize per-model parameter shaping in `OpenAIClient::request_tuning()`, used by materialize, generate, and streaming request builders: o-series models omit `temperature` and send the configured limit as `max_completion_tokens`; they also receive `reasoning_effort` from the configured thinking level (except `Off`/"none", which o-series models reject). GPT-5.x and all other models keep their exact previous behavior. - Grok shares the request struct and has no o-series models: its requests still always send `temperature` and `max_tokens`. - Tests: update the "temperature must always be present" contract test, add serialization unit tests for the optional fields, and add offline mockito tests asserting the request bodies for o3/o4-mini (no temperature/max_tokens, max_completion_tokens present), gpt-4o (unchanged), and Grok (unchanged). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Owner
Author
|
Closing in favor of removing the o-series models from the Model enum entirely — they're deprecated, and special-casing temperature/max_completion_tokens for them isn't worth carrying. Replacement PR incoming. Custom model IDs remain available for anyone who still needs to call these models. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The OpenAI o-series reasoning models (
O4Mini,O3,O3Mini,O1,O1Pro) could not actually be called. Every request through the shared OpenAI-compatible request struct unconditionally serializedtemperatureand only ever usedmax_tokens— but on/v1/chat/completionsthe o-series models return 400 fortemperatureand requiremax_completion_tokensinstead ofmax_tokens. There was even a unit test asserting "temperature must always be present", locking in the broken contract.Fix
OpenAIModel::is_reasoning_model()— new capability predicate matching all o-series variants. Custom identifiers are matched by prefix (o1-/o3-/o4-), so models likeo3-prowithout a named variant are detected too.src/backend/openai_compatible.rs):temperatureis nowOption<f32>withskip_serializing_if, and a new optionalmax_completion_tokens: Option<u32>field was added.OpenAIClient::request_tuning()— centralizes per-model parameter shaping, used by all three OpenAI request builders (materialize, generate, streaming):temperatureomitted entirely; the configuredmax_tokensis sent asmax_completion_tokens;reasoning_effortis sent from the configured thinking level (Offis omitted rather than sent as"none", which o-series models reject).reasoning_effortfrom the thinking level,temperatureforced to 1.0 while reasoning is enabled,max_tokensas before.temperatureandmax_tokenssent exactly as configured.temperatureandmax_tokens, and nevermax_completion_tokens.toolsfeature) builds its own JSON body intools.rsand is untouched (out of scope).Tests (all offline — no live API calls)
temperatureandmax_completion_tokens.tests/http_mock_tests.rs:o3(materialize): notemperature, nomax_tokens,max_completion_tokens: 1234,reasoning_effort: "medium"(default thinking level).o4-miniwithThinkingLevel::Off: noreasoning_effort/temperature/max_tokens/max_completion_tokens.gpt-4o:temperatureandmax_tokensunchanged, nomax_completion_tokens.temperature+max_tokensalways present).Verification
cargo fmt— cleancargo clippy --all-targets(default features, matching CI) — clean; also clean with--all-featurescargo test— full suite passes (146 lib unit tests, all integration tests, 63 doctests)🤖 Generated with Claude Code