fix: omit temperature and use max_completion_tokens for o-series models by clifton · Pull Request #51 · clifton/rstructor

clifton · 2026-06-09T20:35:10Z

Problem

The OpenAI o-series reasoning models (O4Mini, O3, O3Mini, O1, O1Pro) could not actually be called. Every request through the shared OpenAI-compatible request struct unconditionally serialized temperature and only ever sent max_tokens, but on /v1/chat/completions the o-series models reject temperature with a 400 error and require max_completion_tokens instead of max_tokens. A unit test even asserted "temperature must always be present", locking in the broken contract.

Fix

- OpenAIModel::is_reasoning_model(): new capability predicate matching all o-series variants. Custom identifiers are matched by prefix (o1-/o3-/o4-), so models like o3-pro without a named variant are detected too.
- Shared request struct (src/backend/openai_compatible.rs): temperature is now Option<f32> with skip_serializing_if, and a new optional max_completion_tokens: Option<u32> field was added.
- OpenAIClient::request_tuning(): centralizes per-model parameter shaping, used by all three OpenAI request builders (materialize, generate, streaming):
  - o-series: temperature omitted entirely; the configured max_tokens is sent as max_completion_tokens; reasoning_effort is sent from the configured thinking level (Off is omitted rather than sent as "none", which o-series models reject).
  - GPT-5.x: unchanged. reasoning_effort comes from the thinking level, temperature is forced to 1.0 while reasoning is enabled, and max_tokens is sent as before.
  - All other models: unchanged. temperature and max_tokens are sent exactly as configured.
- Grok shares the request struct but has no o-series models: its requests still always send temperature and max_tokens, and never max_completion_tokens.
- The tool-calling loop (tools feature) builds its own JSON body in tools.rs and is untouched (out of scope).
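For readers without the diff at hand, the shaping rules above can be sketched in std-only Rust. This is an illustration under assumptions, not the PR's code: the `Gpt4o` and `Custom` variants, the `Low`/`High` thinking levels, the `RequestTuning` struct, and the free-function signature are all hypothetical, and the GPT-5.x branch is left out because the PR leaves it unchanged.

```rust
/// Illustrative model enum. The o-series variant names come from the PR
/// text; `Gpt4o` and `Custom` are assumed shapes for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum OpenAIModel {
    O1,
    O1Pro,
    O3,
    O3Mini,
    O4Mini,
    Gpt4o,
    Custom(String),
}

impl OpenAIModel {
    /// True for every o-series model, named or custom.
    pub fn is_reasoning_model(&self) -> bool {
        match self {
            Self::O1 | Self::O1Pro | Self::O3 | Self::O3Mini | Self::O4Mini => true,
            Self::Gpt4o => false,
            // Custom identifiers are matched by prefix (o1-/o3-/o4-).
            Self::Custom(id) => ["o1-", "o3-", "o4-"].iter().any(|p| id.starts_with(p)),
        }
    }
}

/// Assumed thinking levels; the PR only names `Off` and a "medium" default.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ThinkingLevel {
    Off,
    Low,
    Medium,
    High,
}

impl ThinkingLevel {
    /// `Off` maps to `None` so the field is omitted instead of sent as "none".
    fn o_series_effort(self) -> Option<&'static str> {
        match self {
            ThinkingLevel::Off => None,
            ThinkingLevel::Low => Some("low"),
            ThinkingLevel::Medium => Some("medium"),
            ThinkingLevel::High => Some("high"),
        }
    }
}

/// Hypothetical result of per-model shaping; `None` means "do not send".
#[derive(Debug, PartialEq)]
pub struct RequestTuning {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub max_completion_tokens: Option<u32>,
    pub reasoning_effort: Option<&'static str>,
}

/// Shapes parameters for o-series models versus all other models.
/// The GPT-5.x rules are intentionally not modeled here.
pub fn request_tuning(
    model: &OpenAIModel,
    temperature: f32,
    max_tokens: Option<u32>,
    thinking: ThinkingLevel,
) -> RequestTuning {
    if model.is_reasoning_model() {
        RequestTuning {
            temperature: None,
            max_tokens: None,
            // The configured limit moves to the field o-series models accept.
            max_completion_tokens: max_tokens,
            reasoning_effort: thinking.o_series_effort(),
        }
    } else {
        RequestTuning {
            temperature: Some(temperature),
            max_tokens,
            max_completion_tokens: None,
            reasoning_effort: None,
        }
    }
}
```

Keeping the decision in one function means the three request builders cannot drift apart: each would call the same shaping step and copy the resulting options into the request struct.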

Tests (all offline — no live API calls)

- Updated the "temperature must always be present" serialization test to the new contract; added unit tests for omit/include of temperature and max_completion_tokens.
- New mockito request-body tests in tests/http_mock_tests.rs:
  - o3 (materialize): no temperature, no max_tokens, max_completion_tokens: 1234, reasoning_effort: "medium" (default thinking level).
  - o4-mini with ThinkingLevel::Off: no reasoning_effort/temperature/max_tokens/max_completion_tokens.
  - gpt-4o: temperature and max_tokens unchanged, no max_completion_tokens.
  - Grok: request body unchanged (temperature + max_tokens always present).
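The request-body contract these tests check can be illustrated without serde or mockito. The sketch below is a hand-written, std-only stand-in: `ChatBody` and `to_json` are hypothetical names, and the manual field skipping only imitates the effect that `skip_serializing_if` has on the real struct.

```rust
/// Hypothetical, simplified request body. A field set to `None` is left
/// out of the JSON entirely rather than serialized as `null`.
pub struct ChatBody {
    pub model: String,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub max_completion_tokens: Option<u32>,
    pub reasoning_effort: Option<String>,
}

impl ChatBody {
    /// Builds a compact JSON object by hand. Strings are not escaped,
    /// which is acceptable only for plain identifiers like model names.
    pub fn to_json(&self) -> String {
        let mut fields = vec![format!("\"model\":\"{}\"", self.model)];
        if let Some(t) = self.temperature {
            fields.push(format!("\"temperature\":{}", t));
        }
        if let Some(n) = self.max_tokens {
            fields.push(format!("\"max_tokens\":{}", n));
        }
        if let Some(n) = self.max_completion_tokens {
            fields.push(format!("\"max_completion_tokens\":{}", n));
        }
        if let Some(e) = &self.reasoning_effort {
            fields.push(format!("\"reasoning_effort\":\"{}\"", e));
        }
        format!("{{{}}}", fields.join(","))
    }
}

/// Body shaped like the o3 case: no temperature, no max_tokens.
pub fn o3_body() -> ChatBody {
    ChatBody {
        model: "o3".to_string(),
        temperature: None,
        max_tokens: None,
        max_completion_tokens: Some(1234),
        reasoning_effort: Some("medium".to_string()),
    }
}

/// Body shaped like the gpt-4o case: temperature and max_tokens as configured.
pub fn gpt4o_body() -> ChatBody {
    ChatBody {
        model: "gpt-4o".to_string(),
        temperature: Some(0.5),
        max_tokens: Some(1234),
        max_completion_tokens: None,
        reasoning_effort: None,
    }
}
```

An offline test in this style asserts on the absence of keys as well as their values, since the original failure was caused by a field being present at all.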

Verification

cargo fmt — clean
cargo clippy --all-targets (default features, matching CI) — clean; also clean with --all-features
cargo test — full suite passes (146 lib unit tests, all integration tests, 63 doctests)

🤖 Generated with Claude Code

OpenAI o-series reasoning models (o1, o1-pro, o3, o3-mini, o4-mini) reject the `temperature` parameter with a 400 error and require `max_completion_tokens` instead of `max_tokens`, so every request to them failed: the shared OpenAI-compatible request struct serialized `temperature` unconditionally and only ever sent `max_tokens`.

- Add `OpenAIModel::is_reasoning_model()`, matching all o-series variants (Custom identifiers are matched by prefix, so e.g. `o3-pro` is detected too).
- Make `temperature` optional on the shared request struct (`skip_serializing_if`) and add an optional `max_completion_tokens` field.
- Centralize per-model parameter shaping in `OpenAIClient::request_tuning()`, used by materialize, generate, and streaming request builders: o-series models omit `temperature` and send the configured limit as `max_completion_tokens`; they also receive `reasoning_effort` from the configured thinking level (except `Off`/"none", which o-series models reject). GPT-5.x and all other models keep their exact previous behavior.
- Grok shares the request struct and has no o-series models: its requests still always send `temperature` and `max_tokens`.
- Tests: update the "temperature must always be present" contract test, add serialization unit tests for the optional fields, and add offline mockito tests asserting the request bodies for o3/o4-mini (no temperature/max_tokens, max_completion_tokens present), gpt-4o (unchanged), and Grok (unchanged).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

clifton · 2026-06-10T04:36:16Z

Closing in favor of removing the o-series models from the Model enum entirely — they're deprecated, and special-casing temperature/max_completion_tokens for them isn't worth carrying. Replacement PR incoming. Custom model IDs remain available for anyone who still needs to call these models.

clifton closed this Jun 10, 2026

clifton deleted the fix/o-series-request-params branch June 10, 2026 04:48

clifton wants to merge 1 commit into main from fix/o-series-request-params
