fix: omit temperature and use max_completion_tokens for o-series models #51

Closed
clifton wants to merge 1 commit into main from fix/o-series-request-params

Conversation

@clifton clifton commented Jun 9, 2026

Owner

Problem

The OpenAI o-series reasoning models (O4Mini, O3, O3Mini, O1, O1Pro) could not be called at all. Every request through the shared OpenAI-compatible request struct serialized temperature unconditionally and only ever sent max_tokens, but on /v1/chat/completions the o-series models return HTTP 400 when temperature is present and require max_completion_tokens instead of max_tokens. A unit test even asserted "temperature must always be present", locking in the broken contract.

Fix

  • OpenAIModel::is_reasoning_model() — new capability predicate matching all o-series variants. Custom identifiers are matched by prefix (o1-/o3-/o4-), so models like o3-pro without a named variant are detected too.
  • Shared request struct (src/backend/openai_compatible.rs): temperature is now Option<f32> with skip_serializing_if, and a new optional max_completion_tokens: Option<u32> field was added.
  • OpenAIClient::request_tuning() — centralizes per-model parameter shaping, used by all three OpenAI request builders (materialize, generate, streaming):
    • o-series: temperature omitted entirely; the configured max_tokens is sent as max_completion_tokens; reasoning_effort is sent from the configured thinking level (Off is omitted rather than sent as "none", which o-series models reject).
    • GPT-5.x: unchanged — reasoning_effort from the thinking level, temperature forced to 1.0 while reasoning is enabled, max_tokens as before.
    • All other models: unchanged — temperature and max_tokens sent exactly as configured.
  • Grok shares the request struct but has no o-series models: its requests still always send temperature and max_tokens, and never max_completion_tokens.
  • The tool-calling loop (tools feature) builds its own JSON body in tools.rs and is untouched (out of scope).
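
For reviewers, the shaping rules above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in this PR: it works on plain model ID strings instead of the OpenAIModel enum, the Tuning struct and the free-function signatures are hypothetical, the Low/High thinking levels are assumed, the GPT-5.x branch is left out, and reasoning_effort is assumed to be absent for non-reasoning models.

```rust
/// Hypothetical mirror of the configured thinking level.
/// Only `Off` and a default of "medium" are confirmed by the PR text.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThinkingLevel {
    Off,
    Low,
    Medium,
    High,
}

impl ThinkingLevel {
    /// `Off` maps to `None` so the field is omitted instead of sent as "none".
    fn as_effort(self) -> Option<&'static str> {
        match self {
            ThinkingLevel::Off => None,
            ThinkingLevel::Low => Some("low"),
            ThinkingLevel::Medium => Some("medium"),
            ThinkingLevel::High => Some("high"),
        }
    }
}

/// Shaped parameters; `None` means "leave the field out of the request body".
#[derive(Debug, PartialEq)]
struct Tuning {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    max_completion_tokens: Option<u32>,
    reasoning_effort: Option<&'static str>,
}

/// Bare `o1`/`o3` plus anything with an `o1-`/`o3-`/`o4-` prefix,
/// so IDs without a named variant (e.g. `o3-pro`) are detected too.
fn is_reasoning_model(id: &str) -> bool {
    matches!(id, "o1" | "o3")
        || ["o1-", "o3-", "o4-"].iter().any(|p| id.starts_with(*p))
}

/// Simplified stand-in for the per-model shaping (GPT-5.x branch omitted).
fn request_tuning(
    model_id: &str,
    temperature: f32,
    max_tokens: Option<u32>,
    thinking: ThinkingLevel,
) -> Tuning {
    if is_reasoning_model(model_id) {
        Tuning {
            temperature: None,
            max_tokens: None,
            // The configured limit moves to the field o-series models accept.
            max_completion_tokens: max_tokens,
            reasoning_effort: thinking.as_effort(),
        }
    } else {
        Tuning {
            temperature: Some(temperature),
            max_tokens,
            max_completion_tokens: None,
            // Assumption: non-reasoning models get no reasoning_effort.
            reasoning_effort: None,
        }
    }
}

fn main() {
    println!("{:?}", request_tuning("o3", 0.7, Some(1234), ThinkingLevel::Medium));
    println!("{:?}", request_tuning("gpt-4o", 0.7, Some(1234), ThinkingLevel::Medium));
}
```

Returning every shaped field as an Option keeps the "omit vs. send" decision in one place; a request struct whose fields use skip_serializing_if can then take the values as-is.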

Tests (all offline — no live API calls)

  • Updated the "temperature must always be present" serialization test to the new contract; added unit tests for omit/include of temperature and max_completion_tokens.
  • New mockito request-body tests in tests/http_mock_tests.rs:
    • o3 (materialize): no temperature, no max_tokens, max_completion_tokens: 1234, reasoning_effort: "medium" (default thinking level).
    • o4-mini with ThinkingLevel::Off: no reasoning_effort/temperature/max_tokens/max_completion_tokens.
    • gpt-4o: temperature and max_tokens unchanged, no max_completion_tokens.
    • Grok: request body unchanged (temperature + max_tokens always present).

Verification

  • cargo fmt — clean
  • cargo clippy --all-targets (default features, matching CI) — clean; also clean with --all-features
  • cargo test — full suite passes (146 lib unit tests, all integration tests, 63 doctests)

🤖 Generated with Claude Code

OpenAI o-series reasoning models (o1, o1-pro, o3, o3-mini, o4-mini)
reject the `temperature` parameter with a 400 error and require
`max_completion_tokens` instead of `max_tokens`, so every request to
them failed: the shared OpenAI-compatible request struct serialized
`temperature` unconditionally and only ever sent `max_tokens`.

- Add `OpenAIModel::is_reasoning_model()`, matching all o-series
  variants (Custom identifiers are matched by prefix, so e.g. `o3-pro`
  is detected too).
- Make `temperature` optional on the shared request struct
  (`skip_serializing_if`) and add an optional `max_completion_tokens`
  field.
- Centralize per-model parameter shaping in
  `OpenAIClient::request_tuning()`, used by materialize, generate, and
  streaming request builders: o-series models omit `temperature` and
  send the configured limit as `max_completion_tokens`; they also
  receive `reasoning_effort` from the configured thinking level
  (except `Off`/"none", which o-series models reject). GPT-5.x and all
  other models keep their exact previous behavior.
- Grok shares the request struct and has no o-series models: its
  requests still always send `temperature` and `max_tokens`.
- Tests: update the "temperature must always be present" contract test,
  add serialization unit tests for the optional fields, and add offline
  mockito tests asserting the request bodies for o3/o4-mini (no
  temperature/max_tokens, max_completion_tokens present),
  gpt-4o (unchanged), and Grok (unchanged).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

clifton commented Jun 10, 2026

Owner Author

Closing in favor of removing the o-series models from the Model enum entirely — they're deprecated, and special-casing temperature/max_completion_tokens for them isn't worth carrying. Replacement PR incoming. Custom model IDs remain available for anyone who still needs to call these models.

@clifton clifton closed this Jun 10, 2026
@clifton clifton deleted the fix/o-series-request-params branch June 10, 2026 04:48