fix(auto-run): unwedge experiment loop when provider rejects temperature #78
Open
ChenglongWang wants to merge 2 commits into
…ture`

PRJ-0002 (2026-05-14) wedged on `azure/anthropic/claude-opus-4-7`
returning
`invalid_request_error: \`temperature\` is deprecated for this model.`
Every auto-run round re-hit the same parameter error until
`_AUTO_MAX_ROUNDS` (20) was exhausted, surfacing the error to the user
instead of any research result.

Three independent bugs collided:

1. `temperature` was unconditionally attached to outbound requests for
   models that no longer accept it. Centralise the rule in a new
   `providers.model_compat` registry (currently lists `claude-opus-4-7`)
   and gate temperature emission on it in the Azure, LiteLLM, and
   Anthropic providers. Azure's existing `_supports_temperature` rule for
   `gpt-5`/`o*` deployments is preserved on top.
2. `RoutedProviderManager.chat` blindly walked the fallback chain when a
   provider RAISED rather than returned an error response, so a permanent
   4xx burned every remaining candidate. Apply the same
   `_should_retry_with_fallback` classification used on the response
   path; non-retryable exceptions now short-circuit immediately.
3. `_evaluate_continuation` only stopped auto mode on failure responses
   when `strictHeuristics` was on. LLM-provider errors are NOT experiment
   outcomes: they mean the model never produced a turn, so the next round
   will hit the same error. Add an unconditional
   `_looks_like_llm_provider_error` check that halts auto mode with
   `stop_reason="llm provider error"` regardless of policy.

Tests cover the model_compat blocklist under every provider prefix, the
Azure body builder dropping temperature for `claude-opus-4-7`,
non-retryable raised exceptions not burning the fallback chain, and
auto-run halting on the exact error text observed in PRJ-0002.
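The short-circuit described for `RoutedProviderManager.chat` (item 2) can be sketched as follows. This is an illustration only, not the PR's code: `should_retry_with_fallback`, `chat_with_fallback`, and the marker tuple are hypothetical stand-ins, and the real classifier presumably recognises more cases than the two substrings assumed here.

```python
from typing import Callable, Optional, Sequence

# Assumption: substrings that mark a permanent, request-level failure.
# The real `_should_retry_with_fallback` rules are not shown in the PR text.
NON_RETRYABLE_MARKERS = ("invalid_request_error", "badrequesterror")


def should_retry_with_fallback(error_text: str) -> bool:
    """Return False when the error would fail identically on every candidate."""
    lowered = error_text.lower()
    return not any(marker in lowered for marker in NON_RETRYABLE_MARKERS)


def chat_with_fallback(candidates: Sequence[Callable[[], str]]) -> str:
    """Try each candidate in order; stop at the first non-retryable exception."""
    last_exc: Optional[Exception] = None
    for call in candidates:
        try:
            return call()
        except Exception as exc:
            # Classify the raised exception the same way an error response
            # would be classified, using its string form.
            if not should_retry_with_fallback(str(exc)):
                raise  # permanent error: do not burn the remaining candidates
            last_exc = exc
    raise RuntimeError("All candidate models failed for this turn") from last_exc
```

With this shape, a raised parameter error reaches the caller after a single attempt, while transient failures such as timeouts still move on to the next candidate.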
…rovider too
Second occurrence at 2026-05-14 21:09: the temperature error reappeared
even after the first PR fix. Root cause: `OpenAICompatProvider`
(used by `custom` provider configs and by `GitHubCopilotProvider` via
inheritance) keeps its own `_supports_temperature` rule that only
blocked GPT-5 / o1 / o3 / o4 deployments. When a user's OpenAI-compatible
endpoint proxies to Azure-hosted `claude-opus-4-7`, this path still
attached `temperature` and Azure 400'd with
`invalid_request_error: \`temperature\` is deprecated for this model.`
Have `_supports_temperature` also consult the shared
`providers.model_compat` registry. Same pattern as Azure / LiteLLM /
Anthropic providers from the parent commit. The error-format trail
(`Error: {'message':...}`) comes from `_handle_error` in
`openai_compat_provider.py:811`, which confirms this code path is the
one the user's config hits.
Adds two regression tests:
- `_supports_temperature` returns False for `claude-opus-4-7` under
every provider prefix.
- `_build_kwargs` AND `_build_responses_body` both omit `temperature`
from the outbound request body for `azure/anthropic/claude-opus-4-7`.
e133e75 to
2dd9140
Summary
Production incident PRJ-0002 (2026-05-14): every `azure/anthropic/claude-opus-4-7` call returned `invalid_request_error: \`temperature\` is deprecated for this model.` Every auto-run round re-hit the same parameter error until `_AUTO_MAX_ROUNDS` (20) was exhausted, and the error string was surfaced to the user instead of any research output.

Logs that triggered this PR: `~/Shared/agent-service.log` (19 identical `_run_agent_loop:580` errors in 13 seconds) and `~/Shared/actions.jsonl` (`auto-run round 1..20` all firing on the same model with no recovery).

Three independent bugs collided. This PR fixes each in isolation.
Bug 1: `temperature` attached to models that reject it

- `mira_engine/providers/model_compat.py` centralises the rule (currently `claude-opus-4-7`).
- `AzureOpenAIProvider._supports_temperature` consults it on top of its existing `gpt-5`/`o1`/`o3`/`o4` blocklist.
- `LiteLLMProvider.chat` pops `temperature` from the request kwargs before model-specific overrides apply.
- `AnthropicProvider._build_kwargs` conditionally skips `temperature` on every code path (adaptive thinking, enabled thinking, plain).

Adding a model to the blocklist is now a one-line registry change.
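As a rough illustration of the Bug 1 fix, a shared registry layered under a provider-specific family rule might look like the sketch below. The function names, the prefix handling (taking the last `/`-separated segment), and the regex are assumptions made for this example; only the `claude-opus-4-7` entry and the `gpt-5`/`o1`/`o3`/`o4` families come from the PR text.

```python
import re
from typing import Any, Dict, List, Optional

# Registry contents per the PR description; adding a model is one new entry.
_NO_TEMPERATURE_MODELS = frozenset({"claude-opus-4-7"})

# Assumed shape of the provider-local family rule.
_NO_TEMPERATURE_FAMILIES = re.compile(r"^(gpt-5|o1|o3|o4)")


def _bare_model_name(model: str) -> str:
    """Strip provider prefixes, e.g. 'azure/anthropic/claude-opus-4-7'."""
    return model.rsplit("/", 1)[-1].strip().lower()


def model_rejects_temperature(model: str) -> bool:
    """Shared rule: True if the model is blocklisted under any prefix."""
    return _bare_model_name(model) in _NO_TEMPERATURE_MODELS


def supports_temperature(model: str) -> bool:
    """Provider rule: the local family blocklist plus the shared registry."""
    if _NO_TEMPERATURE_FAMILIES.match(_bare_model_name(model)):
        return False
    return not model_rejects_temperature(model)


def build_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a request body, emitting temperature only when it is accepted."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None and supports_temperature(model):
        kwargs["temperature"] = temperature
    return kwargs
```

For example, `build_kwargs("azure/anthropic/claude-opus-4-7", [], temperature=0.2)` yields a body with no `temperature` key, while the same call for a model outside both blocklists keeps it.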
Bug 2: non-retryable raised exceptions burned the fallback chain

`RoutedProviderManager.chat` already classified non-retryable error responses via `_should_retry_with_fallback`, but the `except Exception` branch unconditionally walked to the next candidate. A raised `BadRequestError` therefore tried every candidate even though they all fail identically. Apply the same classifier on `str(exc)` and re-raise immediately on non-retryable errors.

Bug 3: auto-run kept spinning on LLM provider errors

`_evaluate_continuation` only stopped on failure responses when `strictHeuristics` was on. In PRJ-0002 there was no `automation_policy`, so it defaulted to relaxed heuristics and kept calling the LLM 20 times with the same parameter error. Add an unconditional `_looks_like_llm_provider_error` check that halts auto mode with `stop_reason="llm provider error"` regardless of policy. Detects markers from every provider wrapper (`Error calling LLM`, `Error calling Azure OpenAI`, `litellm.BadRequestError`, `Azure_aiException`, `invalid_request_error`, `All candidate models failed for this turn`, …).

Files
Test plan
- `pytest tests/providers/test_model_compat.py tests/providers/test_azure_openai_provider.py`: 31 passed (covers `claude-opus-4-7` under every provider prefix and Azure body builder dropping `temperature`)
- `pytest tests/test_model_routing.py`: 12 passed (new: retryable-raised falls back; non-retryable-raised does not invoke fallback candidate)
- `pytest tests/test_research_loop_core.py`: 14 passed (new: `_looks_like_llm_provider_error` markers; `_evaluate_continuation` halts with `"llm provider error"` even under relaxed heuristics)
- `tests/providers/`, `tests/test_model_routing.py`, `tests/test_research_loop_core.py`, `tests/test_agent_loop_core.py`, `tests/test_agent_loop.py`: 272 passed
- `ruff check` on newly touched files: clean (pre-existing W293/F841/I001 in unrelated lines confirmed via stash diff)
- `claude-opus-4-7` auto session against staging to confirm the surfaced error stays a single round and the loop halts with `stop_reason="llm provider error"`
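The halting rule from Bug 3, which the staging session is meant to confirm, might be sketched as below. The marker strings are the ones quoted in the PR description (the original list is longer and elided there); the `Continuation` type, the function signatures, and the `"max rounds reached"` stop reason are hypothetical and exist only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional

# Markers quoted in the PR description; the real list is longer.
_PROVIDER_ERROR_MARKERS = (
    "Error calling LLM",
    "Error calling Azure OpenAI",
    "litellm.BadRequestError",
    "Azure_aiException",
    "invalid_request_error",
    "All candidate models failed for this turn",
)


def looks_like_llm_provider_error(text: str) -> bool:
    """True if the response text carries a known provider-wrapper error marker."""
    return any(marker in text for marker in _PROVIDER_ERROR_MARKERS)


@dataclass
class Continuation:  # hypothetical result type for this sketch
    should_continue: bool
    stop_reason: Optional[str] = None


def evaluate_continuation(
    response_text: str, round_index: int, max_rounds: int = 20
) -> Continuation:
    """Decide whether auto mode should run another round."""
    # Unconditional check, applied before any policy-dependent heuristics:
    # the model never produced a turn, so another round would fail the same way.
    if looks_like_llm_provider_error(response_text):
        return Continuation(False, "llm provider error")
    if round_index >= max_rounds:
        return Continuation(False, "max rounds reached")  # hypothetical reason
    # Policy-dependent heuristics (strict or relaxed) would be evaluated here.
    return Continuation(True)
```

Placing the provider-error check first is the point of the design: it does not depend on `strictHeuristics`, so a missing `automation_policy` can no longer let the loop spin to the round limit.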