
fix(auto-run): unwedge experiment loop when provider rejects temperature #78

Open
ChenglongWang wants to merge 2 commits into feat/project-registry-runtime-isolation from fix/auto-run-temperature-and-error-loop

@ChenglongWang (Contributor)

Summary

Production incident PRJ-0002 (2026-05-14): every azure/anthropic/claude-opus-4-7 call returned

litellm.BadRequestError: Azure_aiException - invalid_request_error: `temperature` is deprecated for this model.

Every auto-run round re-hit the same parameter error until _AUTO_MAX_ROUNDS (20) was exhausted, and the error string was surfaced to the user instead of any research output.

Logs that triggered this PR: ~/Shared/agent-service.log (19 identical _run_agent_loop:580 errors in 13 seconds) and ~/Shared/actions.jsonl (auto-run round 1..20 all firing on the same model with no recovery).

Three independent bugs collided. This PR fixes each in isolation.

Bug 1 — temperature attached to models that reject it

  • New mira_engine/providers/model_compat.py centralises the rule (currently claude-opus-4-7).
  • AzureOpenAIProvider._supports_temperature consults it on top of its existing gpt-5 / o1 / o3 / o4 blocklist.
  • LiteLLMProvider.chat pops temperature from the request kwargs before model-specific overrides apply.
  • AnthropicProvider._build_kwargs conditionally skips temperature on every code path (adaptive thinking, enabled thinking, plain).

Adding a model to the blocklist is now a one-line registry change.
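A minimal sketch of what such a central registry could look like, assuming the rule keys on the bare model name after stripping any routing prefix, so `azure/anthropic/claude-opus-4-7`, `anthropic/claude-opus-4-7`, and `claude-opus-4-7` all hit the same entry. `_TEMPERATURE_UNSUPPORTED`, `supports_temperature`, and `drop_unsupported_params` are hypothetical names, not the actual contents of `model_compat.py`; only the blocklisted model comes from this PR.

```python
from typing import Any, Dict

# Hypothetical registry: bare model names (routing prefix removed) that reject
# `temperature`. Adding a model is a one-line change to this set.
_TEMPERATURE_UNSUPPORTED = frozenset({"claude-opus-4-7"})


def _bare_model_name(model: str) -> str:
    """Strip routing prefixes such as 'azure/anthropic/' and normalise case."""
    return model.rsplit("/", 1)[-1].strip().lower()


def supports_temperature(model: str) -> bool:
    """Return False when the registry says this model rejects `temperature`."""
    return _bare_model_name(model) not in _TEMPERATURE_UNSUPPORTED


def drop_unsupported_params(model: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request kwargs without parameters the model rejects."""
    cleaned = dict(kwargs)
    if not supports_temperature(model):
        cleaned.pop("temperature", None)
    return cleaned
```

Each provider would then check `supports_temperature` (or call a helper like `drop_unsupported_params`) just before sending the request, which mirrors how the LiteLLM path pops `temperature` from its kwargs.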

Bug 2 — non-retryable raised exceptions burned the fallback chain

RoutedProviderManager.chat already classified non-retryable error responses via _should_retry_with_fallback, but its except Exception branch unconditionally walked to the next candidate. A raised BadRequestError was therefore retried against every candidate, even though a permanent request error fails identically on each one. Apply the same classifier to str(exc) and re-raise immediately when the error is non-retryable.
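The control flow of this fix can be sketched as follows. This is a simplified illustration, not the real `RoutedProviderManager.chat`: candidates are plain callables, and `should_retry_with_fallback` with its two markers is a hypothetical stand-in for `_should_retry_with_fallback`, whose actual rules are not shown in this PR.

```python
from typing import Callable, Optional, Sequence

# Hypothetical markers for permanent request errors; the real
# _should_retry_with_fallback classifier may use different rules.
_NON_RETRYABLE_MARKERS = ("invalid_request_error", "BadRequestError")


def should_retry_with_fallback(error_text: str) -> bool:
    """Return True if another candidate model might succeed."""
    return not any(marker in error_text for marker in _NON_RETRYABLE_MARKERS)


def chat_with_fallback(candidates: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each candidate in order, falling back only on retryable failures."""
    last_exc: Optional[Exception] = None
    for call in candidates:
        try:
            return call(prompt)
        except Exception as exc:
            # Same classifier as the error-response path, applied to str(exc).
            if not should_retry_with_fallback(str(exc)):
                raise  # permanent 4xx: every candidate would fail identically
            last_exc = exc  # transient failure: try the next candidate
    raise RuntimeError("All candidate models failed for this turn") from last_exc
```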

Bug 3 — auto-run kept spinning on LLM provider errors

_evaluate_continuation only stopped on failure responses when strictHeuristics was on. PRJ-0002 had no automation_policy, so it fell back to relaxed heuristics and called the LLM 20 times, hitting the same parameter error each round. Add an unconditional _looks_like_llm_provider_error check that halts auto mode with stop_reason="llm provider error" regardless of policy. The check matches markers emitted by every provider wrapper (Error calling LLM, Error calling Azure OpenAI, litellm.BadRequestError, Azure_aiException, invalid_request_error, All candidate models failed for this turn, …).
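A sketch of the unconditional halt, assuming plain substring matching on the turn text. The marker tuple holds only the markers named above (the real list is longer), and the `evaluate_continuation` signature, its `(keep_running, stop_reason)` return shape, and the strict-heuristics stop reason are hypothetical.

```python
from typing import Optional, Tuple

# Only the markers named in this PR; the real list contains more.
_LLM_PROVIDER_ERROR_MARKERS = (
    "Error calling LLM",
    "Error calling Azure OpenAI",
    "litellm.BadRequestError",
    "Azure_aiException",
    "invalid_request_error",
    "All candidate models failed for this turn",
)


def looks_like_llm_provider_error(response_text: str) -> bool:
    """True if the turn text is a provider failure rather than a model answer."""
    return any(marker in response_text for marker in _LLM_PROVIDER_ERROR_MARKERS)


def evaluate_continuation(
    response_text: str, *, failed: bool, strict_heuristics: bool
) -> Tuple[bool, Optional[str]]:
    """Return (keep_running, stop_reason) for one auto-run round (hypothetical shape)."""
    # Unconditional: the model never produced a turn, so the next round
    # would hit the same error.
    if looks_like_llm_provider_error(response_text):
        return False, "llm provider error"
    # Policy-dependent: ordinary experiment failures stop only under strict
    # heuristics. The reason string here is a made-up placeholder.
    if failed and strict_heuristics:
        return False, "experiment failure"
    return True, None
```

Checking for provider errors before consulting the policy is what makes the stop independent of `strictHeuristics`. Substring matching is cheap but could misfire if research output quotes one of these strings verbatim; the cost of that misfire is a single early stop.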

Files

mira_engine/providers/model_compat.py              (new)
mira_engine/providers/azure_openai_provider.py
mira_engine/providers/litellm_provider.py
mira_engine/providers/anthropic_provider.py
mira_engine/agent/routing.py
mira_engine/agent/research_loop.py
tests/providers/test_model_compat.py               (new)
tests/providers/test_azure_openai_provider.py
tests/test_model_routing.py
tests/test_research_loop_core.py

Test plan

  • pytest tests/providers/test_model_compat.py tests/providers/test_azure_openai_provider.py — 31 passed (covers claude-opus-4-7 under every provider prefix and Azure body builder dropping temperature)
  • pytest tests/test_model_routing.py — 12 passed (new: retryable-raised falls back; non-retryable-raised does not invoke fallback candidate)
  • pytest tests/test_research_loop_core.py — 14 passed (new: _looks_like_llm_provider_error markers; _evaluate_continuation halts with "llm provider error" even under relaxed heuristics)
  • Full regression on tests/providers/, tests/test_model_routing.py, tests/test_research_loop_core.py, tests/test_agent_loop_core.py, tests/test_agent_loop.py: 272 passed
  • ruff check on newly touched files — clean (pre-existing W293/F841/I001 in unrelated lines confirmed via stash diff)
  • Re-run a claude-opus-4-7 auto session against staging to confirm the surface error stays a single round and the loop halts with stop_reason="llm provider error"

…ture`

PRJ-0002 (2026-05-14) wedged on `azure/anthropic/claude-opus-4-7`
returning `invalid_request_error: \`temperature\` is deprecated for this
model.` Every auto-run round re-hit the same parameter error until
`_AUTO_MAX_ROUNDS` (20) was exhausted, surfacing the error to the user
instead of any research result.

Three independent bugs collided:

1. `temperature` was unconditionally attached to outbound requests for
   models that no longer accept it. Centralise the rule in a new
   `providers.model_compat` registry (currently lists `claude-opus-4-7`)
   and gate temperature emission on it in the Azure, LiteLLM, and
   Anthropic providers. Azure's existing `_supports_temperature` rule
   for `gpt-5`/`o*` deployments is preserved on top.

2. `RoutedProviderManager.chat` blindly walked the fallback chain when a
   provider RAISED rather than returned an error response, so a
   permanent 4xx burned every remaining candidate. Apply the same
   `_should_retry_with_fallback` classification used on the response
   path; non-retryable exceptions now short-circuit immediately.

3. `_evaluate_continuation` only stopped auto mode on failure responses
   when `strictHeuristics` was on. LLM-provider errors are NOT
   experiment outcomes — they mean the model never produced a turn, so
   the next round will hit the same error. Add an unconditional
   `_looks_like_llm_provider_error` check that halts auto mode with
   `stop_reason="llm provider error"` regardless of policy.

Tests cover the model_compat blocklist under every provider prefix,
the Azure body builder dropping temperature for `claude-opus-4-7`,
non-retryable raised exceptions not burning the fallback chain, and
auto-run halting on the exact error text observed in PRJ-0002.
…rovider too

Second occurrence at 2026-05-14 21:09: the temperature error reappeared
even with the first commit's fix applied. Root cause: `OpenAICompatProvider`
(used by `custom` provider configs and by `GitHubCopilotProvider` via
inheritance) keeps its own `_supports_temperature` rule that only
blocked GPT-5 / o1 / o3 / o4 deployments. When a user's OpenAI-compatible
endpoint proxies to Azure-hosted `claude-opus-4-7`, this path still
attached `temperature`, and Azure rejected the request with HTTP 400:
`invalid_request_error: \`temperature\` is deprecated for this model.`

Have `_supports_temperature` also consult the shared
`providers.model_compat` registry. Same pattern as Azure / LiteLLM /
Anthropic providers from the parent commit. The error-format trail
(`Error: {'message':...}`) comes from `_handle_error` in
`openai_compat_provider.py:811`, which confirms this code path is the
one the user's config hits.
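A sketch of the pattern described above: the subclass gets the registry check through inheritance, and the kwargs builder omits `temperature` whenever the check fails. The class bodies, the `_build_kwargs` signature, the prefix-matching rule for the legacy GPT-5 / o1 / o3 / o4 list, and the inline registry set are all assumptions for illustration; the real providers are not reproduced here.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the shared providers.model_compat registry.
_TEMPERATURE_UNSUPPORTED = frozenset({"claude-opus-4-7"})
# Legacy OpenAI-compat rule (GPT-5 / o1 / o3 / o4); prefix matching is assumed.
_LEGACY_NO_TEMPERATURE_PREFIXES = ("gpt-5", "o1", "o3", "o4")


def _bare(model: str) -> str:
    """Drop routing prefixes such as 'azure/anthropic/'."""
    return model.rsplit("/", 1)[-1].lower()


class OpenAICompatProvider:
    def _supports_temperature(self, model: str) -> bool:
        bare = _bare(model)
        if bare.startswith(_LEGACY_NO_TEMPERATURE_PREFIXES):
            return False
        # New: also consult the shared registry so proxied Claude models are covered.
        return bare not in _TEMPERATURE_UNSUPPORTED

    def _build_kwargs(
        self, model: str, messages: List[Dict[str, str]], temperature: float
    ) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {"model": model, "messages": messages}
        if self._supports_temperature(model):
            kwargs["temperature"] = temperature
        return kwargs


class GitHubCopilotProvider(OpenAICompatProvider):
    """Inherits _supports_temperature, so it picks up the registry check too."""
```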

Adds two regression tests:
- `_supports_temperature` returns False for `claude-opus-4-7` under
  every provider prefix.
- `_build_kwargs` AND `_build_responses_body` both omit `temperature`
  from the outbound request body for `azure/anthropic/claude-opus-4-7`.
ChenglongWang force-pushed the fix/auto-run-temperature-and-error-loop branch from e133e75 to 2dd9140 on May 14, 2026 13:41