Skip to content

Conversation

@IgorWarzocha
Copy link

@IgorWarzocha IgorWarzocha commented Jan 18, 2026

Summary

  • add configurable fallback cooldown controls to keep cross-provider fallback sticky and reduce flapping
  • surface fallback cooldown settings in the provider TUI and defaults for gemini_cli/antigravity, with provider overrides
  • harden fallback model resolution with whitelist/ignore and availability checks, plus provider-wide cooldown after switch
  • include a dry-run harness to audit key rotation and provider fallback flow without hitting real providers
  • goal: ensure agentic sessions keep running when a compatible provider is available

Testing

  • python scripts/fallback_dry_run.py
  • real provider tests not run (timeouts/rate limits are infrequent and hard to trigger)

Important

Adds cross-provider fallback with configurable cooldown to the rotator library, enhancing session stability.

  • Behavior:
    • Adds cross-provider fallback with configurable cooldown in client.py.
    • Introduces FALLBACK_COOLDOWN_MULTIPLIER and FALLBACK_COOLDOWN_MIN_SECONDS in .env.example and settings_tool.py.
    • Implements _get_fallback_model() and _apply_fallback_cooldown() in client.py for fallback logic.
  • Testing:
    • Adds fallback_dry_run.py for testing fallback logic without real provider impact.
  • Configuration:
    • Updates provider_interface.py to include default fallback cooldown settings.
    • Modifies defaults.py to define default values for fallback cooldown settings.

This description was created by Ellipsis for 6e74205. You can customize this summary. It will automatically update as commits are pushed.

@mirrobot-agent
Copy link
Contributor

I'm diving into the cross-provider fallback implementation! I'll be analyzing the changes across the rotator library and the new dry-run harness to ensure everything is sticky and robust. Back with a full review soon.

Copy link
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Assessment

This PR introduces a solid cross-provider fallback mechanism that significantly improves the resilience of the rotator library. The addition of configurable cooldowns to prevent provider flapping is a well-thought-out architectural detail. The implementation is clean and integrates well with the existing UsageManager and CooldownManager.

Architectural Feedback

  • Compatibility Mapping: The hardcoded compatibility between gemini_cli and antigravity is fine for now but should be externalized to configuration as the number of providers grows.
  • Fallback Logic: The fallback logic is soundly implemented with recursion-prevention (_is_fallback_attempt), which is critical for stability.

Key Suggestions

  • Code Duplication: Consolidating the fallback logic in the streaming completion method will reduce the risk of divergent behavior in future updates.
  • Dry-Run Harness: The new script is very useful; just a few minor cleanups needed regarding imports and cleanup logic.

Questions for the Author

  • Did you consider making the provider_compatibility map configurable via environment variables in this PR, or is that planned for a later iteration?

This review was generated by an AI assistant.

@@ -0,0 +1,259 @@
import asyncio
import asyncio
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: Duplicated import of asyncio.

finally:
litellm.acompletion = original_acompletion

original_streaming = litellm.acompletion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This saves the currently mocked acompletion instead of the original one. Since original_acompletion was already saved and restored properly in the previous block (lines 224-225), this variable and its corresponding restoration in the finally block (line 247) are redundant.

Comment on lines +599 to +602
self.provider_compatibility = {
"gemini_cli": "antigravity",
"antigravity": "gemini_cli",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provider compatibility map is currently hardcoded in __init__. To make this more maintainable and extensible, consider moving this mapping to a configuration file or environment variable (e.g., PROVIDER_COMPATIBILITY="gemini_cli:antigravity,antigravity:gemini_cli").

Comment on lines 2902 to 2958
# [Cross-Provider Fallback]
fallback_model = await self._get_fallback_model(model)
if fallback_model and not kwargs.get("_is_fallback_attempt"):
fallback_provider = fallback_model.split("/", 1)[0]
lib_logger.warning(
f"Cross-Provider Fallback: {model} exhausted (all keys failed). "
f"Switching to {fallback_model}."
)
await self._apply_fallback_cooldown(provider, fallback_provider)
kwargs["model"] = fallback_model
kwargs["_is_fallback_attempt"] = True

async for chunk in self._streaming_acompletion_with_retry(
request, pre_request_callback, **kwargs
):
yield chunk
return

# Build structured error response for client
error_response = error_accumulator.build_client_error_response()
error_data = error_response
else:
# Fallback if no errors were recorded (shouldn't happen)
final_error_message = (
"Request failed: No available API keys after rotation or timeout."
)
if last_exception:
final_error_message = (
f"Request failed. Last error: {str(last_exception)}"
)
error_data = {
"error": {"message": final_error_message, "type": "proxy_error"}
}
lib_logger.error(final_error_message)

yield f"data: {json.dumps(error_data)}\n\n"
yield "data: [DONE]\n\n"

except NoAvailableKeysError as e:
# [Cross-Provider Fallback]
fallback_model = await self._get_fallback_model(model)
if fallback_model and not kwargs.get("_is_fallback_attempt"):
fallback_provider = fallback_model.split("/", 1)[0]
lib_logger.warning(
f"Cross-Provider Fallback: {model} exhausted (NoAvailableKeysError). "
f"Switching to {fallback_model}."
)
await self._apply_fallback_cooldown(provider, fallback_provider)
kwargs["model"] = fallback_model
kwargs["_is_fallback_attempt"] = True

# Delegate to new stream generator
async for chunk in self._streaming_acompletion_with_retry(
request, pre_request_callback, **kwargs
):
yield chunk
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cross-provider fallback logic is duplicated across the retry loop (lines 2902-2918) and the NoAvailableKeysError handler (lines 2941-2958) in _streaming_acompletion_with_retry. Consolidating this into a private helper method would improve maintainability and ensure fallback behavior remains consistent regardless of how the exhaustion is detected.

)
return None

available_models = await self.get_available_models(fallback_provider)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If get_available_models returns an empty list (perhaps due to a transient API failure), _get_fallback_model will fall through and return the fallback_model ID anyway. It might be safer to return None if the provider explicitly reports no models available, to avoid sending requests to a potentially misconfigured or broken provider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant