feat(rotator_library): add configurable cross-provider fallback #85
Conversation
I'm diving into the cross-provider fallback implementation! I'll be analyzing the changes across the rotator library and the new dry-run harness to ensure everything is sticky and robust. Back with a full review soon.
Overall Assessment
This PR introduces a solid cross-provider fallback mechanism that significantly improves the resilience of the rotator library. The addition of configurable cooldowns to prevent provider flapping is a well-thought-out architectural detail. The implementation is clean and integrates well with the existing UsageManager and CooldownManager.
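For intuition, here is a minimal sketch of how the two cooldown settings could feed into such a cooldown. The formula and default values below are illustrative assumptions; only the setting names FALLBACK_COOLDOWN_MULTIPLIER and FALLBACK_COOLDOWN_MIN_SECONDS come from the PR, and the real calculation lives in _apply_fallback_cooldown.

```python
import os

# Illustrative sketch only: defaults are placeholders, not the PR's values.
FALLBACK_COOLDOWN_MULTIPLIER = float(os.getenv("FALLBACK_COOLDOWN_MULTIPLIER", "2.0"))
FALLBACK_COOLDOWN_MIN_SECONDS = float(os.getenv("FALLBACK_COOLDOWN_MIN_SECONDS", "60"))

def fallback_cooldown_seconds(base_cooldown: float) -> float:
    # Scale the exhausted provider's base cooldown and enforce a floor so the
    # rotator does not immediately flap back to it.
    return max(base_cooldown * FALLBACK_COOLDOWN_MULTIPLIER, FALLBACK_COOLDOWN_MIN_SECONDS)
```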
Architectural Feedback
- Compatibility Mapping: The hardcoded compatibility between gemini_cli and antigravity is fine for now but should be externalized to configuration as the number of providers grows.
- Fallback Logic: The fallback logic is soundly implemented with recursion prevention (_is_fallback_attempt), which is critical for stability.
Key Suggestions
- Code Duplication: Consolidating the fallback logic in the streaming completion method will reduce the risk of divergent behavior in future updates.
- Dry-Run Harness: The new script is very useful; just a few minor cleanups needed regarding imports and cleanup logic.
Questions for the Author
- Did you consider making the provider_compatibility map configurable via environment variables in this PR, or is that planned for a later iteration?
This review was generated by an AI assistant.
@@ -0,0 +1,259 @@
import asyncio
import asyncio
Minor nit: Duplicated import of asyncio.
scripts/fallback_dry_run.py
Outdated
finally:
    litellm.acompletion = original_acompletion

original_streaming = litellm.acompletion
This saves the currently mocked acompletion instead of the original one. Since original_acompletion was already saved and restored properly in the previous block (lines 224-225), this variable and its corresponding restoration in the finally block (line 247) are redundant.
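For reference, a minimal sketch of the save/patch/restore pattern this comment describes (variable names are illustrative, not the script's exact code):

```python
import litellm

original_acompletion = litellm.acompletion  # save the real implementation once

async def mock_acompletion(*args, **kwargs):
    ...  # dry-run stub response

litellm.acompletion = mock_acompletion
try:
    ...  # exercise the rotator against the mock
finally:
    litellm.acompletion = original_acompletion  # restore the real implementation

# Saving again *after* patching captures the mock, not the original:
# original_streaming = litellm.acompletion  # <- this is mock_acompletion now
```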
self.provider_compatibility = {
    "gemini_cli": "antigravity",
    "antigravity": "gemini_cli",
}
The provider compatibility map is currently hardcoded in __init__. To make this more maintainable and extensible, consider moving this mapping to a configuration file or environment variable (e.g., PROVIDER_COMPATIBILITY="gemini_cli:antigravity,antigravity:gemini_cli").
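A possible shape for that parsing, assuming the comma/colon format suggested above (the helper name and its placement are hypothetical, not part of the PR):

```python
import os

def load_provider_compatibility(default: dict[str, str]) -> dict[str, str]:
    """Parse PROVIDER_COMPATIBILITY ("src:dst,src:dst,...") into a mapping,
    falling back to the hardcoded default when the variable is unset."""
    raw = os.getenv("PROVIDER_COMPATIBILITY", "").strip()
    if not raw:
        return default
    mapping: dict[str, str] = {}
    for pair in raw.split(","):
        source, sep, target = pair.partition(":")
        if sep and source.strip() and target.strip():
            mapping[source.strip()] = target.strip()
    return mapping

# e.g. PROVIDER_COMPATIBILITY="gemini_cli:antigravity,antigravity:gemini_cli"
```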
# [Cross-Provider Fallback]
fallback_model = await self._get_fallback_model(model)
if fallback_model and not kwargs.get("_is_fallback_attempt"):
    fallback_provider = fallback_model.split("/", 1)[0]
    lib_logger.warning(
        f"Cross-Provider Fallback: {model} exhausted (all keys failed). "
        f"Switching to {fallback_model}."
    )
    await self._apply_fallback_cooldown(provider, fallback_provider)
    kwargs["model"] = fallback_model
    kwargs["_is_fallback_attempt"] = True

    async for chunk in self._streaming_acompletion_with_retry(
        request, pre_request_callback, **kwargs
    ):
        yield chunk
    return
    # Build structured error response for client
    error_response = error_accumulator.build_client_error_response()
    error_data = error_response
else:
    # Fallback if no errors were recorded (shouldn't happen)
    final_error_message = (
        "Request failed: No available API keys after rotation or timeout."
    )
    if last_exception:
        final_error_message = (
            f"Request failed. Last error: {str(last_exception)}"
        )
    error_data = {
        "error": {"message": final_error_message, "type": "proxy_error"}
    }
    lib_logger.error(final_error_message)

yield f"data: {json.dumps(error_data)}\n\n"
yield "data: [DONE]\n\n"
except NoAvailableKeysError as e:
    # [Cross-Provider Fallback]
    fallback_model = await self._get_fallback_model(model)
    if fallback_model and not kwargs.get("_is_fallback_attempt"):
        fallback_provider = fallback_model.split("/", 1)[0]
        lib_logger.warning(
            f"Cross-Provider Fallback: {model} exhausted (NoAvailableKeysError). "
            f"Switching to {fallback_model}."
        )
        await self._apply_fallback_cooldown(provider, fallback_provider)
        kwargs["model"] = fallback_model
        kwargs["_is_fallback_attempt"] = True

        # Delegate to new stream generator
        async for chunk in self._streaming_acompletion_with_retry(
            request, pre_request_callback, **kwargs
        ):
            yield chunk
        return
The cross-provider fallback logic is duplicated across the retry loop (lines 2902-2918) and the NoAvailableKeysError handler (lines 2941-2958) in _streaming_acompletion_with_retry. Consolidating this into a private helper method would improve maintainability and ensure fallback behavior remains consistent regardless of how the exhaustion is detected.
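One possible consolidation, sketched under the assumption that both sites only need the mutated kwargs and a go/no-go signal (the helper name _attempt_cross_provider_fallback is hypothetical, not from the PR):

```python
async def _attempt_cross_provider_fallback(self, model: str, provider: str, kwargs: dict) -> bool:
    """Mutate kwargs for a single cross-provider fallback attempt.

    Returns True when the caller should re-enter
    _streaming_acompletion_with_retry with the updated kwargs.
    """
    fallback_model = await self._get_fallback_model(model)
    if not fallback_model or kwargs.get("_is_fallback_attempt"):
        return False
    fallback_provider = fallback_model.split("/", 1)[0]
    lib_logger.warning(
        f"Cross-Provider Fallback: {model} exhausted. Switching to {fallback_model}."
    )
    await self._apply_fallback_cooldown(provider, fallback_provider)
    kwargs["model"] = fallback_model
    kwargs["_is_fallback_attempt"] = True
    return True

# Both exhaustion paths would then reduce to roughly:
#     if await self._attempt_cross_provider_fallback(model, provider, kwargs):
#         async for chunk in self._streaming_acompletion_with_retry(
#             request, pre_request_callback, **kwargs
#         ):
#             yield chunk
#         return
```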
    )
    return None

available_models = await self.get_available_models(fallback_provider)
If get_available_models returns an empty list (perhaps due to a transient API failure), _get_fallback_model will fall through and return the fallback_model ID anyway. It might be safer to return None if the provider explicitly reports no models available, to avoid sending requests to a potentially misconfigured or broken provider.
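A sketch of the suggested guard (the surrounding method structure is assumed, not copied from the PR):

```python
available_models = await self.get_available_models(fallback_provider)
if not available_models:
    # Provider reports no models -- possibly misconfigured or transiently down;
    # don't fall through to returning the fallback model ID anyway.
    return None
```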
Summary
Testing
Important
Adds cross-provider fallback with configurable cooldown to the rotator library, enhancing session stability.
- Cross-provider fallback logic in client.py.
- FALLBACK_COOLDOWN_MULTIPLIER and FALLBACK_COOLDOWN_MIN_SECONDS in .env.example and settings_tool.py.
- _get_fallback_model() and _apply_fallback_cooldown() in client.py for fallback logic.
- fallback_dry_run.py for testing fallback logic without real provider impact.
- provider_interface.py updated to include default fallback cooldown settings.
- defaults.py updated to define default values for fallback cooldown settings.

This description was created by
for 6e74205. You can customize this summary. It will automatically update as commits are pushed.