Does aibrix support to do load balance against managed model endpoints #784

Open
Colstuwjx opened this issue Mar 3, 2025 · 3 comments
Labels
area/gateway, triage/needs-information

Comments

@Colstuwjx

For example, is it possible to use the AIBrix gateway to load-balance across Azure OpenAI endpoints, so that we can take advantage of gateway features such as prefix-cache-aware load balancing?

@Jeffwan
Collaborator

Jeffwan commented Mar 3, 2025

@Colstuwjx

A quick question: for managed endpoints, we do not have any control over the behavior. Are you assuming that the Azure deployment will automatically handle prefix caching for you if the AIBrix gateway routes to an Azure OpenAI endpoint? We'd love to hear more about this use case. Thanks!

@Jeffwan added the area/gateway and triage/needs-information labels Mar 3, 2025
@Colstuwjx
Author

Hi @Jeffwan

I'm just wondering whether we can try out some AIBrix features, such as the gateway or autoscaling, without being blocked by the engine runtime. For example, we could use the AIBrix gateway as a standalone component to cover gateway-layer requirements such as routing requests by least connections or round-robin.
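
For readers unfamiliar with the two routing policies named above, here is a minimal sketch of round-robin and least-connections selection over a fixed list of endpoints. It is only an illustration of the policies, not AIBrix's gateway code; the class names, method names, and endpoint URLs are all hypothetical.

```python
import itertools


class RoundRobinRouter:
    """Hand out endpoints in a fixed, repeating order."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(list(endpoints))

    def pick(self):
        return next(self._cycle)


class LeastConnRouter:
    """Hand out the endpoint with the fewest in-flight requests."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        # Track in-flight request counts per endpoint.
        self._in_flight = {ep: 0 for ep in endpoints}

    def acquire(self):
        # On ties, min() returns the endpoint that was listed first.
        ep = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[ep] += 1
        return ep

    def release(self, ep):
        # Call this when the request to `ep` has finished.
        if self._in_flight[ep] > 0:
            self._in_flight[ep] -= 1


if __name__ == "__main__":
    # Hypothetical managed endpoints, for illustration only.
    endpoints = [
        "https://east.example.invalid/v1",
        "https://west.example.invalid/v1",
    ]
    rr = RoundRobinRouter(endpoints)
    print([rr.pick() for _ in range(3)])

    lc = LeastConnRouter(endpoints)
    first = lc.acquire()
    second = lc.acquire()
    lc.release(second)
    print(lc.acquire())
```

Neither policy needs any knowledge of the engine behind the endpoint, which is why they can in principle be applied to managed endpoints; a prefix-cache-aware policy, by contrast, depends on how the backend caches.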

@kerthcet
Collaborator

kerthcet commented Mar 3, 2025

Some AI gateways support similar behavior as well; for example, Kong AI Gateway supports both AI providers and local models. But as a platform, I'm not convinced at this moment that this is a necessary feature, and we may need more feedback. Most of the time this looks like a client-side feature, and AIBrix could act as an alternative backend, but who knows. My two cents.


3 participants