🚀 Feature Description and Motivation
Background
Currently, vLLM supports only a single API key for authentication, which makes it difficult to share one inference engine across multiple tenants. Extending vLLM to accept multiple keys is an option, but that would still be a static solution configured at startup. A more flexible approach is needed to manage multi-tenant API keys dynamically, e.g. issuing and revoking keys at runtime.
Proposed Solutions
Option 1: Extend vLLM to Support External Authentication
vLLM integrates with an external authentication server to validate API keys dynamically.
This approach offers the greatest flexibility, but it introduces an external dependency, and the per-request validation overhead is another concern.
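To make Option 1 concrete, here is a minimal Python sketch of what such a per-request check might look like. It assumes a hypothetical external endpoint (AUTH_SERVER_URL with a /validate route that returns 200 for valid keys); none of these names exist in vLLM today.

```python
# Hypothetical sketch of Option 1: vLLM delegating API-key checks to an
# external auth server. AUTH_SERVER_URL and its /validate contract are
# assumptions for illustration; vLLM has no such hook today.
import http.client
import json
from typing import Callable, Optional, Tuple
from urllib.parse import urlparse

AUTH_SERVER_URL = "http://auth.internal:8080/validate"  # assumed endpoint


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Pull the raw key out of an 'Authorization: Bearer <key>' header."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def _http_post(url: str, body: str) -> Tuple[int, bytes]:
    """Minimal stdlib POST; a real implementation would pool connections."""
    parsed = urlparse(url)
    conn = http.client.HTTPConnection(parsed.hostname, parsed.port or 80, timeout=2)
    conn.request("POST", parsed.path, body, {"Content-Type": "application/json"})
    resp = conn.getresponse()
    return resp.status, resp.read()


def validate_key(
    token: str,
    post: Callable[[str, str], Tuple[int, bytes]] = _http_post,
) -> bool:
    """Ask the external auth server whether this key is currently valid.

    `post` is injectable so the check can be exercised without a live
    server; by default it POSTs the key to AUTH_SERVER_URL.
    """
    status, _ = post(AUTH_SERVER_URL, json.dumps({"api_key": token}))
    return status == 200
```

In practice the validation result would likely be cached with a short TTL, to keep the per-request overhead mentioned above acceptable.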
Option 2: Manage API Keys Outside of vLLM
Option 2a: User-Managed Authentication (Bring Your Own Stack)
Users adopt an external authentication solution (e.g., Istio, OAuth, or API gateways) to manage API keys.
Option 2b: Extend AIBrix Gateway for Multi-Tenant API Key Management
AIBrix Gateway already has a basic user concept and rate-limiting control.
The extension would associate users with API keys, providing built-in multi-tenancy support.
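As a rough illustration of the data model Option 2b implies — a runtime-mutable mapping from API keys to tenants, each with its own rate limit. AIBrix Gateway itself is implemented in Go; this Python sketch uses invented names and a simple fixed-window limiter purely to show the flow.

```python
# Illustrative sketch of Option 2b's key store: API key -> tenant,
# with per-tenant rate limiting. All names here are invented; the real
# AIBrix Gateway is written in Go.
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Tenant:
    tenant_id: str
    requests_per_minute: int
    _window_start: float = 0.0
    _count: int = 0

    def allow(self, now: Optional[float] = None) -> bool:
        """Fixed-window rate limit: at most requests_per_minute per 60 s."""
        if now is None:
            now = time.monotonic()
        if now - self._window_start >= 60.0:
            self._window_start = now
            self._count = 0
        if self._count < self.requests_per_minute:
            self._count += 1
            return True
        return False


class KeyStore:
    """Maps API keys to tenants; keys can be added or revoked at runtime,
    which is the dynamic behavior a static vLLM key list cannot provide."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tenant] = {}

    def add_key(self, api_key: str, tenant: Tenant) -> None:
        self._keys[api_key] = tenant

    def revoke(self, api_key: str) -> None:
        self._keys.pop(api_key, None)

    def authenticate(self, api_key: str) -> Optional[Tenant]:
        """Return the tenant owning this key, or None if unknown/revoked."""
        return self._keys.get(api_key)
```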
Future Considerations
In addition to authentication, we want to support tenant-aware optimizations within vLLM. The gateway should attach tenant metadata (e.g., X-Tenant-ID, X-Priority, JWT claims) before forwarding the request to vLLM. This would enable the inference engine to make tenant-aware optimizations, such as priority-based scheduling or resource allocation.
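A minimal sketch of that enrichment step, assuming the gateway has already authenticated the request. The header names follow the examples above; the priority value and function name are invented for illustration.

```python
# Sketch: gateway enriching an authenticated request with tenant metadata
# before proxying it to vLLM. X-Tenant-ID / X-Priority match the examples
# in the text; everything else here is hypothetical.
from typing import Dict


def attach_tenant_headers(
    headers: Dict[str, str], tenant_id: str, priority: str = "normal"
) -> Dict[str, str]:
    """Return a copy of the request headers with tenant metadata added.

    vLLM (or a scheduler in front of it) could read these headers to do
    priority-based scheduling or per-tenant resource accounting.
    """
    enriched = dict(headers)  # copy; never mutate the incoming request
    enriched["X-Tenant-ID"] = tenant_id
    enriched["X-Priority"] = priority
    return enriched
```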
Open Questions
Which approach aligns best with the vLLM architecture?
Should vLLM natively support dynamic authentication, or should this be handled externally?
How can we ensure a smooth integration between vLLM and the authentication layer without introducing significant overhead?
/cc @simon-mo @robertgshaw2-redhat @gaocegege @kerthcet
Use Case
Support multi-tenancy for vLLM
Proposed Solution
No response