Support per user api-key for multi-tenant use case #753

Open
Jeffwan opened this issue Feb 26, 2025 · 1 comment

Comments

@Jeffwan
Collaborator

Jeffwan commented Feb 26, 2025

🚀 Feature Description and Motivation

Background

Currently, vLLM only supports a single API key for authentication, making it difficult to share the inference engine across multiple tenants. Extending vLLM to support multiple keys is an option, but this would be a static solution. A more flexible approach is needed to handle multi-tenant API key management dynamically.

Proposed Solutions

Option 1: Extend vLLM to Support External Authentication

  • vLLM integrates with an external authentication server to validate API keys dynamically.
  • This approach allows for greater flexibility but introduces an external dependency. The overhead of validating keys against the external server on each request is another concern.
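One common way to limit the overhead raised in Option 1 is to cache successful validations for a short TTL. The sketch below assumes a hypothetical `Validator` callback that stands in for the round trip to the external auth server; `TenantInfo`, `CachingAuthenticator`, and `ttl_s` are illustrative names, not part of vLLM.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class TenantInfo:
    tenant_id: str
    priority: int


# Hypothetical signature for the round trip to an external auth server:
# returns TenantInfo for a valid key and None for an invalid one.
Validator = Callable[[str], Optional[TenantInfo]]


class CachingAuthenticator:
    """Validate API keys through an external callback, caching successes for ttl_s seconds."""

    def __init__(self, validator: Validator, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._validator = validator
        self._ttl_s = ttl_s
        self._clock = clock
        # api_key -> (expiry time, tenant info)
        self._cache: Dict[str, Tuple[float, TenantInfo]] = {}

    def authenticate(self, api_key: str) -> Optional[TenantInfo]:
        now = self._clock()
        hit = self._cache.get(api_key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cache entry: no call to the auth server
        info = self._validator(api_key)
        if info is None:
            self._cache.pop(api_key, None)  # never cache rejections
            return None
        self._cache[api_key] = (now + self._ttl_s, info)
        return info
```

The trade-off is revocation latency: a revoked key may keep working for up to `ttl_s` seconds. Caching only successes keeps random invalid keys from filling the cache, though a real deployment would still want a size bound (e.g. LRU eviction).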

Option 2: Manage API Keys Outside of vLLM

Option 2a: User-Managed Authentication (Bring Your Own Stack)

  • Users adopt an external authentication solution (e.g., Istio, OAuth, or API gateways) to manage API keys.

Option 2b: Extend AIBrix Gateway for Multi-Tenant API Key Management

  • AIBrix Gateway already has a basic user concept and rate-limiting control.
  • The extension would associate users with API keys, providing built-in multi-tenancy support.
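As a rough illustration of Option 2b, a gateway could keep a map from API key to user and enforce the rate limit per user rather than per key, so issuing several keys to one user does not multiply that user's quota. This is a hypothetical sketch using a simple token bucket; it does not reflect AIBrix Gateway's actual data model or rate-limiting implementation, and `KeyRegistry`, `issue_key`, `revoke_key`, and `admit` are invented names.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def try_take(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class KeyRegistry:
    """Hypothetical gateway-side map from API key to user, with one limiter per user."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._key_to_user: Dict[str, str] = {}
        self._buckets: Dict[str, TokenBucket] = {}

    def add_user(self, user: str, capacity: float, rate: float) -> None:
        self._buckets[user] = TokenBucket(capacity, rate, self._clock())

    def issue_key(self, user: str, api_key: str) -> None:
        if user not in self._buckets:
            raise KeyError(f"unknown user {user!r}")
        self._key_to_user[api_key] = user

    def revoke_key(self, api_key: str) -> None:
        self._key_to_user.pop(api_key, None)

    def admit(self, api_key: str) -> Optional[str]:
        """Return the user id if the key is valid and under the user's limit, else None."""
        user = self._key_to_user.get(api_key)
        if user is None:
            return None
        return user if self._buckets[user].try_take(self._clock()) else None
```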

Future Considerations

In addition to authentication, we want to support tenant-aware optimizations within vLLM. The gateway should attach tenant metadata (e.g., X-Tenant-ID, X-Priority, JWT claims) before forwarding the request to vLLM. This would enable the inference engine to make tenant-aware optimizations, such as priority-based scheduling or resource allocation.
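The flow described above can be sketched in two halves. On the gateway side, client-supplied copies of the tenant headers should be dropped before the trusted values are attached, otherwise a client could spoof its own tenant or priority. On the engine side, a priority queue keyed on `X-Priority` is one way to do priority-based scheduling. The header names come from this issue; the convention that a lower value runs first, the default priority, and the `PriorityScheduler` class are assumptions for illustration, not vLLM's scheduler.

```python
import heapq
import itertools
from typing import Dict, List, Mapping, Tuple

# Tenant metadata headers named in this issue; compared case-insensitively.
TENANT_HEADERS = ("x-tenant-id", "x-priority")


def forward_headers(client_headers: Mapping[str, str], tenant_id: str,
                    priority: int) -> Dict[str, str]:
    """Gateway side: drop any client-supplied tenant headers, then attach trusted ones."""
    out = {k: v for k, v in client_headers.items() if k.lower() not in TENANT_HEADERS}
    out["X-Tenant-ID"] = tenant_id
    out["X-Priority"] = str(priority)
    return out


class PriorityScheduler:
    """Engine side (hypothetical): lower X-Priority value runs first, FIFO among equals."""

    def __init__(self, default_priority: int = 100) -> None:
        # Requests arriving without the header are assigned this (low-urgency) priority.
        self._default = default_priority
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, request_id: str, headers: Mapping[str, str]) -> None:
        priority = int(headers.get("X-Priority", self._default))
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]
```

For example, requests tagged with priorities 5, 0, and 5 (in that arrival order) are dequeued as the priority-0 request first, then the two priority-5 requests in arrival order.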

Open Questions

  • Which approach aligns best with the vLLM architecture?
  • Should vLLM natively support dynamic authentication, or should this be handled externally?
  • How can we ensure a smooth integration between vLLM and the authentication layer without introducing significant overhead?

/cc @simon-mo @robertgshaw2-redhat @gaocegege @kerthcet

Use Case

Support multi-tenancy for vLLM

Proposed Solution

No response

@jolfr
Contributor

jolfr commented Mar 1, 2025

One use-case I'd love to see supported as a tenant-aware optimization is tenant-based LoRA adapters.
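In vLLM's OpenAI-compatible server, a LoRA adapter loaded with `--lora-modules name=path` is selected by setting the request's `model` field to the adapter name. A tenant-aware gateway could therefore route each tenant to its own adapter by rewriting that field. The sketch below is hypothetical: the tenant-to-adapter mapping and `route_to_tenant_adapter` are illustrative, and it also assumes that a tenant should not be able to request another tenant's adapter by name.

```python
import json
from typing import Mapping


def route_to_tenant_adapter(body: bytes, tenant_id: str,
                            adapters: Mapping[str, str]) -> bytes:
    """Rewrite the JSON request's "model" field to the tenant's LoRA adapter.

    `adapters` maps tenant id -> adapter name (a hypothetical gateway config).
    Tenants without an entry keep the model they requested, unless that model
    is another tenant's adapter, in which case the request is rejected.
    """
    payload = json.loads(body)
    own = adapters.get(tenant_id)
    if own is not None:
        payload["model"] = own
    elif payload.get("model") in set(adapters.values()):
        raise PermissionError(f"tenant {tenant_id!r} may not use adapter {payload['model']!r}")
    return json.dumps(payload).encode()
```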
