bug: Model not found when enabling vLLM API key #150
Comments
Hey @JustinDuy, we are unable to reproduce your error using your query command. Could you provide details on how you started the vLLM engines?
@YuhanLiu11: I start vllm serve with the VLLM_API_KEY env variable set from a k8s secret.
@YuhanLiu11: have you taken a look at the models endpoint request inside service discovery that I posted above? I just wonder how it works when the vLLM server is secured by an API key (see https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L745) and the key itself is not passed through a header. Something like this would make sense: `headers = {"Authorization": f"Bearer {VLLM_API_KEY}"}; response = requests.get(url, headers=headers)`
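A minimal sketch of that suggestion, assuming the key is read from a VLLM_API_KEY environment variable (the function name and surrounding code are illustrative, not the router's actual implementation):

```python
import os

import requests


def list_models(base_url: str) -> list:
    """Illustrative only: fetch /v1/models, sending the API key as a Bearer token."""
    api_key = os.environ.get("VLLM_API_KEY")  # assumed env var name
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    response = requests.get(f"{base_url}/v1/models", headers=headers, timeout=5)
    response.raise_for_status()
    return response.json().get("data", [])
```

With something along these lines in service discovery, `list_models("http://localhost:8000")` should succeed against a key-protected engine instead of being rejected.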
The core problem is that the k8s service discovery hinges on the model list API. However, there is currently no way to supply an authorization token when accessing the models through this API; it appears the authorization token has to be set manually. Notably, I've observed that the Helm chart has no setting for configuring this token. I put forward two viable solutions:
Yes, this can be a quick fix to this bug 😄, but we'll still need something like what was brought up by @ggaaooppeenngg to let the router be aware of the API key. I can take a stab once I have bandwidth.
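For illustration, one hedged way to picture "making the router aware of the API key" — the `ModelListDiscovery` class, the `--vllm-api-key` flag, and the env-var fallback below are assumptions for the sake of the sketch, not production-stack's actual interface:

```python
import argparse
import os

import requests


class ModelListDiscovery:
    """Hypothetical discovery helper that carries an optional vLLM API key."""

    def __init__(self, backend_urls, api_key=None):
        self.backend_urls = backend_urls
        self.api_key = api_key

    def _headers(self):
        # Attach the key as a Bearer token only when one is configured.
        return {"Authorization": f"Bearer {self.api_key}"} if self.api_key else {}

    def list_models(self):
        # Query every backend's /v1/models endpoint with the shared headers.
        models = {}
        for url in self.backend_urls:
            resp = requests.get(f"{url}/v1/models", headers=self._headers(), timeout=5)
            resp.raise_for_status()
            models[url] = [m["id"] for m in resp.json().get("data", [])]
        return models


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Hypothetical flag; falls back to VLLM_API_KEY, which in k8s could be
    # populated from the same Secret the engines use.
    parser.add_argument("--vllm-api-key", default=os.environ.get("VLLM_API_KEY"))
    parser.add_argument("--backend-urls", nargs="+", default=["http://localhost:8000"])
    args = parser.parse_args()
    discovery = ModelListDiscovery(args.backend_urls, api_key=args.vllm_api_key)
    print(discovery.list_models())
```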
Describe the bug
I am using lmstack-router as a load balancer for my vLLM server. It does not work when I serve the OpenAI-compatible vLLM server with an API key: requests fail with a 404 Unauthorized error. I believe the problem is that the '/v1/models' request in service discovery does not currently send a Bearer token that could be verified against the vLLM server. See https://github.com/vllm-project/production-stack/blob/main/src/vllm_router/service_discovery.py#L136
To Reproduce
Enable the vLLM API key by setting VLLM_API_KEY in the deployment, then call curl after port-forwarding from the k8s service:
curl -X POST http://localhost:30080/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -d '{
    "model": "/model/qwen/Qwen2-VL-7B-Instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'
Expected behavior
The router's service_discovery should be able to list all models.
Additional context
No response