
bug: Model not found when enable vllm api key #150

Open
JustinDuy opened this issue Feb 18, 2025 · 6 comments

Labels
bug Something isn't working

Comments

@JustinDuy commented Feb 18, 2025

Describe the bug

I am using lmstack-router as a load balancer for my vLLM server. It does not work when I serve the OpenAI-compatible vLLM server with an API key enabled: requests fail with a 404 Unauthorized error. I believe the problem is that the '/v1/models' request in service discovery does not send a Bearer token, so it cannot authenticate against the vLLM server. https://github.com/vllm-project/production-stack/blob/main/src/vllm_router/service_discovery.py#L136
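
For illustration, this is roughly the call the router makes today, reproduced outside the router code (the backend URL below is a placeholder):

import requests

backend_url = "http://localhost:8000"  # hypothetical address of one vLLM engine

# No Authorization header is attached, so a vLLM server started with an API key
# rejects the request and the router never sees the model list.
response = requests.get(f"{backend_url}/v1/models", timeout=5)
print(response.status_code, response.text)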

To Reproduce

Enable the vLLM API key by setting VLLM_API_KEY in the deployment, then call curl after port-forwarding the k8s service:

curl -X POST http://localhost:30080/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -d '{
    "model": "/model/qwen/Qwen2-VL-7B-Instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

Expected behavior

The router's service_discovery can list all models even when the vLLM servers require an API key.

Additional context

No response

@JustinDuy JustinDuy added the bug Something isn't working label Feb 18, 2025
@vllm-project vllm-project deleted a comment from Shaoting-Feng Feb 19, 2025
@YuhanLiu11 (Collaborator) commented

Hey @JustinDuy, we are unable to reproduce your error using your query command. Could you provide details on how you started the vLLM engines?

@JustinDuy (Author) commented

@YuhanLiu11: I start vllm serve with the VLLM_API_KEY env variable set from a k8s secret (image attached).

@JustinDuy (Author) commented Feb 19, 2025

@YuhanLiu11: have you taken a look at the models endpoint request inside service discovery that I posted above? I just wonder how it works when the vLLM server is secured by an API key (see https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L745) and the key itself is not passed in the header. Something like this would make sense:

headers = {"Authorization": f"Bearer {VLLM_API_KEY}"}
response = requests.get(url, headers=headers)
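
As a rough sketch of that suggestion (the helper name is made up, the key is read from the VLLM_API_KEY environment variable, and this is not the existing router code):

import os

import requests


def list_backend_models(base_url: str) -> list[str]:
    """Fetch /v1/models from one backend, attaching a Bearer token when VLLM_API_KEY is set."""
    headers = {}
    api_key = os.environ.get("VLLM_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # headers must be passed as a keyword argument; requests.get(url, headers)
    # would send the dict as query parameters instead.
    response = requests.get(f"{base_url}/v1/models", headers=headers, timeout=5)
    response.raise_for_status()
    return [model["id"] for model in response.json()["data"]]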

@gaocegege (Collaborator) commented

cc @ggaaooppeenngg

@ggaaooppeenngg (Contributor) commented

The core problem is that the k8s service discovery hinges on the model list API, but there is currently no way to supply an authorization token when calling that API.

It appears that the authorization token has been set manually; notably, the Helm chart doesn't have settings for configuring this token.

I see two viable solutions:

  1. Enhance the Helm chart with settings to configure the vLLM token at deployment time, and add a token argument to the router that is used as the authorization token in the header (a rough sketch follows this list). Another approach is to annotate the token on the pod so it can be retrieved when the model list API is called.
  2. Alternatively, leave the vLLM instances without token-based security and implement authorization solely at the router level.
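
A minimal sketch of how option 1 could look on the router side, assuming a new (hypothetical) --vllm-api-key argument; the flag name and wiring are illustrative only:

import argparse

parser = argparse.ArgumentParser(description="vllm-router (sketch)")
parser.add_argument(
    "--vllm-api-key",
    default=None,
    help="Bearer token attached when querying backend /v1/models endpoints",
)
args = parser.parse_args()

# Service discovery would then build its request headers from this value:
auth_headers = (
    {"Authorization": f"Bearer {args.vllm_api_key}"} if args.vllm_api_key else {}
)

The Helm chart would then only need to pass the same secret to both the vLLM deployment and this router argument.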

@YuhanLiu11 (Collaborator) commented Feb 20, 2025

@YuhanLiu11: have you taken a look at the models endpoint request inside service discovery that I posted above? I just wonder how it works when the vLLM server is secured by an API key (see https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L745) and the key itself is not passed in the header. Something like this would make sense:

headers = {"Authorization": f"Bearer {VLLM_API_KEY}"}
response = requests.get(url, headers=headers)

Yes, this can be a quick fix to this bug 😄, but we'll still need something like what @ggaaooppeenngg brought up so the router is aware of the API key. I can take a stab at it once I have bandwidth.
