Describe the bug
I am evaluating a model deployed locally with vllm, and I want to increase `concurrent_requests` to 128 to speed up the evaluation. However, there seems to be some limitation: the number of concurrent requests sent to the vllm server cannot exceed 100. I am sure the deployed vllm server can handle 128 concurrent requests, so I am wondering if there is any config I missed in lighteval or litellm?
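To rule out the server itself, I checked that 128 truly parallel requests go through when sent directly to the OpenAI-compatible endpoint, bypassing lighteval and litellm entirely. A minimal sketch of that check, assuming the `/v1/chat/completions` route of the config below and a placeholder served model name and prompt:

```python
import asyncio
import httpx

BASE_URL = "http://localhost:8000/v1"  # base_url from the config below
MODEL = "qwen"                         # served model name on the vllm side (assumption)
N_REQUESTS = 128

async def one_request(client: httpx.AsyncClient, i: int) -> int:
    # Minimal OpenAI-compatible chat completion request.
    resp = await client.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": f"ping {i}"}],
            "max_tokens": 8,
        },
        timeout=600,
    )
    return resp.status_code

async def main() -> None:
    # Fire all 128 requests at once; the server handles them fine,
    # so the 100-request ceiling appears only when going through lighteval.
    async with httpx.AsyncClient() as client:
        codes = await asyncio.gather(
            *(one_request(client, i) for i in range(N_REQUESTS))
        )
    print({code: codes.count(code) for code in set(codes)})

if __name__ == "__main__":
    asyncio.run(main())
```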
To Reproduce
My config:

```yaml
model_parameters:
  provider: "hosted_vllm"
  model_name: "hosted_vllm/qwen"
  base_url: "http://localhost:8000/v1"
  api_key: ""
  timeout: 100000000
  concurrent_requests: 128
  max_model_length: 38912
  generation_parameters:
    temperature: 0.6
    top_p: 0.9
    top_k: 20
    seed: 42
```
My command:

```
lighteval endpoint litellm lighteval_vllm_config.yaml $DATASET --output-dir ./results --save-details
```
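To narrow down whether the cap comes from lighteval or from litellm, one could also drive litellm directly with the same provider settings and check whether 128 calls run in parallel. A rough sketch, with the prompt and request count as placeholders:

```python
import asyncio
import litellm

N_REQUESTS = 128

async def one_request(i: int):
    # Same hosted_vllm provider settings as the config above; prompt is a placeholder.
    return await litellm.acompletion(
        model="hosted_vllm/qwen",
        api_base="http://localhost:8000/v1",
        api_key="",
        messages=[{"role": "user", "content": f"ping {i}"}],
        max_tokens=8,
        temperature=0.6,
        top_p=0.9,
    )

async def main() -> None:
    # If all 128 calls run concurrently here, the ceiling seen during evaluation
    # is more likely enforced inside lighteval than inside litellm.
    responses = await asyncio.gather(
        *(one_request(i) for i in range(N_REQUESTS)),
        return_exceptions=True,
    )
    ok = sum(1 for r in responses if not isinstance(r, Exception))
    print(f"{ok}/{N_REQUESTS} requests succeeded")

asyncio.run(main())
```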
Expected behavior
The vllm server should receive 128 concurrent requests, as set by `concurrent_requests` in the config.
Version info
lighteval version: 0.13.0
litellm version: 1.80.8