[BUG] concurrent_requests in litellm backend limited to 100 #1100

@yt0428

Description

Describe the bug

I am evaluating a model deployed locally with vllm, and I want to increase concurrent_requests to 128 to speed up the evaluation. However, there seems to be a limitation: the number of concurrent requests sent to the vllm server never exceeds 100. I am sure the deployed vllm server can handle 128 concurrent requests, so I am wondering whether there is a config I missed in lighteval or litellm.
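A quick check along these lines can confirm the server side is not the bottleneck (sketch only; it assumes the vllm OpenAI-compatible endpoint at http://localhost:8000/v1 and a served model named qwen, matching the config below, and the server-side count should also show up in vllm's periodic stats log):

# Sketch only: fire 128 simultaneous chat requests at the vllm
# OpenAI-compatible endpoint and report how many this client had in
# flight at once. Endpoint and model name are assumptions taken from
# the config below; adjust them to your deployment.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

in_flight = 0
peak = 0


async def one_request(i: int) -> None:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    try:
        await client.chat.completions.create(
            model="qwen",
            messages=[{"role": "user", "content": f"ping {i}"}],
            max_tokens=8,
        )
    finally:
        in_flight -= 1


async def main() -> None:
    await asyncio.gather(*(one_request(i) for i in range(128)))
    print(f"peak in-flight requests from this client: {peak}")


asyncio.run(main())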

To Reproduce

My config:

model_parameters:
    provider: "hosted_vllm"
    model_name: "hosted_vllm/qwen"
    base_url: "http://localhost:8000/v1" 
    api_key: "" 
    timeout: 100000000
    concurrent_requests: 128
    max_model_length: 38912
    generation_parameters:
      temperature: 0.6
      top_p: 0.9
      top_k: 20
      seed: 42

My command:
lighteval endpoint litellm lighteval_vllm_config.yaml $DATASET --output-dir ./results --save-details
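For context, a hard cap like this would be invisible to the YAML config: if the backend fans requests out through a fixed-size pool or semaphore, anything above that size just queues. A generic sketch of that pattern (hypothetical names only, not the actual lighteval or litellm code):

# Hypothetical sketch of how a backend can cap concurrency at 100
# regardless of the user-facing setting: a hard-coded ceiling bounds
# the semaphore, so concurrent_requests=128 behaves like 100.
# Names here are illustrative, not taken from lighteval or litellm.
import asyncio

MAX_CONCURRENT_CALLS = 100  # hard-coded ceiling


async def call_api(payload: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the real HTTP call
    return f"response to {payload}"


async def run_all(payloads: list[str], concurrent_requests: int) -> list[str]:
    # Even with concurrent_requests=128, the effective limit is the
    # smaller of the two values because the ceiling is fixed.
    limit = min(concurrent_requests, MAX_CONCURRENT_CALLS)
    sem = asyncio.Semaphore(limit)

    async def guarded(p: str) -> str:
        async with sem:
            return await call_api(p)

    return await asyncio.gather(*(guarded(p) for p in payloads))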

Expected behavior

The vllm server receives 128 concurrent requests, as set by concurrent_requests.

Version info

lighteval version: 0.13.0
litellm version: 1.80.8
