
Feature Request: Set maximum number of in-flight requests #412

@TheCodeWrangler

Description

When unexpectedly large bursts of requests reach my application, I would like to be able to limit the number of requests the trtllm backend accepts. Specifically, I would like to REJECT new requests once the number of active requests for a specific backend exceeds a threshold.

I have tried the following queue policy:

```
dynamic_batching {
  default_queue_policy {
    timeout_action: REJECT
    max_queue_size: 30
  }
}
```

However, this does not give me the behavior I want. I would like a hard limit on active requests so that I can better balance load across instances (and avoid one instance building up a large backlog).
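To make the requested behavior concrete, here is a minimal sketch in Python of an admission gate that counts active requests and rejects new ones once a threshold is reached. It does not queue the excess. This is not part of Triton or the TensorRT-LLM backend. `InFlightLimiter`, `RejectedError`, and `fake_inference` are hypothetical names, and the sketch assumes a single asyncio event loop.

```python
import asyncio


class RejectedError(Exception):
    """Raised when a request arrives while the in-flight limit is reached."""


class InFlightLimiter:
    """Hypothetical gate that caps the number of concurrently active requests.

    It never buffers excess work: a request that arrives while
    `max_in_flight` requests are still running is rejected immediately.
    """

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.active = 0

    async def run(self, handler, *args):
        # There is no await between the check and the increment, so this is
        # race-free within a single asyncio event loop.
        if self.active >= self.max_in_flight:
            raise RejectedError(f"{self.active} requests already in flight")
        self.active += 1
        try:
            return await handler(*args)
        finally:
            # Free the slot whether the request succeeded or failed.
            self.active -= 1


async def fake_inference(request_id: int) -> str:
    # Stand-in for a model call; sleeps to simulate generation time.
    await asyncio.sleep(0.05)
    return f"result-{request_id}"


async def main() -> list:
    limiter = InFlightLimiter(max_in_flight=3)
    # A burst of 5 simultaneous requests: 3 are admitted, 2 are rejected.
    return await asyncio.gather(
        *(limiter.run(fake_inference, i) for i in range(5)),
        return_exceptions=True,
    )


if __name__ == "__main__":
    for r in asyncio.run(main()):
        print(repr(r))
```

In a deployment, a gate like this would presumably sit in front of each model instance, for example in a gateway or load balancer. A rejected request could then be retried against a less loaded instance instead of waiting behind a backlog.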
