How to configure vllm to gracefully mark the too-long inputs without throwing? #16730
vadimkantorov announced in Q&A
Replies: 0 comments
I'm using vllm to batch-process quite a lot of inputs. Currently `LLM(...).generate(...)` throws when it encounters a text input that produces too many input tokens after tokenization. Is it possible to configure vllm to gracefully return this error in `model_outputs` and still process all other inputs as usual?

Alternatively, it might be good to allow specifying some "mitigation strategies": keep only the first `self.model_config.max_model_len` tokens, or keep only the last `self.model_config.max_model_len` tokens.

I'm providing text input to `.generate(...)`, so to be able to filter out bad requests I would need to invoke the tokenization logic myself prior to sending the input to the model - this is quite cumbersome and error-prone.

Thanks!