How to configure vllm to gracefully mark the too-long inputs without throwing? #16730
vadimkantorov announced in Q&A
Replies: 0 comments
I'm using vllm to batch-process quite a lot of inputs. Currently `LLM(...).generate(...)` throws when it encounters a text input that produces too many input tokens after tokenization. Is it possible to configure vllm to gracefully return this error in `model_outputs` and still process all other inputs as usual?

Alternatively, it might be good to allow specifying some "mitigation strategies": keep only the first `self.model_config.max_model_len` tokens, or keep only the last `self.model_config.max_model_len` tokens.

I'm providing text input to `.generate(...)`, so to be able to filter out bad requests I would need to invoke the tokenization logic myself prior to sending the input to the model - this is quite cumbersome and error-prone.

Thanks!