Don't allow warmup shapes that exceed the max sequence length #185
Conversation
Signed-off-by: Max de Bayser <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
vllm_spyre/platform.py
Outdated
max_seq_len = max(max_seq_len,
                  shape["prompt_length"] + shape["new_tokens"])
if max_seq_len > max_model_len:
    raise RuntimeError(
Could this check be moved into get_warmup_shapes, where the other validations on the warmup shapes occur?
Yes, this sounds reasonable. Other than that it looks good to me.
I did move it into get_warmup_shapes, but now it requires an extra parameter.
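As a rough sketch of what the relocated check could look like (this is illustrative only: the standalone signature, the `max_model_len` parameter name, and the error message are assumptions rather than the actual vllm-spyre code; only the core comparison mirrors the diff above):

```python
# Hypothetical sketch; signature and message wording are assumed.
def get_warmup_shapes(warmup_shapes: list[dict],
                      max_model_len: int) -> list[dict]:
    """Validate parsed warmup shapes against the model's max sequence length."""
    max_seq_len = 0
    for shape in warmup_shapes:
        # Each warmup shape occupies prompt_length + new_tokens positions.
        max_seq_len = max(max_seq_len,
                          shape["prompt_length"] + shape["new_tokens"])
    if max_seq_len > max_model_len:
        raise RuntimeError(
            f"A warmup shape requires {max_seq_len} sequence positions, "
            f"which exceeds the model's maximum sequence length "
            f"of {max_model_len}")
    return warmup_shapes
```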
lgtm now that the check is moved into cls.get_warmup_shapes()
Signed-off-by: Max de Bayser <[email protected]>
In V0, warmup shapes that result in sequence lengths longer than the maximum sequence length the model supports are not validated. When a request whose length falls between those two values comes in, the server crashes.
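To make the failure window concrete, here is a small sketch with made-up numbers (none of these values come from the PR):

```python
# Hypothetical configuration: the warmup shape overshoots the model limit.
max_model_len = 2048
warmup_shape = {"prompt_length": 2048, "new_tokens": 256}

# The shape covers 2304 positions, so the scheduler accepts a request of,
# say, 2100 tokens because it fits the warmup shape, even though the model
# itself only supports 2048 positions, and the server crashes.
max_warmup_seq_len = warmup_shape["prompt_length"] + warmup_shape["new_tokens"]
request_len = 2100
assert max_model_len < request_len <= max_warmup_seq_len
```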