
Conversation

maxdebayser
Collaborator

In V0, warmup shapes that result in sequence lengths longer than the maximum sequence length supported by the model are not validated. When a request whose length falls between the two values comes in, the server crashes:

WARNING 04-23 02:30:31 [scheduler.py:717] Input prompt (306 tokens) is too long and exceeds limit of 256
CRITICAL 04-23 02:30:31 [launcher.py:116] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:54294 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR 04-23 02:30:31 [engine.py:160] ValueError('Sampling parameters are missing for a CompletionRequest.')
ERROR 04-23 02:30:31 [engine.py:160] Traceback (most recent call last):
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 04-23 02:30:31 [engine.py:160]     self.run_engine_loop()
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 04-23 02:30:31 [engine.py:160]     request_outputs = self.engine_step()
ERROR 04-23 02:30:31 [engine.py:160]                       ^^^^^^^^^^^^^^^^^^
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 04-23 02:30:31 [engine.py:160]     raise e
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 04-23 02:30:31 [engine.py:160]     return self.engine.step()
ERROR 04-23 02:30:31 [engine.py:160]            ^^^^^^^^^^^^^^^^^^
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 1493, in step
ERROR 04-23 02:30:31 [engine.py:160]     self._process_model_outputs(ctx=ctx)
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 1220, in _process_model_outputs
ERROR 04-23 02:30:31 [engine.py:160]     request_output = RequestOutputFactory.create(
ERROR 04-23 02:30:31 [engine.py:160]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/outputs.py", line 392, in create
ERROR 04-23 02:30:31 [engine.py:160]     return RequestOutput.from_seq_group(seq_group, use_cache,
ERROR 04-23 02:30:31 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-23 02:30:31 [engine.py:160]   File "/opt/vllm/lib64/python3.11/site-packages/vllm/outputs.py", line 181, in from_seq_group
ERROR 04-23 02:30:31 [engine.py:160]     raise ValueError(
ERROR 04-23 02:30:31 [engine.py:160] ValueError: Sampling parameters are missing for a CompletionRequest.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [705]


👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

max_seq_len = max(max_seq_len,
                  shape["prompt_length"] + shape["new_tokens"])
if max_seq_len > max_model_len:
    raise RuntimeError(
Collaborator

Could this check be moved into get_warmup_shapes where other validations on the warmup shapes occur?

Collaborator

Yes, this sounds reasonable. Other than that it looks good to me.

Collaborator Author

I did move it into get_warmup_shapes, but now it requires an extra parameter.
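A minimal sketch of what the moved check could look like (the method name get_warmup_shapes, the shape keys, and the error condition come from the snippet above; the standalone function signature, the extra max_model_len parameter name, and the error message text are assumptions for illustration):

```python
def get_warmup_shapes(warmup_shapes: list[dict[str, int]],
                      max_model_len: int) -> list[dict[str, int]]:
    """Validate warmup shapes, rejecting any that exceed the model's
    maximum supported sequence length."""
    max_seq_len = 0
    for shape in warmup_shapes:
        # A warmup shape occupies prompt_length + new_tokens positions.
        max_seq_len = max(max_seq_len,
                          shape["prompt_length"] + shape["new_tokens"])
    if max_seq_len > max_model_len:
        # Failing at startup avoids the runtime crash shown in the
        # traceback above, where an over-long request killed the engine.
        raise RuntimeError(
            f"Warmup shape sequence length {max_seq_len} exceeds the "
            f"maximum model length of {max_model_len}")
    return warmup_shapes
```

Validating here means a misconfigured warmup shape is reported once, at engine startup, instead of surfacing later as a dead MQLLMEngine when a request in the invalid range arrives.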

Collaborator

@yannicks1 yannicks1 left a comment

LGTM once the check is moved into cls.get_warmup_shapes().

@joerunde joerunde enabled auto-merge (squash) June 5, 2025 16:23
@github-actions github-actions bot added the ready label Jun 5, 2025
@joerunde joerunde merged commit c0269a3 into main Jun 5, 2025
22 checks passed
@joerunde joerunde deleted the model_len_constraint branch June 5, 2025 17:01