@yannicks1 (Collaborator) commented Sep 17, 2025

[CB][FP8] fix batch size 1

We require an effective batch size of >= 2 for warmup and decodes when running fp8 continuous batching. This PR allows --max_num_seqs 1 to still be set and served.

Since single sequences, which can also occur when serving with --max_num_seqs >= 2, are padded to batch size 2 anyway, the only change needed was to have the input_batch hold two sequences for warmup.
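
A minimal sketch of the padding idea (hypothetical helper name, not the actual vllm-spyre code): a lone sequence is duplicated up to the minimum fp8 batch size before warmup/decode, and the padding copy's output is discarded.

    MIN_FP8_DECODE_BATCH = 2

    def pad_for_fp8(seqs: list) -> list:
        # Duplicate the last sequence until the minimum fp8 batch size
        # is reached; outputs for the padding copies are discarded.
        padded = list(seqs)
        while len(padded) < MIN_FP8_DECODE_BATCH:
            padded.append(padded[-1])
        return padded

    assert len(pad_for_fp8(["seq0"])) == 2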

We had the exact same issue for fp16 in the past; see #347 and #312.

UPDATE: upstream logic has changed since then, so the fix is more involved. De-prioritizing this and throwing an error instead (#467).

Signed-off-by: Yannick Schnider <[email protected]>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@wallashss (Collaborator) commented:

I think this won't work. I tested something similar and it crashed.

I think the problem is here:

        logits_processors = get_builtin_logits_processors(self.vllm_config)

This method probably reads the scheduler config, and you'll get this crash in warmup:

  File "/home/senuser/vllm-spyre/vllm_spyre/v1/worker/spyre_input_batch.py", line 580, in refresh_metadata
    logit_proc.update_state(batch_update)
  File "/home/senuser/my-vllm/lib64/python3.12/site-packages/vllm/v1/sample/logits_processor/builtin.py", line 56, in update_state
    if self.min_p_cpu[index] != min_p:
       ~~~~~~~~~~~~~~^^^^^^^
IndexError: index 1 is out of bounds for axis 0 with size 1
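
For illustration, a toy reproduction of that failure mode (plain numpy, not vLLM code): the per-request state is sized from max_num_seqs = 1, but warmup adds a second padded sequence at index 1.

    import numpy as np

    min_p_cpu = np.zeros(1)  # per-request state sized for --max_num_seqs 1
    index = 1                # second (padding) sequence added during warmup
    min_p_cpu[index]         # IndexError: index 1 is out of bounds for axis 0 with size 1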

@joerunde (Collaborator) commented:

Agreed, I see that error too.

@yannicks1 I don't think we need to fix this; we don't really expect users to need to run with batch size 1. But if you do want to do it for completeness, we should probably add a test to cover it.

@yannicks1 (Collaborator, Author) commented:

Ahh, I see. I believe we did not have the logits_processor logic in upstream vLLM when we solved this for fp16... too bad.
Agreed with @joerunde, it's not worth the additional effort. Let's merge #467 instead and circle back when torch 2.8 is supported. Thanks a lot for trying this, @wallashss!

@yannicks1 changed the title from "[CB][FP8] fix batch size 1" to "[WIP][CB][FP8] fix batch size 1" on Sep 18, 2025
joerunde pushed a commit that referenced this pull request Sep 18, 2025
### [CB][FP8] throw error for batch size 1

Intermediate solution until #466 can be merged safely.

Signed-off-by: Yannick Schnider <[email protected]>
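
For context, a minimal sketch of what such an up-front guard could look like (hypothetical function name and message, not the actual #467 change): fail fast at config time instead of crashing during warmup.

    def validate_fp8_cb_config(quantization: str, max_num_seqs: int) -> None:
        # Reject fp8 continuous batching with batch size 1 up front.
        if quantization == "fp8" and max_num_seqs < 2:
            raise ValueError(
                "FP8 continuous batching requires --max_num_seqs >= 2, "
                f"got {max_num_seqs}")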
