[SB] relax constraint on min number of new tokens #322

yannicks1 · 2025-07-17T11:56:19Z

[SB] relax constraint on min number of new tokens

This relaxes an old constraint that required the number of requested new tokens to be at least 3. It turns out the only requirement is that warmup includes at least one decode forward pass. Requesting 1 token runs only prefill during warmup, and the compiler crashes (I guess it expects two graphs, prefill and decode). Requesting 2 or more tokens runs at least one decode during warmup and thus produces a decode graph too, so things run smoothly.
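The reasoning above can be sketched as a small standalone model. This is not the code changed in this PR: the names `MIN_WARMUP_NEW_TOKENS`, `warmup_passes`, and `check_warmup_new_tokens` are hypothetical, and the sketch assumes that prefill yields the first new token and each further token costs one decode pass.

```python
# Hypothetical illustration only; none of these names come from the repository.
MIN_WARMUP_NEW_TOKENS = 2  # assumed new minimum (previously 3)


def warmup_passes(num_new_tokens: int) -> list[str]:
    """Return the forward passes a warmup run would execute.

    Assumption: prefill produces the first new token, and every
    additional token requires one decode pass.
    """
    if num_new_tokens < 1:
        raise ValueError("at least one new token must be requested")
    return ["prefill"] + ["decode"] * (num_new_tokens - 1)


def check_warmup_new_tokens(num_new_tokens: int) -> None:
    """Reject warmup shapes that would never trigger a decode pass."""
    if "decode" not in warmup_passes(num_new_tokens):
        raise ValueError(
            f"warmup needs at least {MIN_WARMUP_NEW_TOKENS} new tokens "
            f"to produce a decode graph, got {num_new_tokens}"
        )


if __name__ == "__main__":
    for n in (1, 2, 3):
        try:
            check_warmup_new_tokens(n)
            print(n, "ok:", warmup_passes(n))
        except ValueError as err:
            print(n, "rejected:", err)
```

Under these assumptions, 1 token is rejected because only prefill runs, while 2 and 3 tokens both pass because at least one decode pass occurs.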

Signed-off-by: Yannick Schnider <[email protected]>

github-actions · 2025-07-17T11:56:30Z

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

yannicks1 · 2025-07-17T11:58:49Z

To my knowledge we previously didn't have a test for the case num_decode_tokens=3, hence I didn't add one for num_decode_tokens=2. Do we want such a test?

sducouedic

LGTM, thanks for fixing this

Edit 1: check code change suggestion before merging
Edit 2: feel free to add such a test, it wouldn't hurt to have one. I guess it would belong to test_spyre_warmup_shapes.py?

tests/e2e/test_spyre_basic.py

joerunde

lpgtm!

The old constraint already confused at least one user, who thought it meant you also had to request at least 3 tokens in each API call. But I don't think we should focus too much on this anyway, since continuous batching is almost ready.

yannicks1 · 2025-07-21T06:47:34Z

yeah, it was just something I noticed when having a look at another piece of the code..

yannicks1 added 2 commits July 17, 2025 13:49

relax constraint of min num output tokens from 3 to 2

b104329

Signed-off-by: Yannick Schnider <[email protected]>

refactor test script with for loop

243d571

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 requested review from nikolaospapandreou, prashantgupta24, rafvasq, sducouedic and tdoublep as code owners July 17, 2025 11:56

yannicks1 requested a review from joerunde July 17, 2025 11:59

sducouedic approved these changes Jul 18, 2025


tests/e2e/test_spyre_basic.py (resolved)

joerunde approved these changes Jul 18, 2025


yannicks1 enabled auto-merge (squash) July 21, 2025 06:47

github-actions bot added the ready label Jul 21, 2025

yannicks1 force-pushed the ysc-relax-constraint-num-tokens branch from be3b54a to 243d571 Compare July 21, 2025 06:58

yannicks1 merged commit a2f96ba into main Jul 21, 2025
18 of 19 checks passed

yannicks1 deleted the ysc-relax-constraint-num-tokens branch July 21, 2025 06:58


4 participants
