Skip to content

Conversation

@yannicks1
Copy link
Collaborator

[SB] relax constraint on min number of new tokens

this is relaxing an old constraint on the number of requested new tokens having to be a min of 3. Turns out it is only important that during the warmup there is at least one decode forward pass. Requesting 1 token runs prefill only during warmup -> compiler crashes (I guess it is expecting two graphs, prefill and decode) . Requesting 2+ tokens does at least one decode during warmup and thus produces a decode graph too -> things run smoothly for 2+ tokens ...

@github-actions
Copy link

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1
Copy link
Collaborator Author

To my knowledge we previously didn't have a test for the case num_decode_tokens=3, hence I didn't add one for num_decode_tokens=2. Do we want such a test?

@yannicks1 yannicks1 requested a review from joerunde July 17, 2025 11:59
Copy link
Collaborator

@sducouedic sducouedic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this

Edit 1: check code change suggestion before merging
Edit 2: feel free to add such a test, it wouldn't hurt to have one. I guess it would belong to test_spyre_warmup_shapes.py?

Copy link
Collaborator

@joerunde joerunde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lpgtm!

This was confusing to at least one user already who thought that meant that you also had to request at least 3 tokens in each api call, but I don't think we should focus too much on this anyway since continuous batching is almost ready

@yannicks1 yannicks1 enabled auto-merge (squash) July 21, 2025 06:47
@yannicks1
Copy link
Collaborator Author

yeah, it was just something I noticed when having a look at another piece of the code..

@github-actions github-actions bot added the ready label Jul 21, 2025
@yannicks1 yannicks1 force-pushed the ysc-relax-constraint-num-tokens branch from be3b54a to 243d571 Compare July 21, 2025 06:58
@yannicks1 yannicks1 merged commit a2f96ba into main Jul 21, 2025
18 of 19 checks passed
@yannicks1 yannicks1 deleted the ysc-relax-constraint-num-tokens branch July 21, 2025 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants