[CB] Support batch size 1 for decode, simplify warmup #312
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀

bot:test
bot test failed in warmup decode.
Note: CPU failures are expected for BS 1 (didn't adapt the warmup as in #287). Spyre card: reverting the warmup changes results in a runtime error:
Looks like batch size 1 for decode is not supported by the compiler yet... The priority of this is low, as the performance advantage is marginal and the use case is limited.
Update: I tried Josh's suggestion, so far without success.
Note: as soon as we get this version working, I will redo the reverted warmup changes. In theory we should be able to get away with only one prefill at batch size 1 and one decode at batch size 1. I reverted the warmup changes (back to batch size 2 for decode, as it is on main) as a stepping stone to get this working.
### [CB] refactoring warmup for batch size 1

From #312 (comment) there is a request for a nicer integration of batch size 1 support during warmup. Most of the code is already on main, thus this PR.
We may need to think about this for a bit: if this is causing problems with the cache, then merging it will cause all subsequent PRs to fail testing as well, until we release and use the new version to populate the cache every day. (Or we'd have to run the tests without caching, which is also not fun.) Maybe we should sync a bit with @JRosenkranz and determine whether this is expected behavior. Are the graph comparison tests with aftu still passing, or do they show different graphs now?
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
[CB] Support batch size 1 for decode, simplify warmup
As we moved to torch 2.7.1 in #307, dynamic dimensions of size 1 are supported by PyTorch. Hence, batch size 1 for decode produces the same graph as batch size >= 2.
This PR relaxes the minimum batch size 2 constraint for decode and adapts the warmup accordingly. Previously, warmup consisted of two prefills and one decode at batch size 2. The new warmup features only one prefill and one decode at batch size 1.
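The change in warmup shapes can be sketched as follows. This is an illustrative sketch only: the helper name `warmup_shapes` and its return format are assumptions for the example, not the actual vLLM Spyre plugin API.

```python
# Illustrative sketch of the warmup simplification described above;
# `warmup_shapes` and the (phase, batch_size) tuples are assumptions,
# not the plugin's real code.

def warmup_shapes(decode_batch_size: int) -> list[tuple[str, int]]:
    """Return the (phase, batch_size) steps a warmup would run.

    A decode at batch size N first needs N prefills (each at batch
    size 1) to fill the batch, so shrinking the decode batch from 2
    to 1 also drops one prefill from the warmup.
    """
    steps = [("prefill", 1) for _ in range(decode_batch_size)]
    steps.append(("decode", decode_batch_size))
    return steps

# Old warmup (main): two prefills + one decode at batch size 2.
old = warmup_shapes(2)
# New warmup (this PR): one prefill + one decode at batch size 1.
new = warmup_shapes(1)
```

Under this assumed model, the warmup shrinks from three compiled steps to two, which is the simplification the PR description refers to.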