
Conversation

@yannicks1 yannicks1 commented Jul 16, 2025

[CB] Support batch size 1 for decode, simplify warmup

Since we moved to torch 2.7.1 in #307, dynamic dimensions of size 1 are supported by PyTorch. Hence, decode with batch size 1 produces the same graph as batch size >= 2.
This PR relaxes the minimum-batch-size-2 constraint for decode and adapts the warmup. Previously, warmup consisted of two prefills and one decode at batch size 2; the new warmup features only one prefill and one decode at batch size 1.
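
For illustration, here is a minimal sketch of the behavior this relies on, not the PR's actual code: a toy decode step compiled with a dynamic batch dimension, called at batch size 2 and then batch size 1. `decode_step` and the shapes are made up; whether the size-1 call reuses the dynamic graph instead of specializing depends on the torch version (per this PR, from 2.7.1 onward).

```python
# Hedged sketch: a dynamic batch dimension shared by batch sizes 1 and >= 2.
import torch

def decode_step(x: torch.Tensor) -> torch.Tensor:
    return x * 2 + 1  # stand-in for one decode forward pass

compiled = torch.compile(decode_step)

batch2 = torch.randn(2, 8)
torch._dynamo.mark_dynamic(batch2, 0)  # mark dim 0 (batch) as dynamic
compiled(batch2)                       # compiles the dynamic-batch graph

batch1 = torch.randn(1, 8)
torch._dynamo.mark_dynamic(batch1, 0)
compiled(batch1)  # with torch >= 2.7.1, size 1 can hit the same graph
```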

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact
bash format.sh

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1 yannicks1 changed the title [CB] Support batch size 1 for decode, simplify warmup [CB][do not merge] Support batch size 1 for decode, simplify warmup Jul 16, 2025
@yannicks1
Collaborator Author

The bot test failed during the warmup decode.

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

Note: the CPU failure is expected for BS 1 (didn't adapt the warmup as in #287).

On the Spyre card, reverting the warmup changes results in a runtime error: compile graph failed.

@yannicks1
Collaborator Author

Looks like batch size 1 for decode is not supported by the compiler yet... The priority of this is low, as the performance advantage is marginal and the use case limited.

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator Author

Update: I tried Josh's suggestion, so far without success.

@yannicks1
Collaborator Author

Note: as soon as we get this version working, I will redo the reverted warmup changes. In theory we should get away with only one prefill at batch size 1 and one decode at batch size 1 (see the sketch below). I reverted the warmup changes (back to batch size 2 for decode, as on main) as a stepping stone to get this working.
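
For illustration, a minimal sketch of what the simplified warmup would look like under that assumption; `prefill_step`, `decode_step`, and the shapes are hypothetical stand-ins, not vllm-spyre's actual API:

```python
# Hedged sketch: warmup reduced to one prefill and one decode, both at
# batch size 1. All names and shapes here are hypothetical.
import torch

def warmup(prefill_step, decode_step, prompt_len: int = 64) -> None:
    # One prefill at batch size 1 compiles the prefill graph.
    prompt_tokens = torch.zeros(1, prompt_len, dtype=torch.long)
    prefill_step(prompt_tokens)

    # One decode at batch size 1: since a size-1 dynamic batch dimension
    # produces the same graph as batch size >= 2, this single decode
    # covers all runtime batch sizes.
    next_token = torch.zeros(1, 1, dtype=torch.long)
    decode_step(next_token)
```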

joerunde pushed a commit that referenced this pull request Jul 30, 2025
### [CB] refactoring warmup for batch size 1

Following the discussion in #312 (comment), there is a request for a nicer integration of batch size 1 support during warmup. Most of the code is already on main, hence this PR.

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator Author

bot:test
MARKERS="spyre and not quantized"

Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1 yannicks1 changed the title [CB][do not merge] Support batch size 1 for decode, simplify warmup [CB] Support batch size 1 for decode, simplify warmup Aug 5, 2025
@yannicks1 yannicks1 marked this pull request as ready for review August 5, 2025 09:02
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=0

2 similar comments
@joerunde
Collaborator

We may need to think about this for a bit: if this is causing problems with the cache, then merging it will cause all subsequent PRs to fail testing as well, until we release and use the new version to populate the cache every day. (Or we'd have to run the tests without caching, which is also not fun.)

Maybe we should sync a bit with @JRosenkranz and determine if this is expected behavior. Are the graph comparison tests with aftu still passing or do they show different graphs now?

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=0
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

3 similar comments
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=0
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

2 similar comments
@waleedqk
Collaborator

bot:test
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=1
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

2 similar comments
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=0
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

1 similar comment
@waleedqk
Collaborator

bot:test
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=0
export TORCH_SENDNN_CACHE_DIR=/home/senuser/models/wqk_tmp_delete_cache

1 similar comment
@waleedqk
Collaborator

TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=1
export TORCH_SENDNN_CACHE_DIR=/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=0
export TORCH_SENDNN_CACHE_DIR=/models/wqk_tmp_delete_cache

@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=0
export TORCH_SENDNN_CACHE_DIR=/models/wqk_tmp_delete_cache

1 similar comment

@joerunde joerunde merged commit fa00675 into main Aug 13, 2025
23 checks passed
@joerunde joerunde deleted the ysc-batch-1 branch August 13, 2025 15:39
@waleedqk
Collaborator

bot:test
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
export TORCH_SENDNN_CACHE_ENABLE=1
export TORCH_SENDNN_CACHE_CLEAR=1
export TORCH_SENDNN_CACHE_DIR_NAME=wqk_tmp_delete_cache_1
