[CB] Support batch size 1 for decode, simplify warmup #312
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀

bot:test
bot test failed in warmup decode.
Note: CPU failures are expected for BS 1 (didn't adapt the warmup as in #287). Spyre card: reverting the warmup changes results in a runtime error:
Looks like batch size 1 for decode is not supported by the compiler yet... The priority of this is low, as the performance advantage is marginal and the use case is limited.
Update: I tried Josh's suggestion, so far without success.
Note: as soon as we get this version working, I will redo the reverted warmup changes. In theory we should be able to get away with only one prefill at batch size 1 and one decode at batch size 1. I reverted the warmup changes (back to batch size 2 for decode, as it is on main) as a stepping stone to get this working.
### [CB] refactoring warmup for batch size 1

From #312 (comment) there is a request for a nicer integration of batch size 1 support during warmup. Most of the code is already on main, thus this PR.
We may need to think about this for a bit: if this is causing problems with the cache, then merging it will cause all subsequent PRs to fail testing as well, until we release and use the new version to populate the cache every day. (Or we'd have to run the tests without caching, which is also not fun.) Maybe we should sync a bit with @JRosenkranz and determine whether this is expected behavior. Are the graph comparison tests with aftu still passing, or do they show different graphs now?
TEST_FILE=tests/e2e/test_spyre_cb_scheduler_steps.py MARKERS="spyre"
[CB] Support batch size 1 for decode, simplify warmup
As we moved to torch 2.7.1 in #307, dynamic dimensions of size 1 are supported by PyTorch. Hence, batch size 1 for decode produces the same graph as batch size >= 2.
This PR relaxes the minimum batch size 2 constraint for decode and adapts the warmup accordingly. Previously, warmup consisted of two prefills and one decode at batch size 2. The new warmup features only one prefill and one decode at batch size 1.
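The change in warmup shapes can be sketched as follows. This is an illustrative sketch only: the helper name `warmup_shapes` and its return format are assumptions for the example, not the actual vLLM Spyre plugin API.

```python
# Illustrative sketch of the warmup simplification described above;
# `warmup_shapes` and the (phase, batch_size) tuples are assumptions,
# not the plugin's real code.

def warmup_shapes(decode_batch_size: int) -> list[tuple[str, int]]:
    """Return the (phase, batch_size) steps a warmup would run.

    A decode at batch size N first needs N prefills (each at batch
    size 1) to fill the batch, so shrinking the decode batch from 2
    to 1 also drops one prefill from the warmup.
    """
    steps = [("prefill", 1) for _ in range(decode_batch_size)]
    steps.append(("decode", decode_batch_size))
    return steps

# Old warmup (main): two prefills + one decode at batch size 2.
old = warmup_shapes(2)
# New warmup (this PR): one prefill + one decode at batch size 1.
new = warmup_shapes(1)
```

Under this assumed model, the warmup shrinks from three compiled steps to two, which is the simplification the PR description refers to.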