[CB] Test continuous batching through the scheduler #199
…tkv) Signed-off-by: Sophie du Couédic <[email protected]>
Signed-off-by: Sophie du Couédic <[email protected]>
…n another finishes Signed-off-by: Sophie du Couédic <[email protected]>
I like it!
Signed-off-by: Sophie du Couédic <[email protected]>
101a1c3 to 2eb3ad8
Co-authored-by: Prashant Gupta <[email protected]> Signed-off-by: Sophie du Couédic <[email protected]>
tests/e2e/test_spyre_cb.py
Outdated
# Prefill sequence 2
# TODO @Yannick, should the left padding ideally already be removed?
"step": 42,
"tkv": 103, # <-- should be 64?
From what I understand, tkv is not 64 because this is not a new batch yet, since request 1 is still running?
Your other comment is correct: it is about calling reduce_left_padding.
tests/e2e/test_spyre_cb.py
Outdated
},
{
# Prefill sequence 2
# TODO @Yannick, should the left padding ideally already be removed?
The left padding is not removed because, as far as I can see, the reduce_left_padding function is called when preparing decode and not prefill?
Yes, you're right! I am wondering whether this is the desired behaviour, and whether something prevents the left padding from already being removed at the subsequent prefill. Conceptually, it makes more sense to have the padding already removed when a prefill happens (i.e. call reduce_left_padding in both decode and prefill). Also, if the tkv is too close to max_model_len at the time of the prefill, the prompt would have to wait one decode step, instead of being scheduled directly if the padding reduction happened right away.
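To make the max_model_len argument concrete, here is a toy model of a left-padded batch. It is only a sketch under loudly stated assumptions, not the vllm-spyre scheduler: `ToyBatch`, `try_prefill`, `decode_step`, `scenario`, the 64-token `BLOCK_SIZE` granularity for dropping padding, and the admission rule in `fits` are all hypothetical. It compares reducing the padding only before decode (the behaviour described in this thread) with also reducing it right before a prefill.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # assumption: left padding can only be dropped in whole 64-token blocks


@dataclass
class ToyBatch:
    """Hypothetical model of a left-padded continuous batch (not the vllm-spyre code)."""

    max_model_len: int
    tkv: int = 0  # shared position: every sequence in the batch ends here
    lengths: dict[str, int] = field(default_factory=dict)  # req id -> real tokens

    def reduce_left_padding(self) -> None:
        # Drop the whole padding blocks that every active request has in common.
        if not self.lengths:
            self.tkv = 0
            return
        common = min(self.tkv - n for n in self.lengths.values())
        self.tkv -= (common // BLOCK_SIZE) * BLOCK_SIZE

    def fits(self, prompt_len: int, max_new_tokens: int) -> bool:
        # Assumed admission rule: after aligning the prompt to tkv, the new request
        # must be able to generate all its tokens without exceeding max_model_len.
        return max(self.tkv, prompt_len) + max_new_tokens <= self.max_model_len

    def try_prefill(self, req_id: str, prompt_len: int, max_new_tokens: int,
                    reduce_first: bool) -> bool:
        if reduce_first:  # proposed: also reduce the padding in the prefill step
            self.reduce_left_padding()
        if not self.fits(prompt_len, max_new_tokens):
            return False  # the request has to wait
        self.tkv = max(self.tkv, prompt_len)  # shorter sequences get left-padded
        self.lengths[req_id] = prompt_len
        return True

    def decode_step(self) -> None:
        # Behaviour described in this thread: padding is reduced before decode only.
        self.reduce_left_padding()
        self.tkv += 1
        for req_id in self.lengths:
            self.lengths[req_id] += 1


def scenario(reduce_first: bool) -> tuple[bool, int]:
    batch = ToyBatch(max_model_len=256)
    batch.try_prefill("long", 150, 50, reduce_first)   # tkv -> 150
    batch.try_prefill("short", 20, 50, reduce_first)   # left-padded by 130 tokens
    del batch.lengths["long"]                          # "long" finishes
    # "short" still carries 130 tokens of padding (two whole blocks + 2 tokens).
    return batch.try_prefill("new", 30, 120, reduce_first), batch.tkv
```

With these made-up numbers, `scenario(True)` admits the new request immediately (tkv ends at 30), whereas `scenario(False)` rejects it at tkv 150. In the toy, one `decode_step()` then brings tkv down to 23 and the request fits, which is the one-decode wait described above.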
Hmm, yeah, that does make sense to me.
Although it might be counter-intuitive, since the left padding is introduced when you prepare a prefill for a new request? So ideally we could add logic at the point where the padding is added, instead of adding and then reducing the padding, if that makes sense.
Ah, I see what you mean. That would be more code than just calling the reduce_left_padding function after adding the padding. Let's see what @yannicks1 has to say.
I do agree and am working on reducing the left padding in every scheduler step (prefill and decode). Thanks for pointing this out!
LGTM. I guess the left padding question still remains.
This is really nice work @sducouedic. Super clean tests at the scheduler-step level. I suggest merging this. I am already preparing the improved reduction of the left padding on another branch and will then update the tests accordingly.
Co-authored-by: Yannick Schnider <[email protected]> Signed-off-by: Sophie du Couédic <[email protected]>
Signed-off-by: Sophie du Couédic <[email protected]>
b34d290 to fdc8081