🐛 implement better checking for granite #500
Signed-off-by: Joe Runde <[email protected]>
# TODO: replace the hard coded NUM_BLOCKS_SPYRE by calling a function
# in torch_sendnn which returns the value set by the Spyre compiler.
- if ('granite-3.3-8b-instruct' in self.model_config.model
+ if (SpyrePlatform.is_granite_33_8b(self.model_config)
I wonder if we have to account for other granite models down the road
Very likely, given current timeframes for model support
bot:test
Co-authored-by: Travis Johnson <[email protected]> Signed-off-by: Joe Runde <[email protected]>
13c0877 to 09735e8
@gkumbhat @tjohnson31415 I verified this with:
Seems to work! Logs are as expected as well, showing
which is 2080 * 64
Looks good to me!
Fix granite-3.3-8b detection
Fixes #498
This PR updates the logic that checks for granite-3.3-8b-instruct. Instead of matching on the model name, it checks the hf config type and checks for specific values related to the 8b granite variant.
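As a rough illustration of the approach described above, a config-based check might look something like the sketch below. The attribute names are ordinary Hugging Face config fields, but the helper name `looks_like_granite_33_8b`, the fingerprint dictionary, and the specific values compared are assumptions made for illustration; they are not taken from this PR.

```python
from types import SimpleNamespace

# Hypothetical fingerprint for the 8b variant. These values are
# illustrative assumptions, not the ones checked in the actual PR.
ASSUMED_GRANITE_8B_FINGERPRINT = {
    "num_hidden_layers": 40,
    "hidden_size": 4096,
    "num_key_value_heads": 8,
}


def looks_like_granite_33_8b(hf_config) -> bool:
    """Identify the model from its config rather than from its name."""
    # First check the architecture family via the config type.
    if getattr(hf_config, "model_type", None) != "granite":
        return False
    # Then require every fingerprint value to match the 8b variant.
    return all(
        getattr(hf_config, key, None) == value
        for key, value in ASSUMED_GRANITE_8B_FINGERPRINT.items()
    )


# Stand-ins for HF configs; a locally renamed checkpoint still matches
# because the name is never consulted.
renamed_checkpoint = SimpleNamespace(
    model_type="granite", num_hidden_layers=40,
    hidden_size=4096, num_key_value_heads=8)
smaller_granite = SimpleNamespace(
    model_type="granite", num_hidden_layers=40,
    hidden_size=2048, num_key_value_heads=8)
other_family = SimpleNamespace(
    model_type="llama", num_hidden_layers=40,
    hidden_size=4096, num_key_value_heads=8)
```

The point of checking the config is that a model served from a local path or under a different name is still detected, while a name that merely contains the substring is not enough on its own.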
This also consolidates all of the granite-specific overrides into one method inside platform.py. This should make these overrides much easier to work with in the future.
The number-of-blocks override for the Spyre KV cache size was particularly interesting. We were in a halfway state where we took overrides from --num-gpu-blocks-override, saved them off to platform.py, and then used the saved value to change the scheduler's behavior. I've updated things so that the scheduler no longer needs that extra override, and we can simply use the configured override everywhere. This is more robust, since the config value is serialized and given to subprocesses, and more maintainable than managing two separate numbers for the same purpose.
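The single-source-of-truth idea can be sketched roughly as follows. The class and function names here are hypothetical stand-ins, not vLLM or vllm-spyre APIs, and the rule that a user-supplied value takes precedence over the model default is an assumption. The numbers 2080 and 64 come from the verification comment earlier in this conversation.

```python
from dataclasses import dataclass
from typing import Optional

# From the verification comment: 2080 blocks with a block size of 64.
ASSUMED_GRANITE_NUM_BLOCKS = 2080


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for the serialized cache config."""
    block_size: int = 64
    num_gpu_blocks_override: Optional[int] = None


def apply_granite_override(cfg: CacheConfigSketch,
                           is_granite_8b: bool) -> CacheConfigSketch:
    # Write the model-specific value into the config itself, so every
    # consumer (including subprocesses that receive the serialized
    # config) sees the same number. Assumption: a value the user
    # already supplied is left alone.
    if is_granite_8b and cfg.num_gpu_blocks_override is None:
        cfg.num_gpu_blocks_override = ASSUMED_GRANITE_NUM_BLOCKS
    return cfg


def kv_cache_tokens(cfg: CacheConfigSketch, default_blocks: int) -> int:
    # The scheduler reads the configured override directly instead of
    # a second copy stashed on the platform class.
    blocks = cfg.num_gpu_blocks_override
    if blocks is None:
        blocks = default_blocks
    return blocks * cfg.block_size


granite_cfg = apply_granite_override(CacheConfigSketch(), is_granite_8b=True)
user_cfg = apply_granite_override(
    CacheConfigSketch(num_gpu_blocks_override=100), is_granite_8b=True)
```

With this shape there is only one number to keep consistent: `kv_cache_tokens(granite_cfg, default_blocks=500)` evaluates to 2080 * 64 = 133120.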