-
Notifications
You must be signed in to change notification settings - Fork 26
🐛 Fix fp8 model name check with quantization check #535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Gaurav-Kumbhat <[email protected]>
|
👋 Hi! Thank you for contributing to vLLM support on Spyre. Or this can be done with Now you are good to go 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
|
||
| # TODO: we need 2 requests for warmup on FP8+CB | ||
| is_fp8_plus_cb = 'FP8' in self.model_config.model and \ | ||
| # Check if model is quantized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other quantizations like 4 bit or 8 bit int?
There is some more code from @joerunde and some I recently wrote to do this kind of check, which could be adapted and used here:
vllm-spyre/vllm_spyre/platform.py
Line 597 in de0e3d2
def is_granite_3_8b(cls, model_config: ModelConfig): def find_known_models_by_model_config(model_config: ModelConfig) -> list[str]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to refactor all these checks once we support more quantization methods, today we only support fp8. I wouldn't mind a refactor to at least pull all of these fp8 checks into one helper instance method in the model class, but we don't need to block this fix on it
I do think this is a separate problem than figuring out which model we're serving though, because any fp8 model has to be handled separately here, not just granite specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a follow-up issue: #537
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm too
|
bot:test |
|
I don't believe in the bot tests any more but we might as well try to kick one off to make sure nothing barfs in a new and unusual way |
Description
Related Issues