Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other quantizations like 4 bit or 8 bit
int?There is some more code from @joerunde and some I recently wrote to do this kind of check, which could be adapted and used here:
vllm-spyre/vllm_spyre/platform.py
Line 597 in de0e3d2
vllm-spyre/vllm_spyre/config/runtime_config_validator.py
Line 139 in de0e3d2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to refactor all these checks once we support more quantization methods, today we only support fp8. I wouldn't mind a refactor to at least pull all of these fp8 checks into one helper instance method in the model class, but we don't need to block this fix on it
I do think this is a separate problem than figuring out which model we're serving though, because any fp8 model has to be handled separately here, not just granite specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a follow-up issue: #537