Open
Description
We'll have to refactor all these checks once we support more quantization methods; today we only support fp8. I wouldn't mind a refactor to at least pull all of these fp8 checks into one helper instance method in the model class, but we don't need to block this fix on it.
I do think this is a separate problem from figuring out which model we're serving, though, because any fp8 model has to be handled separately here, not just granite specifically.
Originally posted by @joerunde in #535 (comment)
There is some more code to do this kind of check, which could be adapted and used here:
vllm-spyre/vllm_spyre/platform.py
Line 597 in de0e3d2
def is_granite_3_8b(cls, model_config: ModelConfig):
def find_known_models_by_model_config(model_config: ModelConfig) -> list[str]:
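As a rough illustration of the refactor suggested above, here is a minimal sketch of a single helper method that answers "is this model fp8-quantized?" without looking at the model name. Everything in it is hypothetical: `ModelConfigStub`, `SpyreModelSketch`, `is_fp8_quantized`, and the `quantization` field are placeholder names chosen for this example, not the actual vllm-spyre classes or attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigStub:
    """Hypothetical stand-in for a model config; not the real ModelConfig."""

    model: str
    # Assumed field: name of the quantization method, or None if unquantized.
    quantization: Optional[str] = None


class SpyreModelSketch:
    """Hypothetical model wrapper, only showing where the helper could live."""

    def __init__(self, model_config: ModelConfigStub) -> None:
        self.model_config = model_config

    def is_fp8_quantized(self) -> bool:
        """Single place for the fp8 check, independent of which model is served."""
        quant = self.model_config.quantization
        return quant is not None and quant.lower() == "fp8"
```

Call sites would then ask `model.is_fp8_quantized()` instead of repeating the comparison, so supporting another quantization method later would mean changing one method rather than every check.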