Refactor checks for quantization methods

We'll have to refactor all these checks once we support more quantization methods; today we only support fp8. I wouldn't mind a refactor to at least pull all of these fp8 checks into one helper instance method in the model class, but we don't need to block this fix on it.

I do think this is a separate problem from figuring out which model we're serving though, because any fp8 model has to be handled separately here, not just granite specifically.

_Originally posted by @joerunde in https://github.com/vllm-project/vllm-spyre/pull/535#discussion_r2437894307_
            

There is already some code that does this kind of check, which could be adapted and reused here:

- https://github.com/vllm-project/vllm-spyre/blob/de0e3d2c6e18ad21baca4833d947e1f9f5df82f8/vllm_spyre/platform.py#L597
- https://github.com/vllm-project/vllm-spyre/blob/de0e3d2c6e18ad21baca4833d947e1f9f5df82f8/vllm_spyre/config/runtime_config_validator.py#L139
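As a rough illustration of the "one helper instance method in the model class" idea from the quoted comment, here is a minimal sketch. Everything in it is hypothetical: `ModelConfigStub`, `SpyreModelSketch`, `SUPPORTED_QUANTIZATION`, and the method names are made up for this example and are not taken from the vllm-spyre code base. It only assumes that the model config exposes the quantization method as an optional string.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: quantization methods the plugin knows how to handle.
# Today that would only be fp8; new methods would be added here.
SUPPORTED_QUANTIZATION = frozenset({"fp8"})


@dataclass
class ModelConfigStub:
    """Stand-in for the real model config (assumption: it carries an
    optional `quantization` string)."""
    model: str
    quantization: Optional[str] = None


class SpyreModelSketch:
    """Hypothetical model class that owns all quantization checks."""

    def __init__(self, model_config: ModelConfigStub) -> None:
        self.model_config = model_config

    @property
    def quantization(self) -> Optional[str]:
        # Normalize once so callers never compare raw strings themselves.
        method = self.model_config.quantization
        return method.lower() if method else None

    def is_quantized_with(self, method: str) -> bool:
        # Single place to ask "is this model using quantization method X?"
        return self.quantization == method.lower()

    @property
    def is_fp8(self) -> bool:
        # Depends only on the quantization method, not on the model name,
        # so it applies to any fp8 model, not just granite.
        return self.is_quantized_with("fp8")

    def validate_quantization(self) -> None:
        # Fail early on methods that are not handled yet.
        method = self.quantization
        if method is not None and method not in SUPPORTED_QUANTIZATION:
            raise ValueError(f"Unsupported quantization method: {method!r}")
```

Call sites that currently inspect the quantization setting inline would then reduce to something like `if model.is_fp8: ...`, and supporting another method would mean touching this helper rather than every check.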

Refactor checks for quantization methods #537
