Refactor checks for quantization methods #537

@ckadner

We'll have to refactor all these checks once we support more quantization methods; today we only support fp8. I wouldn't mind a refactor to at least pull all of these fp8 checks into one helper instance method on the model class, but we don't need to block this fix on it.

I do think this is a separate problem from figuring out which model we're serving, though, because any fp8 model has to be handled separately here, not just granite specifically.

Originally posted by @joerunde in #535 (comment)
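
To make the suggestion above concrete, here is a minimal sketch of what a single helper on the model class could look like. Everything in it is an assumption for illustration: `ModelConfig`, `HypotheticalModel`, the `quantization` field, and the method names are hypothetical and are not taken from the actual codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical config object; the real one may expose this differently.
    quantization: Optional[str] = None


class HypotheticalModel:
    # Only fp8 is supported today; extend this set as new methods are added.
    SUPPORTED_QUANTIZATION = frozenset({"fp8"})

    def __init__(self, config: ModelConfig) -> None:
        self.config = config

    @property
    def quantization(self) -> Optional[str]:
        # Normalize once so callers never compare raw strings themselves.
        q = self.config.quantization
        return q.lower() if q else None

    def is_fp8(self) -> bool:
        # Single place for the fp8 check, independent of the model family.
        return self.quantization == "fp8"

    def check_quantization_supported(self) -> None:
        # Fail early on quantization methods we do not handle yet.
        q = self.quantization
        if q is not None and q not in self.SUPPORTED_QUANTIZATION:
            raise ValueError(f"Unsupported quantization method: {q!r}")
```

With something like this, call sites would ask `model.is_fp8()` instead of repeating the check inline, and supporting another method later would mostly mean extending `SUPPORTED_QUANTIZATION` and adding a matching predicate.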

There is some more code to do this kind of check, which could be adapted and used here:
