
Enable GPTQModel to handle GraniteMoeParallelExperts #112

@fabianlim

Description


Granite MoE holds all of its expert weights in a single 3D tensor (GraniteMoeParallelExperts), so GPTQModel does not work on it out of the box.

There are two options:

  1. Module-swap GraniteMoeParallelExperts for a container holding a ModuleList of Linears; AutoGPTQ will then be able to detect them and replace them with QuantLinears (see the sketch after this list).
  2. Write a custom GPTQ module that handles the GraniteMoeParallelExperts case.

Either approach will cover both the quantization and inference paths. Option 1 should be easier to implement than Option 2, but Option 2 is arguably the more principled solution.
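
Below is a minimal sketch of Option 1, assuming GraniteMoeParallelExperts stores its weights as a single (num_experts, output_size, input_size) parameter and its forward takes (inputs, expert_size) with inputs concatenated per expert, as in the transformers Granite MoE modeling code. SequentialExperts and from_parallel are hypothetical names for illustration.

```python
# Sketch only: module-swap GraniteMoeParallelExperts into per-expert Linears.
import torch
import torch.nn as nn


class SequentialExperts(nn.Module):
    """Hypothetical drop-in replacement that exposes each expert as a
    separate nn.Linear so AutoGPTQ/GPTQModel can detect and quantize it."""

    def __init__(self, num_experts: int, input_size: int, output_size: int):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Linear(input_size, output_size, bias=False)
            for _ in range(num_experts)
        )

    @classmethod
    def from_parallel(cls, parallel: nn.Module) -> "SequentialExperts":
        # Assumes parallel.weight has shape (num_experts, output_size, input_size).
        num_experts, output_size, input_size = parallel.weight.shape
        mod = cls(num_experts, input_size, output_size)
        for i, linear in enumerate(mod.experts):
            linear.weight.data.copy_(parallel.weight[i])
        return mod

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # Mirror the original forward: inputs arrive as concatenated
        # per-expert chunks of sizes given by expert_size.
        chunks = inputs.split(expert_size, dim=0)
        outputs = [expert(chunk) for expert, chunk in zip(self.experts, chunks)]
        return torch.cat(outputs, dim=0)
```

Since each expert is then a plain nn.Linear, the existing module scanning should pick them up without further changes; the remaining work is deciding whether to swap back after quantization or keep the swapped form for inference.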

When implementing Option 2 we should reuse code from the original GPTQ implementation.

  • It should also be written generically, handling not just this particular GraniteMoeParallelExperts case but any module that stores its weights as a 3D tensor (a rough sketch follows below).
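
A rough sketch of how the Option 2 quantization path could look, running a 2D GPTQ routine once per expert slice. Expert3DGPTQ and the gptq_cls interface (a constructor over a 2D weight plus add_batch()/quantize()) are assumptions for illustration, not actual GPTQModel API.

```python
# Sketch only: generic per-slice GPTQ over a 3D expert weight tensor.
class Expert3DGPTQ:
    """Quantize a (num_experts, out_features, in_features) weight by reusing
    an existing 2D GPTQ implementation on each expert slice."""

    def __init__(self, module, gptq_cls):
        # module.weight is assumed 3D; gptq_cls is any 2D GPTQ helper
        # exposing add_batch()/quantize() (an assumption, not real API).
        self.module = module
        self.experts = [
            gptq_cls(module.weight[i]) for i in range(module.weight.shape[0])
        ]

    def add_batch(self, inputs, expert_size):
        # Calibration activations arrive as concatenated per-expert chunks,
        # so route each chunk to the matching expert's Hessian accumulator.
        for gptq, chunk in zip(self.experts, inputs.split(expert_size, dim=0)):
            gptq.add_batch(chunk)

    def quantize(self, **kwargs):
        # Quantize each slice independently; the results can be repacked
        # into a 3D buffer afterwards.
        return [gptq.quantize(**kwargs) for gptq in self.experts]
```

The inference side would additionally need a QuantLinear variant that packs the per-expert quantized weights back into a 3D buffer, which is what makes this option more work than the module swap.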

Labels: help wanted
