Granite MoE uses a 3D tensor to hold the expert weights, so GPTQModel does not work out of the box.
There are two options:
- Module swap: replace `GraniteMoeParallelExperts` with a module that holds a `ModuleList` of `Linear`s, so that AutoGPTQ will be able to detect them and replace them with `QuantLinear`s.
- Write a custom GPTQ module that handles the `GraniteMoeParallelExperts` case.
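Option 1 might look roughly like the sketch below. The class name `SequentialExperts`, the `(num_experts, output_size, input_size)` weight layout, and the `expert_size` forward argument are assumptions for illustration, not the actual `transformers` definitions; the real swap would need to match `GraniteMoeParallelExperts`'s forward signature exactly.

```python
import torch
import torch.nn as nn


class SequentialExperts(nn.Module):
    """Hypothetical drop-in for GraniteMoeParallelExperts that stores each
    expert as a separate nn.Linear, so per-expert weights are ordinary 2D
    modules that AutoGPTQ can find and replace with QuantLinears."""

    def __init__(self, num_experts: int, input_size: int, output_size: int):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Linear(input_size, output_size, bias=False)
            for _ in range(num_experts)
        )

    @classmethod
    def from_3d_weight(cls, weight: torch.Tensor) -> "SequentialExperts":
        # Assumes the 3D tensor is laid out (num_experts, output_size,
        # input_size); verify against the actual module before using.
        num_experts, output_size, input_size = weight.shape
        mod = cls(num_experts, input_size, output_size)
        for i, lin in enumerate(mod.experts):
            lin.weight.data.copy_(weight[i])
        return mod

    def forward(self, inputs: torch.Tensor, expert_size) -> torch.Tensor:
        # Split the flat token batch into per-expert chunks and apply each
        # expert's Linear, concatenating the results back together.
        chunks = inputs.split(expert_size, dim=0)
        outs = [expert(x) for expert, x in zip(self.experts, chunks)]
        return torch.cat(outs, dim=0)
```

Since each expert is now a plain `nn.Linear`, the standard module-walking logic in AutoGPTQ would see `experts.0`, `experts.1`, … as quantizable layers without any custom handling.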
Either approach will solve both the quantization and inference paths. Option 1 should be easier to implement than Option 2, but in some sense Option 2 is the more proper solution.

When doing Option 2 we should reuse code from the original GPTQ implementation.
- It should also be written generally, so that it handles not just this particular `GraniteMoeParallelExperts` instance but any module holding its weights in a 3D tensor.
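The generic shape of Option 2 could be a loop over the expert dimension that reuses an existing per-layer 2D quantization routine on each slice. The helper name `quantize_3d_weight` and the `quantize_2d` callable are placeholders for illustration; `quantize_2d` stands in for whatever per-layer routine the original GPTQ code exposes.

```python
import torch


def quantize_3d_weight(weight: torch.Tensor, quantize_2d) -> torch.Tensor:
    """Sketch of a generic 3D handler: apply an existing 2D per-layer
    quantizer to each expert slice of a (num_experts, out, in) tensor
    and stack the results back into the original 3D layout."""
    assert weight.dim() == 3, "expected a 3D expert-weight tensor"
    return torch.stack([quantize_2d(w) for w in weight], dim=0)
```

Because the handler only assumes "first dimension indexes experts, remaining two are a standard Linear weight", it would apply to any 3D-weight MoE module, not just `GraniteMoeParallelExperts`.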