[Bug]: documentation around multiple modifiers #1920

@koush

Description

⚙️ Your current environment

N/A

๐Ÿ› Describe the bug

I'm trying to use the multiple-modifiers example to quantize Qwen3 MoE with both AWQ and FP8, but I seem to run into an error when performing inference ("Linear" key is missing). I'm not sure whether vLLM actually supports this yet.

https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_non_uniform/quantization_multiple_modifiers.py
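
For context, the combined recipe I'm aiming for looks roughly like the following. This is a sketch based on the linked example; the FP8 scheme, targets, and ignore list here are my assumptions rather than a verbatim copy:

from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import QuantizationModifier

# AWQ W4A16 on the MoE MLP projections, dynamic FP8 on the remaining Linear layers.
recipe = [
    AWQModifier(
        scheme="W4A16",
        targets=["re:.*down_proj", "re:.*up_proj", "re:.*gate_proj"],
    ),
    QuantizationModifier(
        scheme="FP8_DYNAMIC",
        targets=["Linear"],
        ignore=[
            "lm_head",
            "re:.*down_proj",
            "re:.*up_proj",
            "re:.*gate_proj",
        ],
    ),
]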

To isolate the issue, I tried applying AWQ to only specific layers. However, this causes vLLM to fail with the missing "Linear" key:

(EngineCore_DP0 pid=196393) ERROR 10-13 19:28:10 [core.py:708]     config = self.quant_config.target_scheme_map["Linear"].get("weights")

Here is the recipe:

from llmcompressor.modifiers.awq import AWQModifier, AWQMapping

# Apply AWQ (W4A16) only to the MLP projection layers.
recipe = [
    AWQModifier(
        duo_scaling=False,
        scheme="W4A16",
        targets=[
            "re:.*down_proj",
            "re:.*up_proj",
            "re:.*gate_proj",
        ],
        mappings=[
            AWQMapping(
                r"re:.*?\.1\..*?up_proj",
                ["re:.*down_proj$"],
            ),
        ],
    ),
]
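
For completeness, I apply the recipe and save the compressed model roughly like this (the model ID, dataset, and calibration settings below are placeholders, not my exact values):

from llmcompressor import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-30B-A3B"  # placeholder Qwen3 MoE checkpoint

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",        # placeholder calibration dataset
    max_seq_length=2048,
    num_calibration_samples=256,
)

model.save_pretrained("Qwen3-MoE-W4A16-AWQ", save_compressed=True)
tokenizer.save_pretrained("Qwen3-MoE-W4A16-AWQ")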

Does llm-compressor and/or vLLM currently support this?

🛠️ Steps to reproduce

No response
