⚙️ Your current environment
N/A
🐛 Describe the bug
I'm trying to use the example for multiple modifiers to quantize a Qwen3 MoE model with both AWQ and FP8, but I seem to be running into an error when performing inference (the "Linear" key is missing). I'm not sure whether vLLM actually supports this yet.
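For reference, this is roughly the shape of the combined recipe I mean, following the multiple-modifiers example (a sketch, not my exact script; the scheme names, target patterns, and ignore list are illustrative assumptions):

from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import QuantizationModifier

# AWQ (W4A16) on the MoE expert projections, FP8 dynamic quantization
# on the remaining Linear layers. Target patterns here are assumptions.
recipe = [
    AWQModifier(
        scheme="W4A16",
        targets=["re:.*mlp.experts.*"],
    ),
    QuantizationModifier(
        scheme="FP8_DYNAMIC",
        targets=["Linear"],
        ignore=["lm_head"],
    ),
]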
To try to isolate the issue, I am performing AWQ on only specific layers. However, this also results in vLLM failing with the missing "Linear" key:
(EngineCore_DP0 pid=196393) ERROR 10-13 19:28:10 [core.py:708] config = self.quant_config.target_scheme_map["Linear"].get("weights")
Here is the recipe:
recipe = [
    # Apply AWQ W4A16 only to the MLP projection layers.
    AWQModifier(
        duo_scaling=False,
        scheme="W4A16",
        targets=[
            "re:.*down_proj",
            "re:.*up_proj",
            "re:.*gate_proj",
        ],
        mappings=[
            AWQMapping(
                r"re:.*?\.1\..*?up_proj",
                ["re:.*down_proj$"],
            ),
        ],
    ),
]

Does LLM Compressor and/or vLLM currently support this?
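For context, I'm applying the recipe roughly like this (a sketch; the model ID, calibration dataset, and output directory are placeholders, not my exact setup):

from llmcompressor import oneshot

# Placeholder model and calibration settings; AWQ needs calibration data.
oneshot(
    model="Qwen/Qwen3-30B-A3B",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="qwen3-moe-w4a16-awq",
    max_seq_length=2048,
    num_calibration_samples=256,
)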
🛠️ Steps to reproduce
No response