Some fixes for AWQ #269
base: main
Conversation
```diff
@@ -199,6 +199,18 @@ def is_preset_scheme(name: str) -> bool:
     ),
 )

+# AWQ quantization
+AWQ = dict(
```
Why is this needed?
Not needed; it's just nice to have a preset, since it's easier to reference a scheme by name in QuantizationModifier than to spell out QuantizationArgs over and over.
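For context, a rough sketch of the two styles. The modifier call shape follows llm-compressor's examples and the explicit version uses compressed-tensors' `QuantizationArgs` fields; the exact values here are illustrative, not this PR's code:

```python
# Sketch only. With a preset, a recipe can reference the scheme by name:
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# Without one, the same intent means spelling out the args every time:
from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

scheme = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(num_bits=4, symmetric=False, group_size=128),
)
```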
I talked to Rahul about this; we can either add a "W4A16_ASYMMETRIC" preset or remove it entirely. I'm not a big fan of presets here, since this seems like something we want the user to be explicit about, but we already have a bunch of them. What do people think?
In general I'm in favor of creating generic, composable schemes rather than paper-specific presets. I'd rather remove this.
Couldn't we just do this, and create an alias?

```python
"INT8": INT8_W8A8,  # alias for W8A8
```

I think W4A16_asymmetric makes sense with an AWQ alias.
Moved to a W4A16_ASYM preset, leaving off the "AWQ" alias for now, since the scheme is orthogonal to AWQ; it's just what we've been using to line up with the paper's Table 4.
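For reference, a sketch of what the W4A16_ASYM preset would look like, mirroring the shape of the other presets in this file (my reading of the thread, not necessarily the merged definition):

```python
# Sketch mirroring the surrounding presets: asymmetric 4-bit grouped weights.
W4A16_ASYM = dict(
    weights=QuantizationArgs(
        num_bits=4,
        type=QuantizationType.INT,
        strategy=QuantizationStrategy.GROUP,
        group_size=128,
        symmetric=False,  # asymmetric: per-group scale and zero point
        dynamic=False,
    )
)
```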
Moving this to ready for review!
```diff
-scales = torch.clamp(scales, min=torch.finfo(torch.float32).eps)
+scales = torch.clamp(scales, min=1e-5)
```
We pulled this in to match AutoAWQ's `pseudo_quantize_tensor` logic exactly, but we can probably revert this line; I think `torch.finfo(torch.float32).eps` is preferable to a hard-coded `1e-5`. What do people think?
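For concreteness, the two floors differ by roughly two orders of magnitude:

```python
import torch

print(torch.finfo(torch.float32).eps)  # ~1.1921e-07
print(1e-5)                            # 1e-05

# The hard-coded 1e-5 floor is ~84x larger than float32 eps, so it clamps
# more aggressively when a group is nearly constant (max_val close to min_val).
```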
I'm in favor of removing this line and using our own implementation. LC can't simultaneously match every paper's implementation, so code readability takes priority here.
I agree.
For any change to asym, we should generate W8A8 int8 asym models and validate evals. These models are actively supported in vLLM atm.
Signed-off-by: Brian Dellabetta <[email protected]>
This aligns the logic for asymmetric quantization with the implementation found in AutoAWQ's `pseudo_quantize_tensor` function. This is core logic, so we should all make sure we're in agreement with the changes before merging. To be reviewed/merged in conjunction with vllm-project/llm-compressor#1177
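For reviewers who don't have the reference handy, here is a condensed sketch of the asymmetric quantize-dequantize round trip along the lines of AutoAWQ's `pseudo_quantize_tensor` (group reshaping and the symmetric path omitted; the `scales` clamp is the line debated above):

```python
import torch


def pseudo_quantize(w: torch.Tensor, n_bit: int = 4) -> torch.Tensor:
    """Asymmetric fake quantization, per row.

    Condensed sketch, not the actual AutoAWQ source; the real function
    also reshapes weights into groups and handles a symmetric path.
    """
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**n_bit - 1

    # The clamp under discussion: AutoAWQ floors scales at a hard-coded 1e-5.
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = (-torch.round(min_val / scales)).clamp_(0, max_int)

    # Quantize into [0, max_int], then dequantize back to floats.
    q = torch.clamp(torch.round(w / scales) + zeros, 0, max_int)
    return (q - zeros) * scales
```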