Some fixes for AWQ #269

Open · wants to merge 3 commits into main
Conversation

@rahul-tuli (Member) commented Mar 7, 2025:

This aligns the logic for asymmetric quantization with the implementation found in AutoAWQ's pseudo_quantize_tensor function. This touches core logic, so we should all make sure we're in agreement with the changes before merging.

To be reviewed/merged in conjunction with vllm-project/llm-compressor#1177
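For context, a minimal sketch of the asymmetric min/max quantization being aligned to, modeled on AutoAWQ's pseudo_quantize_tensor (names and shapes are illustrative, not the exact diff in this PR):

```python
import torch

def pseudo_quantize_asym(w: torch.Tensor, n_bit: int = 4) -> torch.Tensor:
    """Fake-quantize `w` along its last dim with asymmetric (zero-point)
    quantization, following AutoAWQ-style min/max logic."""
    max_val = w.amax(dim=-1, keepdim=True)
    min_val = w.amin(dim=-1, keepdim=True)
    max_int = 2**n_bit - 1  # 15 for 4-bit
    min_int = 0
    # The scale spans the full min..max range; the zero point maps min_val to 0.
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)
    # Quantize, clamp to the integer grid, then dequantize.
    q = torch.clamp(torch.round(w / scales) + zeros, min_int, max_int)
    return (q - zeros) * scales
```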

```
@@ -199,6 +199,18 @@ def is_preset_scheme(name: str) -> bool:
    ),
)

# AWQ quantization
AWQ = dict(
```
Contributor:
Why is this needed?

@rahul-tuli (Member, Author):

Not needed; it's just nice to have a preset, since it's easier to reference a scheme in QuantizationModifier than to specify QuantizationArgs over and over.
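As an illustration of that convenience, a hypothetical before/after (the QuantizationModifier and QuantizationArgs signatures below are assumptions based on llm-compressor and compressed-tensors conventions, and "W4A16_ASYM" is the preset name discussed later in this thread):

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

# With a preset, one string selects the whole scheme:
with_preset = QuantizationModifier(
    targets="Linear", scheme="W4A16_ASYM", ignore=["lm_head"]
)

# Without one, the same scheme is spelled out field by field every time:
without_preset = QuantizationModifier(
    targets="Linear",
    ignore=["lm_head"],
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(
                num_bits=4,
                type="int",
                symmetric=False,  # asymmetric, per AWQ's INT4-g128 setup
                strategy="group",
                group_size=128,
            ),
        )
    },
)
```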

@brian-dellabetta (Contributor) commented Mar 10, 2025:

I talked to Rahul about this; we can either add a "W4A16_ASYMMETRIC" preset or remove it entirely. I'm not a big fan of presets here, since this seems like something we want the user to be explicit about, but we already have a bunch of them. What do people think?

Contributor:

In general I'm in favor of creating generic, composable schemes rather than paper-specific presets. I'd rather remove this.

@dsikka (Contributor) commented Mar 11, 2025:

Couldn't we just do this and create an alias?

I think W4A16_asymmetric makes sense, with an AWQ alias.

Contributor:

Moved to a W4A16_ASYM preset, leaving off the "AWQ" alias for now since the preset is orthogonal to AWQ itself; it's just the configuration we've been using to line up with Table 4 of the paper.

@brian-dellabetta (Contributor) left a comment:

Moving this to ready for review!

Comment on lines 82 to 83:

```python
scales = torch.clamp(scales, min=torch.finfo(torch.float32).eps)
scales = torch.clamp(scales, min=1e-5)
```
@brian-dellabetta (Contributor) commented Mar 10, 2025:

We pulled this in to match AutoAWQ's pseudo_quantize_tensor logic exactly, but we can probably revert this line; I think the eps-based clamp is preferable to a hard-coded 1e-5. What do people think?
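For reference, a quick sketch of the numerical difference between the two clamps (float32 machine epsilon is roughly 1.19e-7, so 1e-5 is about 84x larger):

```python
import torch

eps = torch.finfo(torch.float32).eps  # ~1.1921e-07
scales = torch.tensor([0.0, 1e-6, 1e-3])

# eps clamp: only guards against zero/denormal scales
print(torch.clamp(scales, min=eps))   # tensor([1.1921e-07, 1.0000e-06, 1.0000e-03])

# AutoAWQ-style clamp: also floors small-but-nonzero scales
print(torch.clamp(scales, min=1e-5))  # tensor([1.0000e-05, 1.0000e-05, 1.0000e-03])
```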

Contributor:

I'm in favor of removing this line and using our own implementation. LC can't simultaneously match every paper's implementation, so code readability takes priority here.

@dsikka (Contributor) commented Mar 11, 2025:

I agree.

For any change to asym, we should generate W8A8 int8 asymmetric models and validate evals; these models are actively supported in vLLM at the moment.
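A hypothetical sketch of that validation flow (the model, dataset, and eval task are placeholders; the oneshot API and "W8A8" preset name are assumed from llm-compressor examples, and an asymmetric variant would set symmetric=False via explicit config_groups as sketched earlier):

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# 1. Produce a W8A8 int8 model to compare against the asym change.
oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    dataset="open_platypus",                   # placeholder calibration set
    recipe=QuantizationModifier(
        targets="Linear", scheme="W8A8", ignore=["lm_head"]
    ),
    output_dir="Llama-3.1-8B-W8A8",
    num_calibration_samples=512,
)

# 2. Validate evals on the vLLM backend (command sketch):
#    lm_eval --model vllm \
#        --model_args pretrained=Llama-3.1-8B-W8A8 \
#        --tasks gsm8k --batch_size auto
```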

Signed-off-by: Brian Dellabetta <[email protected]>