Some fixes for AWQ #269
base: main
Conversation
```diff
@@ -199,6 +199,18 @@ def is_preset_scheme(name: str) -> bool:
     ),
 )

+# AWQ quantization
+AWQ = dict(
```
Why is this needed?
Not needed; it's just nice to have a preset, since it's easier to reference a scheme by name in QuantizationModifier than to spell out QuantizationArgs over and over.
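For context, a rough sketch of the two styles. The modifier call shape follows llm-compressor's examples and the explicit version uses compressed-tensors' `QuantizationArgs` fields; the exact values here are illustrative, not this PR's code:

```python
# Sketch only. With a preset, a recipe can reference the scheme by name:
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# Without one, the same intent means spelling out the args every time:
from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

scheme = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(num_bits=4, symmetric=False, group_size=128),
)
```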
I talked to Rahul about this; we can either add a "W4A16_ASYMMETRIC" preset or remove it entirely. I'm not a big fan of presets here, since this seems like something we want the user to be explicit about, but we already have a bunch of them. What do people think?
In general I'm in favor of creating generic, composable schemes rather than paper-specific presets. I'd rather remove this.
Couldn't we just do this, and create an alias?

```python
"INT8": INT8_W8A8,  # alias for W8A8
```

I think W4A16_asymmetric makes sense with an AWQ alias.
Moved to a W4A16_ASYM preset, leaving off the "AWQ" alias for now, since the scheme is orthogonal to AWQ; it's just what we've been using to line up with the paper's Table 4.
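For reference, a sketch of what the W4A16_ASYM preset would look like, mirroring the shape of the other presets in this file (my reading of the thread, not necessarily the merged definition):

```python
# Sketch mirroring the surrounding presets: asymmetric 4-bit grouped weights.
W4A16_ASYM = dict(
    weights=QuantizationArgs(
        num_bits=4,
        type=QuantizationType.INT,
        strategy=QuantizationStrategy.GROUP,
        group_size=128,
        symmetric=False,  # asymmetric: per-group scale and zero point
        dynamic=False,
    )
)
```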
Moving this to ready for review!
```diff
-scales = torch.clamp(scales, min=torch.finfo(torch.float32).eps)
+scales = torch.clamp(scales, min=1e-5)
```
We pulled this in to match AutoAWQ's `pseudo_quantize_tensor` logic exactly, but we can probably revert this line; I think `torch.finfo(torch.float32).eps` is preferable to a hard-coded `1e-5`. What do people think?
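For concreteness, the two floors differ by roughly two orders of magnitude:

```python
import torch

print(torch.finfo(torch.float32).eps)  # ~1.1921e-07
print(1e-5)                            # 1e-05

# The hard-coded 1e-5 floor is ~84x larger than float32 eps, so it clamps
# more aggressively when a group is nearly constant (max_val close to min_val).
```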
I'm in favor of removing this line and using our own implementation. LC can't simultaneously match every paper's implementation, so code readability takes priority here.
I agree.
For any change to asym, we should generate W8A8 int8 asym models and validate evals. These models are actively supported in vLLM atm.
Signed-off-by: Brian Dellabetta <[email protected]>
This aligns the logic for asymmetric quantization with the implementation found in AutoAWQ's `pseudo_quantize_tensor` function. This is core logic, so we should all make sure we're in agreement with the changes before merging. To be reviewed/merged in conjunction with vllm-project/llm-compressor#1177
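For reviewers who don't have the reference handy, here is a condensed sketch of the asymmetric quantize-dequantize round trip along the lines of AutoAWQ's `pseudo_quantize_tensor` (group reshaping and the symmetric path omitted; the `scales` clamp is the line debated above):

```python
import torch


def pseudo_quantize(w: torch.Tensor, n_bit: int = 4) -> torch.Tensor:
    """Asymmetric fake quantization, per row.

    Condensed sketch, not the actual AutoAWQ source; the real function
    also reshapes weights into groups and handles a symmetric path.
    """
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**n_bit - 1

    # The clamp under discussion: AutoAWQ floors scales at a hard-coded 1e-5.
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = (-torch.round(min_val / scales)).clamp_(0, max_int)

    # Quantize into [0, max_int], then dequantize back to floats.
    q = torch.clamp(torch.round(w / scales) + zeros, 0, max_int)
    return (q - zeros) * scales
```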