Fix AOPerModuleConfig name changes #18869
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
@@ -12,7 +12,7 @@
 @pytest.mark.skipif(not TORCHAO_AVAILABLE, reason="torchao is not available")
 def test_pre_quantized_model(vllm_runner):
-    with vllm_runner("drisspg/float8_dynamic_act_float8_weight-opt-125m",
+    with vllm_runner("drisspg/fp8-opt-125m",
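For context, a minimal sketch of what the updated test plausibly looks like around this hunk; the quantization/dtype keyword arguments, prompt, and assertion are illustrative assumptions, not copied from the repo:

@pytest.mark.skipif(not TORCHAO_AVAILABLE, reason="torchao is not available")
def test_pre_quantized_model(vllm_runner):
    # Load a checkpoint that was already quantized with torchao fp8.
    with vllm_runner("drisspg/fp8-opt-125m",
                     quantization="torchao",
                     dtype="bfloat16") as llm:
        # A non-empty greedy generation is enough to confirm the weights loaded.
        output = llm.generate_greedy(["The capital of France is"], max_tokens=32)
    assert output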
can we have a unified place to store all the models? :-)
Any recommendations? Other quantization methods also pull test models from scattered user accounts:
vllm/tests/quantization/test_bitsandbytes.py
Lines 18 to 38 in 1661a9c
models_4bit_to_test = [
    ("facebook/opt-125m", "quantize opt model inflight"),
    ("mistralai/Mistral-7B-Instruct-v0.3",
     "quantize inflight model with both HF and Mistral format weights")
]

models_4bit_to_embedding_test = [
    ("intfloat/e5-mistral-7b-instruct", "quantize embedding model inflight"),
]

models_pre_qaunt_4bit_to_test = [
    ('PrunaAI/Einstein-v6.1-Llama3-8B-bnb-4bit-smashed',
     'read pre-quantized 4-bit FP4 model'),
    ('poedator/opt-125m-bnb-4bit', 'read pre-quantized 4-bit NF4 opt model'),
]

models_pre_quant_8bit_to_test = [
    ('meta-llama/Llama-Guard-3-8B-INT8',
     'read pre-quantized llama 8-bit model'),
    ("yec019/fbopt-350m-8bit", "read pre-quantized 8-bit opt model"),
]
how about having a central repo, like torchao/fp8-opt-125m?
Just for test models? I feel that might be a bit overkill.
We do release official torchao models under pytorch, e.g.: https://huggingface.co/collections/pytorch/torchao-quantized-phi-4-mini-instruct-681566f123acc6fed345cb1a
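If the goal is simply to avoid scattering checkpoint IDs across test files, a lighter-weight option than a dedicated Hugging Face org would be a single registry module that the quantization tests import. The sketch below is hypothetical: the module path, dictionary name, and keys are not part of this PR or the repo; only the model IDs come from the discussion above.

# tests/quantization/model_registry.py -- hypothetical module, not in the repo
# Single place to look up pre-quantized test checkpoints by method.
QUANTIZED_TEST_MODELS = {
    "torchao_fp8": "drisspg/fp8-opt-125m",
    "bnb_4bit_nf4": "poedator/opt-125m-bnb-4bit",
    "bnb_8bit": "yec019/fbopt-350m-8bit",
}

A test would then read QUANTIZED_TEST_MODELS["torchao_fp8"] instead of hard-coding the Hugging Face ID, so a rename like the one in this PR would touch only one file.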
Force-pushed from 959f5f2 to f62f7b3.
Can you help merge this, @mgoin? The broken checks do not look relevant.
Summary: also fixed float8 and int4 tests
Test Plan: python test/quantization/test_torchao.py
Reviewers:
Subscribers:
Tasks:
Tags:
Signed-off-by: Jerry Zhang <[email protected]>
Signed-off-by: Jerry Zhang <[email protected]>
Force-pushed from 4fab075 to 0d2f4cb.
Can you merge from main to fix the CI failure?
# to enable proper caching this needs standalone compile
# os.environ["VLLM_TEST_STANDALONE_COMPILE"] = "1"
# logger.info("Using TorchAO: Setting VLLM_TEST_STANDALONE_COMPILE=1")
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"
We can check the torch version, something like: if is_torch_equal_or_newer("2.8.0").
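A minimal sketch of that suggestion, assuming the helper lives in vllm.utils and that 2.8.0 is the right cut-off; which branch sets which environment variable is also illustrative, not the merged implementation:

import os

from vllm.utils import is_torch_equal_or_newer  # assumed import location

if is_torch_equal_or_newer("2.8.0"):
    # Newer torch: standalone compile is available, so keep the compile cache.
    os.environ["VLLM_TEST_STANDALONE_COMPILE"] = "1"
else:
    # Older torch: disable the compile cache to avoid stale compiled artifacts.
    os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"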
We should avoid such BC-breaking changes in TorchAO :-)
Besides rebasing to main, could you also address the inline comment?
Yeah, we'll make sure not to break BC and to fix the call site first next time.
Force-pushed from 3440f4c to 3bded6f.
There are some failures, could you take a look?
Signed-off-by: Jerry Zhang <[email protected]>
Force-pushed from ef6813d to 53d0a63.
Try merging from the main branch and see if the CI failures are resolved.
This pull request has merge conflicts that must be resolved before it can be merged.
Summary:
also fixed float8 and int4 tests
Test Plan:
python test/quantization/test_torchao.py
Reviewers:
Subscribers:
Tasks:
Tags: