Fix AOPerModuleConfig name changes #18869
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
@@ -12,7 +12,7 @@
 @pytest.mark.skipif(not TORCHAO_AVAILABLE, reason="torchao is not available")
 def test_pre_quantized_model(vllm_runner):
-    with vllm_runner("drisspg/float8_dynamic_act_float8_weight-opt-125m",
+    with vllm_runner("drisspg/fp8-opt-125m",
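For context, a minimal sketch of what the updated test plausibly looks like around this hunk; the quantization/dtype keyword arguments, prompt, and assertion are illustrative assumptions, not copied from the repo:

@pytest.mark.skipif(not TORCHAO_AVAILABLE, reason="torchao is not available")
def test_pre_quantized_model(vllm_runner):
    # Load a checkpoint that was already quantized with torchao fp8.
    with vllm_runner("drisspg/fp8-opt-125m",
                     quantization="torchao",
                     dtype="bfloat16") as llm:
        # A non-empty greedy generation is enough to confirm the weights loaded.
        output = llm.generate_greedy(["The capital of France is"], max_tokens=32)
    assert output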
can we have a unified place to store all the models? :-)
Any recommendations? Other quantization methods also pull test models from scattered user accounts:
vllm/tests/quantization/test_bitsandbytes.py
Lines 18 to 38 in 1661a9c
models_4bit_to_test = [
    ("facebook/opt-125m", "quantize opt model inflight"),
    ("mistralai/Mistral-7B-Instruct-v0.3",
     "quantize inflight model with both HF and Mistral format weights")
]

models_4bit_to_embedding_test = [
    ("intfloat/e5-mistral-7b-instruct", "quantize embedding model inflight"),
]

models_pre_qaunt_4bit_to_test = [
    ('PrunaAI/Einstein-v6.1-Llama3-8B-bnb-4bit-smashed',
     'read pre-quantized 4-bit FP4 model'),
    ('poedator/opt-125m-bnb-4bit', 'read pre-quantized 4-bit NF4 opt model'),
]

models_pre_quant_8bit_to_test = [
    ('meta-llama/Llama-Guard-3-8B-INT8',
     'read pre-quantized llama 8-bit model'),
    ("yec019/fbopt-350m-8bit", "read pre-quantized 8-bit opt model"),
]
how about having a central repo, like torchao/fp8-opt-125m?
Just for test models? I feel that might be a bit overkill.
We do release official torchao models under pytorch, e.g.: https://huggingface.co/collections/pytorch/torchao-quantized-phi-4-mini-instruct-681566f123acc6fed345cb1a
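If the goal is simply to avoid scattering checkpoint IDs across test files, a lighter-weight option than a dedicated Hugging Face org would be a single registry module that the quantization tests import. The sketch below is hypothetical: the module path, dictionary name, and keys are not part of this PR or the repo; only the model IDs come from the discussion above.

# tests/quantization/model_registry.py -- hypothetical module, not in the repo
# Single place to look up pre-quantized test checkpoints by method.
QUANTIZED_TEST_MODELS = {
    "torchao_fp8": "drisspg/fp8-opt-125m",
    "bnb_4bit_nf4": "poedator/opt-125m-bnb-4bit",
    "bnb_8bit": "yec019/fbopt-350m-8bit",
}

A test would then read QUANTIZED_TEST_MODELS["torchao_fp8"] instead of hard-coding the Hugging Face ID, so a rename like the one in this PR would touch only one file.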
Force-pushed from 959f5f2 to f62f7b3.
Can you help merge this, @mgoin? The broken checks do not look relevant.
Summary: also fixed float8 and int4 tests
Test Plan: python test/quantization/test_torchao.py
Reviewers:
Subscribers:
Tasks:
Tags:
Signed-off-by: Jerry Zhang <[email protected]>
Signed-off-by: Jerry Zhang <[email protected]>
Force-pushed from 4fab075 to 0d2f4cb.
Can you merge from main to fix the CI failure?
# to enable proper caching this needs standalone compile
# os.environ["VLLM_TEST_STANDALONE_COMPILE"] = "1"
# logger.info("Using TorchAO: Setting VLLM_TEST_STANDALONE_COMPILE=1")
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"
We can check the torch version, something like: if is_torch_equal_or_newer("2.8.0").
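A minimal sketch of that suggestion, assuming the helper lives in vllm.utils and that 2.8.0 is the right cut-off; which branch sets which environment variable is also illustrative, not the merged implementation:

import os

from vllm.utils import is_torch_equal_or_newer  # assumed import location

if is_torch_equal_or_newer("2.8.0"):
    # Newer torch: standalone compile is available, so keep the compile cache.
    os.environ["VLLM_TEST_STANDALONE_COMPILE"] = "1"
else:
    # Older torch: disable the compile cache to avoid stale compiled artifacts.
    os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"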
We should avoid such BC-breaking changes in TorchAO :-)
Besides rebasing to main, could you also address the inline comment?
Yeah, we'll make sure not to break BC and to fix the call site first next time.
Force-pushed from 3440f4c to 3bded6f.
There are some failures, could you take a look?
Signed-off-by: Jerry Zhang <[email protected]>
Force-pushed from ef6813d to 53d0a63.
Try merging from the main branch and see if the CI failures are resolved.
This pull request has merge conflicts that must be resolved before it can be merged.
Summary:
also fixed float8 and int4 tests
Test Plan:
python test/quantization/test_torchao.py
Reviewers:
Subscribers:
Tasks:
Tags: