Commit 84899e6

Turn off 2:4 sparse compression until supported in vLLM (#1092)

This PR temporarily disables the newly added Sparse24 compression feature in the example script, as support for this feature is not yet available in vLLM. Support for Sparse24 compression is being added in vLLM via [this PR](vllm-project/vllm#12097); once that PR is merged, this change will be reverted to re-enable the feature.

Signed-off-by: Rahul Tuli <[email protected]>
1 parent a82c9e7 commit 84899e6

File tree

1 file changed (+3, -1 lines)


examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py (+3, -1)

```diff
@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
     print("==========================================\n")

     # Save compressed model and tokenizer
-    model.save_pretrained(save_dir, save_compressed=args.fp8)
+    model.save_pretrained(
+        save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+    )
     tokenizer.save_pretrained(save_dir)
```
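The effect of the change is that checkpoints are still saved with FP8 weight compression when requested, but sparse (2:4) compression is always skipped at save time. A minimal sketch of that gating, using a hypothetical `build_save_kwargs` helper that is not part of the original script:

```python
def build_save_kwargs(fp8_enabled: bool) -> dict:
    """Hypothetical helper mirroring the commit's save-time change.

    Sparse24-compressed checkpoints cannot yet be loaded by vLLM
    (support is being added in vllm-project/vllm#12097), so sparse
    compression is explicitly disabled when saving the model.
    """
    return {
        # Weight compression still follows the FP8 flag, as before.
        "save_compressed": fp8_enabled,
        # Temporary workaround: remove once vLLM supports Sparse24 checkpoints.
        "disable_sparse_compression": True,
    }


# The example script would then save with, e.g.:
# model.save_pretrained(save_dir, **build_save_kwargs(args.fp8))
```

Centralizing the kwargs this way would make the eventual revert a one-line change, though the actual commit simply passes `disable_sparse_compression=True` inline.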
