Commit
update example
dsikka committed Sep 18, 2024
1 parent b8e3697 commit 1bc22e8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion examples/quantization_w4a16/llama3_example.py
@@ -6,6 +6,7 @@
 
 # Select model and load it.
 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
+MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
 model = SparseAutoModelForCausalLM.from_pretrained(
     MODEL_ID,
     device_map="auto",
@@ -54,7 +55,6 @@ def tokenize(sample):
 
 # Configure the quantization algorithm to run.
 # * quantize the weights to 4 bit with GPTQ with a group size 128
-# Note: to reduce GPU memory use `sequential_update=False`
 recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
 
 # Apply algorithms.
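The recipe in the diff requests a W4A16 scheme: weights quantized to 4-bit integers in groups (the example's comment cites a group size of 128, each group sharing one floating-point scale) while activations stay in 16-bit float. As a rough illustration of what group-wise 4-bit quantization means arithmetically, here is a minimal pure-Python sketch. This is not llm-compressor's actual GPTQ implementation (GPTQ additionally uses Hessian-based error correction); the function names and the simple symmetric round-to-nearest scheme are illustrative assumptions.

```python
def quantize_w4_groupwise(weights, group_size=128):
    """Symmetric 4-bit group-wise quantization of a flat list of floats.

    Each group of `group_size` values shares one scale; quantized values
    are clamped to the signed 4-bit range [-8, 7]. Illustrative only, not
    llm-compressor's GPTQ algorithm.
    """
    qweights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group: map the group's max magnitude to the int4
        # positive limit (7). Fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qweights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qweights, scales


def dequantize_w4_groupwise(qweights, scales, group_size=128):
    """Reconstruct approximate float weights from int4 values and scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qweights)]
```

The per-group scale is what the group size trades off: smaller groups track local weight magnitudes more closely (lower quantization error) at the cost of storing more scales.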

0 comments on commit 1bc22e8