
Remove SparseAutoModelForCausalLM #832

Closed
wants to merge 30 commits

Conversation

@horheynm (Collaborator) commented Oct 9, 2024

SUMMARY:
The goal is to remove SparseAutoModelForCausalLM; users can still use it, but a deprecation warning is emitted.
Removing the class changes the saving logic.
There are two main saving pathways, FSDP and save_pretrained, and both are now wrapped through save_pretrained, as sketched below.
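
A minimal sketch of that wrapping, assuming a hypothetical compress_for_disk hook (the names modify_save_pretrained and compress_for_disk are illustrative, not this PR's actual helpers):

def modify_save_pretrained(model, compress_for_disk):
    # Route both the FSDP and standard saving pathways through one wrapper.
    original_save = model.save_pretrained

    def save_pretrained_wrapper(save_directory, **kwargs):
        # Run compression first, then delegate to the original
        # transformers PreTrainedModel.save_pretrained.
        compress_for_disk(model)
        return original_save(save_directory, **kwargs)

    model.save_pretrained = save_pretrained_wrapper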

Other changes:

  • oneshot no longer saves the model, so the example code has the output_dir argument removed.
  • SparseAutoModel is deprecated as well, because it uses SparseAutoModelForCausalLM to load models (a rough sketch of the shim follows this list).
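
A minimal sketch of the deprecation shim, assuming the class simply subclasses AutoModelForCausalLM (the warning message and class body are assumptions, not necessarily this PR's exact code):

import warnings

from transformers import AutoModelForCausalLM


class SparseAutoModelForCausalLM(AutoModelForCausalLM):
    """Deprecated alias kept for backwards compatibility."""

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        warnings.warn(
            "SparseAutoModelForCausalLM is deprecated; "
            "use AutoModelForCausalLM.from_pretrained instead.",
            DeprecationWarning,
        )
        return super().from_pretrained(*args, **kwargs)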

TEST PLAN:

  • Dry runs of the example code with small, short-running models, verifying that the saved files match those produced on main.
  • Example runs for the cases below.

baseline case:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply quantization.
# NOTE: This is where we wrap save_pretrained
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

vision:

from transformers import (
    AutoModelForCausalLM,
    AutoProcessor,
    MllamaForConditionalGeneration,
    AutoTokenizer,
)

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, device_map="auto", torch_dtype="auto"
# )
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

processor = AutoProcessor.from_pretrained(MODEL_ID)

# Configure the quantization algorithm and scheme.
# In this case, we:
#   * quantize the weights to fp8 with per channel via ptq
#   * quantize the activations to fp8 with dynamic per token
recipe = QuantizationModifier(
    targets="Linear",
    # scheme="FP8_DYNAMIC",
    scheme="FP8",
    ignore=["re:.*lm_head", "re:multi_modal_projector.*", "re:vision_model.*"],
)

# Apply quantization and save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-model_save-MllamaForConditionalGeneration"
# SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic-foo"

oneshot(
    model=model,
    recipe=recipe,
    output_dir=SAVE_DIR,
    dataset="open_platypus",
    num_calibration_samples=2,
)
processor.save_pretrained(SAVE_DIR)
model.save_pretrained(SAVE_DIR)

bert:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "google-bert/bert-base-uncased"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, 
    # device_map="auto", 
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = QuantizationModifier( 
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply quantization.
# NOTE: This is where we wrap save_pretrained
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)


github-actions bot commented Oct 9, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka (Collaborator) left a comment

Can you provide examples of the different loading scenarios (previously quantized models / legacy compression_config models) so we can understand how the interface should work?

In general I think we still have to make updates in the following areas:

  1. save_pretrained should only be responsible for saving things to disk; the ModelCompressor pathway should live outside of it.
  2. It seems like we're still using the SparseAutoModel pathway in a couple of places?
  3. Is there any more clean-up we can do in loading/saving?
  4. Confirm that the loading works sufficiently for the work @kylesayrs is doing on adding modifier support for multi-modal cases.

@dsikka (Collaborator) left a comment

We should update the e2e/example testing and other CI testing as well.

@horheynm horheynm changed the title add save logic after oneshot is carried out Remove SparseAutoModelForCausalLM Oct 23, 2024
@horheynm (Collaborator, Author)

With respect to the functionality, the PR is ready.

Next to-dos:

  • Detach the compressor logic from save_pretrained.

Strategy:
In main (the shared flow for oneshot, etc.), at the end, call the compressor and attach the compressed state_dict as an attribute on the model.

After oneshot is called, call save_pretrained(SAVE_DIR), which extracts the state_dict from the model.
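
A rough sketch of that strategy, assuming a hypothetical compressed_state_dict attribute and a generic compress callable (neither name is taken from this PR):

def attach_compressed_state_dict(model, compress):
    # At the end of the shared flow (oneshot, etc.), compute the
    # compressed state_dict and stash it on the model.
    model.compressed_state_dict = compress(model)


def save_compressed(model, save_dir, **kwargs):
    # save_pretrained then receives the pre-computed state_dict
    # instead of re-running compression itself.
    state_dict = getattr(model, "compressed_state_dict", None)
    model.save_pretrained(save_dir, state_dict=state_dict, **kwargs)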

Testing:

  • e2e
  • vllm (smoke-test sketch after this list)
  • offloading
  • fsdp
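
For the vLLM item, a quick smoke test might look like the following (a hedged example; it assumes vLLM is installed and the directory below is one of the checkpoints produced by the example scripts above):

from vllm import LLM, SamplingParams

SAVE_DIR = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"  # written by save_pretrained in the baseline example

llm = LLM(model=SAVE_DIR)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=20))
print(outputs[0].outputs[0].text)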

@dsikka (Collaborator) left a comment

@horheynm There are new consistent test failures across the different test cases

@kylesayrs (Collaborator) left a comment

  1. Deprecate SparseAutoModel.
  2. Add back saving to the oneshot pathway, or at least remove the argument and remove it from the examples.
  3. Remove the breakpoint.

@dsikka (Collaborator) left a comment

Can you please update the PR description?

Please add a complete summary of the functionality that was changed, why it was changed, and the testing that was done to verify behaviour after the changes were applied. The current description is unclear and makes the code hard to review.

It also seems like the test cases are still failing with a new test failure.

@rahul-tuli (Collaborator)

@horheynm I have a request. Could we split this into two PRs:

  1. The actual removal of SparseAutoModelForCausalLM class
  2. The updates to the examples and README.md

It will make this PR much easier to review

rahul-tuli previously approved these changes Oct 30, 2024
@rahul-tuli (Collaborator) left a comment

The code looks good! Did we run the examples again after the latest changes?

@kylesayrs (Collaborator) left a comment

  1. Subclass AutoModel in SparseAutoModel, or raise a deprecation error.
  2. Add back saving to the oneshot pathway, or at least remove the argument from model_args and remove it from the examples.

@rahul-tuli rahul-tuli self-requested a review October 30, 2024 17:57
@horheynm (Collaborator, Author)

Replying to the two points above:

  1. Subclass AutoModel in SparseAutoModel, or raise a deprecation error: we don't need this.
  2. Add back saving to the oneshot pathway, or remove the argument from model_args and from the examples: there are entry points in the pipeline that pass output_dir through to training_args, so removing it would need more validation. That is a next to-do.

@kylesayrs (Collaborator)

@horheynm On the first point, either choice is better than raising a warning and then having the user encounter unexpected behavior or a nonsensical error. Either choice is a one-line change.

On the second point, if we are really choosing to deprecate saving in oneshot, can you add a ValueError that points to the proper way to save? As it stands, many users have their own scripts they use to compress, some of which do not save the model manually. With the new update, the compression will occur but oneshot will silently fail to save, leaving users wondering where the saved model went or why it was never saved.
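
A minimal sketch of the guard being suggested here (the parameter handling and message are assumptions, not the actual oneshot signature):

def oneshot(model, recipe, output_dir=None, **kwargs):
    if output_dir is not None:
        # Fail loudly instead of silently skipping the save.
        raise ValueError(
            "oneshot no longer saves the model; call "
            "model.save_pretrained(output_dir) after oneshot instead."
        )
    # ... existing oneshot logic continues here ...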

@kylesayrs (Collaborator)

Should we not also deprecate SparseAutoModel?

@horheynm (Collaborator, Author)

I have broken this down into example changes and component changes; I will close this PR.

@horheynm horheynm closed this Oct 31, 2024