
Remove SparseAutoModelForCausalLM #832

Closed
wants to merge 30 commits

Conversation

@horheynm (Collaborator) commented Oct 9, 2024

SUMMARY:
The goal is to remove SparseAutoModelForCausalLM; users can still use it, but a deprecation warning is emitted.
Removing the class changes the saving logic.
There are two main saving pathways, FSDP and save_pretrained, and both are now wrapped through save_pretrained, as sketched below.
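
A minimal sketch of that wrapping, assuming a hypothetical compress_for_disk hook (the names modify_save_pretrained and compress_for_disk are illustrative, not this PR's actual helpers):

def modify_save_pretrained(model, compress_for_disk):
    # Route both the FSDP and standard saving pathways through one wrapper.
    original_save = model.save_pretrained

    def save_pretrained_wrapper(save_directory, **kwargs):
        # Run compression first, then delegate to the original
        # transformers PreTrainedModel.save_pretrained.
        compress_for_disk(model)
        return original_save(save_directory, **kwargs)

    model.save_pretrained = save_pretrained_wrapper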

Other changes:

  • oneshot no longer saves the model, so the example code has the output_dir argument removed.
  • SparseAutoModel is deprecated as well, because it uses SparseAutoModelForCausalLM to load models (a rough sketch of the shim follows this list).
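
A minimal sketch of the deprecation shim, assuming the class simply subclasses AutoModelForCausalLM (the warning message and class body are assumptions, not necessarily this PR's exact code):

import warnings

from transformers import AutoModelForCausalLM


class SparseAutoModelForCausalLM(AutoModelForCausalLM):
    """Deprecated alias kept for backwards compatibility."""

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        warnings.warn(
            "SparseAutoModelForCausalLM is deprecated; "
            "use AutoModelForCausalLM.from_pretrained instead.",
            DeprecationWarning,
        )
        return super().from_pretrained(*args, **kwargs)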

TEST PLAN:

  • Dry runs of the example code with small, short-running models, verifying that the saved files match those produced on main.
  • Example runs for the cases below.

baseline case:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply quantization.
# NOTE: This is where we wrap save_pretrained
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

vision:

from transformers import (
    AutoModelForCausalLM,
    AutoProcessor,
    MllamaForConditionalGeneration,
    AutoTokenizer,
)

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, device_map="auto", torch_dtype="auto"
# )
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

processor = AutoProcessor.from_pretrained(MODEL_ID)

# Configure the quantization algorithm and scheme.
# In this case, we:
#   * quantize the weights to fp8 with per channel via ptq
#   * quantize the activations to fp8 with dynamic per token
recipe = QuantizationModifier(
    targets="Linear",
    # scheme="FP8_DYNAMIC",
    scheme="FP8",
    ignore=["re:.*lm_head", "re:multi_modal_projector.*", "re:vision_model.*"],
)

# Apply quantization and save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-model_save-MllamaForConditionalGeneration"
# SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic-foo"

oneshot(
    model=model,
    recipe=recipe,
    output_dir=SAVE_DIR,
    dataset="open_platypus",
    num_calibration_samples=2,
)
processor.save_pretrained(SAVE_DIR)
model.save_pretrained(SAVE_DIR)

bert:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# Load model.
MODEL_ID = "google-bert/bert-base-uncased"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, 
    # device_map="auto", 
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = QuantizationModifier( 
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply quantization.
# NOTE: This is where we wrap save_pretrained
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)


github-actions bot commented Oct 9, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka (Collaborator) left a comment

Can you provide examples of the different loading scenarios (previously quantized models / legacy compression_config models) so we can understand how the interface should work?

In general I think we still have to make updates in the following areas:

  1. save_pretrained should only be responsible for saving things to disk; the ModelCompressor pathway should live outside of it.
  2. It seems like we're still using the SparseAutoModel pathway in a couple of places?
  3. Is there any more clean-up we can do in loading/saving?
  4. Confirm that the loading works sufficiently for the work @kylesayrs is doing on adding modifier support for multi-modal cases.

@dsikka (Collaborator) left a comment

We should update the e2e/example testing and other CI testing as well.

@horheynm horheynm changed the title add save logic after oneshot is carried out Remove SparseAutoModelForCausalLM Oct 23, 2024
@horheynm (Collaborator, Author)

With respect to the functionality, the PR is ready.

Next to-dos:

  • Detach the compressor logic from save_pretrained.

Strategy:
In main (the shared flow for oneshot, etc.), at the end, call the compressor and attach the compressed state_dict as an attribute on the model.

After oneshot is called, call save_pretrained(SAVE_DIR), which extracts the state_dict from the model.
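
A rough sketch of that strategy, assuming a hypothetical compressed_state_dict attribute and a generic compress callable (neither name is taken from this PR):

def attach_compressed_state_dict(model, compress):
    # At the end of the shared flow (oneshot, etc.), compute the
    # compressed state_dict and stash it on the model.
    model.compressed_state_dict = compress(model)


def save_compressed(model, save_dir, **kwargs):
    # save_pretrained then receives the pre-computed state_dict
    # instead of re-running compression itself.
    state_dict = getattr(model, "compressed_state_dict", None)
    model.save_pretrained(save_dir, state_dict=state_dict, **kwargs)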

Testing:

  • e2e
  • vllm (smoke-test sketch after this list)
  • offloading
  • fsdp
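
For the vLLM item, a quick smoke test might look like the following (a hedged example; it assumes vLLM is installed and the directory below is one of the checkpoints produced by the example scripts above):

from vllm import LLM, SamplingParams

SAVE_DIR = "Meta-Llama-3-8B-Instruct-FP8-Dynamic"  # written by save_pretrained in the baseline example

llm = LLM(model=SAVE_DIR)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=20))
print(outputs[0].outputs[0].text)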

@dsikka (Collaborator) left a comment

@horheynm There are new consistent test failures across the different test cases

@kylesayrs (Collaborator) left a comment

  1. Deprecate SparseAutoModel.
  2. Add back saving to the oneshot pathway, or at least remove the argument and remove it from the examples.
  3. Remove the breakpoint.

@dsikka (Collaborator) left a comment

Can you please update the PR description?

Please add a complete summary of the functionality that was changed, why it was changed, and the testing that was done to verify behaviour after the changes were applied. The current description is unclear and makes the code hard to review.

It also seems like the test cases are still failing with a new test failure.

@rahul-tuli (Collaborator)

@horheynm I have a request. Could we split this into two PRs:

  1. The actual removal of SparseAutoModelForCausalLM class
  2. The updates to the examples and README.md

It will make this PR much easier to review

rahul-tuli previously approved these changes Oct 30, 2024
@rahul-tuli (Collaborator) left a comment

The code looks good! Did we run the examples again after the latest changes?

@kylesayrs (Collaborator) left a comment

  1. Subclass AutoModel in SparseAutoModel, or raise a deprecation error.
  2. Add back saving to the oneshot pathway, or at least remove the argument from model_args and remove it from the examples.

@rahul-tuli rahul-tuli self-requested a review October 30, 2024 17:57
@horheynm (Collaborator, Author)

Replying to the two points above:

  1. Subclass AutoModel in SparseAutoModel, or raise a deprecation error: we don't need this.
  2. Add back saving to the oneshot pathway, or remove the argument from model_args and from the examples: there are entry points in the pipeline that pass output_dir through to training_args, so removing it would need more validation. That is a next to-do.

@kylesayrs (Collaborator)

@horheynm On the first point, either choice is better than raising a warning and then having the user encounter unexpected behavior or a nonsensical error. Either choice is a one-line change.

On the second point, if we are really choosing to deprecate saving in oneshot, can you add a ValueError that points to the proper way to save? As it stands, many users have their own scripts they use to compress, some of which do not save the model manually. With the new update, the compression will occur but oneshot will silently fail to save, leaving users wondering where the saved model went or why it was never saved.
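
A minimal sketch of the guard being suggested here (the parameter handling and message are assumptions, not the actual oneshot signature):

def oneshot(model, recipe, output_dir=None, **kwargs):
    if output_dir is not None:
        # Fail loudly instead of silently skipping the save.
        raise ValueError(
            "oneshot no longer saves the model; call "
            "model.save_pretrained(output_dir) after oneshot instead."
        )
    # ... existing oneshot logic continues here ...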

@kylesayrs (Collaborator)

Should we not also deprecate SparseAutoModel?

@horheynm (Collaborator, Author)

I have broken this down into example changes and component changes; I will close this PR.

@horheynm horheynm closed this Oct 31, 2024