
[Bugfix] SparseGPT, Pipelines #1130

Merged 15 commits from kylesayrs/fix-sgpt-targets into main on Feb 11, 2025

Conversation

@kylesayrs (Collaborator) commented Feb 7, 2025

Purpose

Changes

  • SparseGPT
    • Fully separate targets and sequential_targets (see the first sketch after this list)
      • Modify hook-adding logic to reflect this change
    • Fix behavior of _infer_owl_layer_sparsity and add test
    • Code clarity
      • Add additional type hints
      • Designate calibrate_module as an abstract method on the sgpt mixin
  • Pipelines
    • Sequential pipeline: unwrap the model's forward function to avoid issues with PyTorch function patching
    • Layer sequential pipeline: add maybe_inject_pos_embeddings to hackily support models with position_embeddings (see the second sketch after this list)
    • Basic pipeline: fix on_sequential_batch_end to be called at the end of the epoch rather than after every batch (see the third sketch after this list)
      • Calling it on every batch was likely causing slowdowns
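A hedged sketch of the targets/sequential_targets separation described above, using a toy model in place of a real transformer; everything here besides the two option names is illustrative, not the actual source:

```python
import torch
from torch import nn

# Toy model standing in for a transformer: two "decoder layers",
# each containing Linear submodules.
model = nn.Sequential(
    nn.Sequential(nn.Linear(8, 8), nn.ReLU()),
    nn.Sequential(nn.Linear(8, 8), nn.ReLU()),
)

targets = (nn.Linear,)                 # modules that receive calibration hooks
sequential_targets = (nn.Sequential,)  # blocks that partition the model

def calibrate_module(module, args, output):
    # Placeholder for SparseGPT's per-module calibration work
    pass

# Hooks are attached to `targets` only...
handles = [
    m.register_forward_hook(calibrate_module)
    for m in model.modules()
    if isinstance(m, targets)
]

# ...while `sequential_targets` only defines the sequential partitions.
partitions = [m for m in model.children() if isinstance(m, sequential_targets)]
```

A minimal sketch of what maybe_inject_pos_embeddings could look like, assuming it inspects a layer's forward signature and passes position_embeddings through only when the layer declares that argument; this follows the PR description, not the actual implementation:

```python
import inspect
from typing import Any, Dict

import torch

def maybe_inject_pos_embeddings(
    kwargs: Dict[str, Any],
    layer: torch.nn.Module,
    position_embeddings: Any,
) -> Dict[str, Any]:
    # Newer decoder layers may take `position_embeddings` as an explicit
    # forward argument; inject it only when the signature declares it.
    params = inspect.signature(layer.forward).parameters
    if "position_embeddings" in params and position_embeddings is not None:
        kwargs["position_embeddings"] = position_embeddings
    return kwargs
```

And a before/after sketch of the basic pipeline fix, with hypothetical `batches` and `modifier` objects:

```python
# Before (hypothetical): the callback fired after every calibration batch,
# likely re-running compression far more often than needed.
for batch in batches:
    model(**batch)
    modifier.on_sequential_batch_end()

# After (hypothetical): calibrate on all batches, then fire the callback once.
for batch in batches:
    model(**batch)
modifier.on_sequential_batch_end()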

Followups

  • Remove deprecated sequential_update option from examples and tests
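For illustration, a hedged sketch of what an example looks like once the deprecated option is dropped; the parameter values are representative of the 2:4 examples, not copied from a specific file:

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier

# Sketch: the modifier is constructed without the deprecated
# `sequential_update` argument; sequential behavior is now governed by
# the pipeline and `sequential_targets` instead. Values are illustrative.
modifier = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    targets=["Linear"],
    ignore=["lm_head"],
    # sequential_update=True,  # deprecated: remove from examples and tests
)
```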

Testing

  • Added tests/llmcompressor/transformers/obcq/test_obcq_owl.py
  • Tested OBCQ+llama with sequential, layer sequential, and basic pipelines independently
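The OWL test is new in this PR. As a hedged illustration of the kind of property such a test can check (this approximates an OWL-style allocation and is not the actual contents of test_obcq_owl.py), layers with more outliers should be kept denser while the mean sparsity matches the global target:

```python
import numpy as np

def infer_owl_layer_sparsity(outlier_ratios, target_sparsity, lam=0.08):
    """Hypothetical OWL-style allocation: per-layer deviations from the
    target are bounded by +/- lam and the mean sparsity matches the target;
    layers with higher outlier ratios receive lower sparsity."""
    ratios = np.asarray(outlier_ratios, dtype=float)
    # Scale outlier ratios into [0, 2*lam], then center so offsets sum to zero
    scaled = (ratios - ratios.min()) / (ratios.max() - ratios.min()) * (2 * lam)
    offsets = scaled - scaled.mean()
    # Higher outlier ratio -> lower sparsity (keep more weights)
    return target_sparsity - offsets

def test_owl_sparsities_average_to_target():
    sparsities = infer_owl_layer_sparsity([0.1, 0.4, 0.9, 0.2], 0.5)
    assert abs(sparsities.mean() - 0.5) < 1e-6
    # The layer with the most outliers should be the densest
    assert sparsities.argmin() == 2
```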

Regression Evaluations

Models were compressed using examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py without the fp8 option.
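The result blocks below are lm-evaluation-harness output. A hedged reconstruction of how they could be reproduced through the harness's Python API (the exact invocation is not shown in the PR; the model path is a placeholder):

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    # Placeholder path; substitute the compressed model directory
    model_args="pretrained=<compressed-model-path>,dtype=bfloat16,add_bos_token=True",
    tasks=["winogrande"],
    num_fewshot=5,
    batch_size=1,
)
print(results["results"]["winogrande"])
```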

sparsegpt

Main

vllm (pretrained=/home/kyle/llm-compressor/Meta-Llama-3-8B-InstructSparseGPTModifierMAIN,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.6243|±  |0.0136|

This branch

vllm (pretrained=/home/kyle/llm-compressor/Meta-Llama-3-8B-InstructSparseGPTModifierFEATURE,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.6306|±  |0.0136|

To test Wanda, the SparseGPTModifier was replaced with the WandaPruningModifier.

wanda

Main

vllm (pretrained=/home/kyle/llm-compressor/Meta-Llama-3-8B-InstructWandaPruningModifierMAIN,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5912|±  |0.0138|

This branch

vllm (pretrained=/home/kyle/llm-compressor/Meta-Llama-3-8B-InstructWandaPruningModifierFEATURE,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5817|±  |0.0139|

@kylesayrs kylesayrs marked this pull request as draft February 7, 2025 20:23
github-actions bot commented Feb 7, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs kylesayrs marked this pull request as ready for review February 7, 2025 23:48
@dsikka (Collaborator) left a comment

I don't agree with the notion that adding the same variables through inheritance helps with code clarity or user readability.

Updating the docstring in the inherited classes would be sufficient rather than relying on redundant code.

Review threads (all resolved): src/llmcompressor/pipelines/basic/pipeline.py, src/llmcompressor/modifiers/obcq/base.py (outdated), src/llmcompressor/modifiers/pruning/wanda/base.py (outdated), src/llmcompressor/utils/helpers.py
@dsikka dsikka requested a review from rahul-tuli February 9, 2025 20:06
@kylesayrs kylesayrs added the ready When a PR is ready for review label Feb 9, 2025
@kylesayrs kylesayrs requested a review from dsikka February 9, 2025 21:31
@rahul-tuli (Collaborator) left a comment

LGTM! (pending tests)

@dsikka dsikka enabled auto-merge (squash) February 11, 2025 01:21
@dsikka dsikka merged commit b55ec42 into main Feb 11, 2025
7 checks passed
@dsikka dsikka deleted the kylesayrs/fix-sgpt-targets branch February 11, 2025 02:31