Cleanse state dict of shared pointers before save #159
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
Can you please test that? Specifically, the issue I'm seeing seems to relate to loading a model which has been saved after oneshot.
Using these three scripts to test:

**reload_stories_normal.py**
```python
import os, shutil

from llmcompressor.transformers import SparseAutoModelForCausalLM

output_dir = "./my_model"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)

# base
model = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto", torch_dtype="auto"
)

# save
model.save_pretrained(
    output_dir,
    save_compressed=True,
    safe_serialization=False,  # False := pytorch_model.bin, True := model.safetensors
)

# load normal
model = SparseAutoModelForCausalLM.from_pretrained(output_dir, device_map="auto")
print(model)
```

**reload_stories_oneshot.py**
```python
import os, shutil

from llmcompressor.core import create_session
from llmcompressor.transformers import (
    SparseAutoModelForCausalLM,
    oneshot,
)

output_dir = "./oneshot_out"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)

recipe_str = "tests/llmcompressor/transformers/obcq/recipes/test_tiny2.yaml"
dataset = "open_platypus"
concatenate_data = False
num_calibration_samples = 64
splits = {"calibration": "train[:10%]"}

# base
model = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)

# save oneshot
with create_session():
    oneshot(
        model=model,
        dataset=dataset,
        output_dir=output_dir,
        num_calibration_samples=num_calibration_samples,
        recipe=recipe_str,
        concatenate_data=concatenate_data,
        splits=splits,
    )

# load oneshot
model = SparseAutoModelForCausalLM.from_pretrained(output_dir, device_map="auto")
print(model)
```

**reload_stories_distill.py**
```python
import os, shutil

from llmcompressor.core import create_session
from llmcompressor.transformers import (
    SparseAutoModelForCausalLM,
    oneshot,
    train,
)

output_dir = "./distill_out"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)

dataset = "open_platypus"
concatenate_data = False
splits = "train[:50%]"
max_steps = 50
num_calibration_samples = 64
recipe_str = "tests/llmcompressor/transformers/finetune/test_finetune_recipe.yaml"

# base
model = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)
distill_teacher = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)

# distill
with create_session():
    train(
        model=model,
        distill_teacher=distill_teacher,
        dataset=dataset,
        output_dir=output_dir,
        num_calibration_samples=num_calibration_samples,
        recipe=recipe_str,
        concatenate_data=concatenate_data,
        splits=splits,
        max_steps=max_steps,
    )

# load
model = SparseAutoModelForCausalLM.from_pretrained(output_dir, device_map="auto")
print(model)
```
Loading a model which has removed tensors leads to loading failures right now. I think we'd need to adapt safetensors' load_model function on the loading side in order to fully support this method.
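For context, safetensors already ships a model-level save/load pair that tolerates de-duplicated shared tensors, unlike raw `save_file`/`load_file` plus `load_state_dict`. A minimal sketch of that round trip (the `Linear` stand-in model is mine, for illustration only):

```python
# Illustrative only: safetensors' model-level helpers handle shared tensors;
# save_model drops duplicates, load_model re-ties them on the way back in.
import torch
from safetensors.torch import load_model, save_model

model = torch.nn.Linear(4, 4)  # stand-in for the real model
save_model(model, "model.safetensors")  # duplicated shared tensors are dropped
missing, unexpected = load_model(model, "model.safetensors")  # aliases re-tied
```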
I'm still investigating why load_model/save_model is not the default pathway. Perhaps metadata loss.
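One concrete instance of that concern, as I understand it: transformers tags its safetensors checkpoints with a format marker in the header, so any replacement save pathway would have to carry it through. A hedged sketch (the metadata value is my assumption about what needs preserving):

```python
# Assumption for illustration: transformers expects a {"format": "pt"} marker
# in the safetensors header; save_model accepts a metadata dict, so a custom
# pathway could pass it through rather than lose it.
from safetensors.torch import save_model

save_model(model, "model.safetensors", metadata={"format": "pt"})
```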
I think the problem with this approach is that it ignores the root cause, namely that tensors are being shared when they shouldn't be. I've opened an issue on HF transformers here.
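For reference, a minimal reproduction of the underlying constraint (tensor names here are illustrative, not from the PR): safetensors refuses to serialize two state-dict entries that alias the same storage.

```python
# Minimal repro of the shared-tensor restriction in safetensors.
import torch
from safetensors.torch import save_file

weight = torch.zeros(4)
state = {"a": weight, "b": weight}  # "b" aliases "a"'s storage
save_file(state, "out.safetensors")  # raises RuntimeError: tensors share memory
```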
SUMMARY:
Adapts code from https://github.com/huggingface/safetensors/blob/5db3b92c76ba293a0715b916c16b113c0b3551e9/bindings/python/py_src/safetensors/torch.py#L155 to cleanse the state dict of shared pointers before saving.
Also check: https://huggingface.co/docs/safetensors/en/torch_shared_tensors
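A rough sketch of the cleansing idea, loosely adapted from the linked safetensors helper; the function and variable names below are mine, not the PR's, and the real safetensors logic also verifies that aliases cover the same storage region, which this sketch skips:

```python
# Sketch of de-duplicating shared tensors before save (assumes PyTorch >= 2.0
# for untyped_storage()). Not the PR's actual implementation.
from collections import defaultdict

import torch


def cleanse_shared_pointers(state_dict: dict) -> dict:
    """Keep one name per underlying storage, dropping aliases that share it."""
    by_storage = defaultdict(list)
    for name, tensor in state_dict.items():
        # Tensors that view the same storage (e.g. tied embeddings) report
        # the same storage data pointer.
        by_storage[tensor.untyped_storage().data_ptr()].append(name)

    cleansed = dict(state_dict)
    for names in by_storage.values():
        # Deterministically keep the first name; drop the duplicate aliases.
        for alias in sorted(names)[1:]:
            del cleansed[alias]
    return cleansed
```

On the loading side, the dropped aliases then have to be re-tied, which is what `load_model` in the earlier sketch does.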
TEST PLAN:
The tests now pass on CPU and GPU.
Also fixes the shared tensors issue seen in ex_trl_distillation.