
Cleanse state dict of shared pointers before save #159

Closed
wants to merge 4 commits

Conversation

rahul-tuli
Collaborator

SUMMARY:
Adapts code from https://github.com/huggingface/safetensors/blob/5db3b92c76ba293a0715b916c16b113c0b3551e9/bindings/python/py_src/safetensors/torch.py#L155 to cleanse state dict of shared pointers before saving.

Also check: https://huggingface.co/docs/safetensors/en/torch_shared_tensors
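
For context, a minimal sketch of the general idea (not the exact code in this PR), using only plain torch (>= 2.0): entries that alias the same underlying storage are grouped, and only one name per group is kept before saving, since safetensors refuses to serialize aliased tensors.

import torch
from collections import defaultdict


def cleanse_shared_pointers(state_dict):
    """Return a copy of state_dict with aliased (shared-storage) tensors dropped."""
    groups = defaultdict(list)
    for name, tensor in state_dict.items():
        # Tensors that alias the same storage report the same storage data_ptr.
        groups[tensor.untyped_storage().data_ptr()].append(name)

    to_remove = set()
    for names in groups.values():
        # Keep one name per storage group; drop the duplicates.
        for name in sorted(names)[1:]:
            to_remove.add(name)

    return {k: v for k, v in state_dict.items() if k not in to_remove}


# Example: a tied lm_head/embed_tokens pair collapses to a single entry.
weight = torch.randn(8, 4)
state = {"model.embed_tokens.weight": weight, "lm_head.weight": weight}
print(list(cleanse_shared_pointers(state)))  # only one of the two names remains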

TEST PLAN:
The tests now pass on CPU and GPU:

(.venv) ➜  llm-compressor git:(main) ✗ CUDA_VISIBLE_DEVICES="" pytest "tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[True-None-dtype0]" -v
====================================================================== test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0 -- /home/rahul/llm-compressor/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/rahul/llm-compressor
configfile: pyproject.toml
plugins: rerunfailures-14.0, mock-3.14.0
collected 1 item                                                                                                                                                 

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[True-None-dtype0] PASSED                           [100%]

======================================================================= 1 passed in 13.02s =======================================================================
(.venv) ➜  llm-compressor git:(main) ✗ CUDA_VISIBLE_DEVICES="" pytest "tests/llmcompressor/transformers/sparsification" -v                                     
====================================================================== test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0 -- /home/rahul/llm-compressor/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/rahul/llm-compressor
configfile: pyproject.toml
plugins: rerunfailures-14.0, mock-3.14.0
collected 12 items                                                                                                                                               

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[True-None-dtype0] PASSED                           [  8%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[False-config1-dtype1] PASSED                       [ 16%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[True-config2-dtype2] PASSED                        [ 25%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[False-config3-dtype3] PASSED                       [ 33%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_sparse_model_reload[False-None-dtype4] PASSED                          [ 41%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_dense_model_save[True-True] PASSED                                     [ 50%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_dense_model_save[True-False] PASSED                                    [ 58%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_dense_model_save[False-True] PASSED                                    [ 66%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_dense_model_save[False-False] PASSED                                   [ 75%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype0] PASSED                                [ 83%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype1] PASSED                                [ 91%]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[int_quantized-dtype2] PASSED                        [100%]

======================================================================== warnings summary ========================================================================
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype0]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype0]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype1]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[dense-dtype1]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[int_quantized-dtype2]
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_quant_model_reload[int_quantized-dtype2]
  /home/rahul/llm-compressor/.venv/lib/python3.10/site-packages/pydantic/main.py:1156: PydanticDeprecatedSince20: The `parse_obj` method is deprecated; use `model_validate` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.9/migration/
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================== 12 passed, 6 warnings in 87.25s (0:01:27) ============================================================

Also fixes the shared tensors issue seen in ex_trl_distillation


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka dsikka requested a review from Satrat September 10, 2024 18:01
@rahul-tuli rahul-tuli requested review from Satrat and dsikka and removed request for Satrat September 10, 2024 18:01
@kylesayrs
Collaborator

Can you please test that test_oneshot_then_finetune.py passes? I'm getting an error on that test which is related to these changes

@kylesayrs
Collaborator

Specifically, the issue I'm seeing seems to be related to loading a model that has been saved after oneshot.

@kylesayrs
Collaborator

kylesayrs commented Sep 11, 2024

Using these three scripts to test

reload_stories_normal.py:
import os, shutil
from llmcompressor.transformers import SparseAutoModelForCausalLM

output_dir = "./my_model"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)

# base
model = SparseAutoModelForCausalLM.from_pretrained("Xenova/llama2.c-stories15M", device_map="auto", torch_dtype="auto")

# save
model.save_pretrained(
    output_dir,
    save_compressed=True,
    safe_serialization=False,  # False:=pytorch_model.bin, True:=model.safetensors
)

# load normal
model = SparseAutoModelForCausalLM.from_pretrained(
    output_dir, device_map="auto"
)
print(model)

reload_stories_oneshot.py:
import os, shutil
from llmcompressor.core import create_session
from llmcompressor.transformers import (
    SparseAutoModelForCausalLM,
    oneshot,
)

output_dir = "./oneshot_out"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)
recipe_str = "tests/llmcompressor/transformers/obcq/recipes/test_tiny2.yaml"
dataset = "open_platypus"
concatenate_data = False
num_calibration_samples = 64
splits = {"calibration": "train[:10%]"}


# base
model = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)

# save oneshot
with create_session():
    oneshot(
        model=model,
        dataset=dataset,
        output_dir=output_dir,
        num_calibration_samples=num_calibration_samples,
        recipe=recipe_str,
        concatenate_data=concatenate_data,
        splits=splits,
    )

# load oneshot
model = SparseAutoModelForCausalLM.from_pretrained(
    output_dir, device_map="auto"
)
print(model)

reload_stories_distill.py:
import os, shutil
from llmcompressor.core import create_session
from llmcompressor.transformers import (
    SparseAutoModelForCausalLM,
    oneshot, train
)

output_dir = "./distill_out"
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)
dataset = "open_platypus"
concatenate_data = False
splits = "train[:50%]"
max_steps = 50
num_calibration_samples = 64
recipe_str = "tests/llmcompressor/transformers/finetune/test_finetune_recipe.yaml"

# base
model = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)
distill_teacher = SparseAutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M", device_map="auto"
)

# distill
with create_session():
    train(
        model=model,
        distill_teacher=distill_teacher,
        dataset=dataset,
        output_dir=output_dir,
        num_calibration_samples=num_calibration_samples,
        recipe=recipe_str,
        concatenate_data=concatenate_data,
        splits=splits,
        max_steps=max_steps,
    )

# load
model = SparseAutoModelForCausalLM.from_pretrained(
    output_dir, device_map="auto"
)

@kylesayrs
Collaborator

Loading a model whose shared tensors have been removed leads to loading failures right now. I think we'd need to adapt safetensors' load_model function on the loading side in order to fully support this method.
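
For illustration, a rough sketch of the kind of load-side adaptation this would need. The `tied` mapping below is a hypothetical stand-in; a real fix would have to recover it from the model architecture or from metadata saved alongside the checkpoint.

import torch


def load_with_retied_weights(model, state_dict, tied):
    """Load a cleansed state dict, re-pointing removed keys at their saved aliases.

    `tied` maps a removed key to the key that kept the shared tensor.
    """
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    for missing_key in missing:
        source_key = tied.get(missing_key)
        if source_key is not None and source_key in state_dict:
            # Re-tie: point the missing parameter at the tensor that was saved.
            module_path, _, param_name = missing_key.rpartition(".")
            module = model.get_submodule(module_path)
            setattr(module, param_name, torch.nn.Parameter(state_dict[source_key]))
    return missing, unexpected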

@kylesayrs
Collaborator

kylesayrs commented Sep 12, 2024

I'm still investigating why load_model/save_model is not the default pathway. Perhaps metadata loss.
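
For reference, the model-level safetensors helpers mentioned above look roughly like this in use: save_model deduplicates aliased tensors at save time and load_model reconciles the resulting missing keys at load time. The toy tied model here is just for illustration.

import torch
from safetensors.torch import load_model, save_model


class TiedLM(torch.nn.Module):
    """Toy model with tied input/output embeddings, i.e. shared storage."""

    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(16, 8)
        self.head = torch.nn.Linear(8, 16, bias=False)
        self.head.weight = self.embed.weight  # tie: both names alias one tensor


model = TiedLM()
save_model(model, "tied.safetensors")     # deduplicates the aliased tensor on disk

reloaded = TiedLM()
load_model(reloaded, "tied.safetensors")  # resolves the shared/missing key on load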

@kylesayrs
Collaborator

I think the problem with this approach is that it ignores the root problem, namely that tensors are being shared when they shouldn't be.

I've opened an issue on HF transformers here.
We can use #659 as a workaround. This workaround has the advantage that if a user targets/modifies lm_head, for example, embed_tokens won't be modified, whereas on this branch it would be.
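
To make the root problem concrete, a quick aliasing check on the test checkpoint (plain transformers/torch, not llm-compressor API) might look like:

from transformers import AutoModelForCausalLM

# Same checkpoint the test scripts above use; when tie_word_embeddings is set
# in its config, the input embedding and LM head alias a single storage.
model = AutoModelForCausalLM.from_pretrained("Xenova/llama2.c-stories15M")

embed = model.get_input_embeddings().weight
head = model.get_output_embeddings().weight
print(embed.data_ptr() == head.data_ptr())  # True when the two weights are tied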

@kylesayrs
Collaborator

#659

@kylesayrs kylesayrs closed this Oct 10, 2024
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024
@rahul-tuli rahul-tuli deleted the shared-pointers-fix branch January 23, 2025 14:53