
Embedding4bit and Embedding8bit implementation #1292

Merged

Conversation

@galqiwi (Contributor) commented Jul 24, 2024

Hi! I've been researching LLM quantization and found a bottleneck that I think this PR can fix.

When using extreme 1-bit and 2-bit LLM quantization (which has seen many recent improvements [1, 2, 3, 4, 5]), the uncompressed embeddings can start to take up too much space (in some cases more than 50% of the total model size).

https://galqiwi.ru/persistent/2024-06-18/embed-1.png
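To make the "more than 50%" point concrete, here is a rough back-of-envelope calculation; the shapes below are my own illustrative assumption (roughly Llama-3-8B-like), not numbers taken from the plot above:

```python
# Rough back-of-envelope (illustrative Llama-3-8B-like shapes; not from the plot above).
vocab, hidden = 128_256, 4_096
embed_params = 2 * vocab * hidden            # embed_tokens + lm_head, ~1.05e9 params
body_params = 8.03e9 - embed_params          # remaining transformer weights, ~6.98e9 params

embed_fp16_gb = embed_params * 2 / 1e9       # ~2.1 GB if embeddings stay in fp16
body_2bit_gb = body_params * 0.25 / 1e9      # ~1.75 GB after 2-bit quantization

print(embed_fp16_gb / (embed_fp16_gb + body_2bit_gb))  # ~0.55: embeddings dominate
```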

I've documented this bottleneck in a huggingface/transformers issue, and the bitsandbytes library looks like a good place to start addressing it.

In this PR, I implement embedding modules for the 4-bit and 8-bit quantization schemes from this library. Currently, they only support the _load_from_state_dict API and can't be saved, but I think they can still be useful.
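For context, here is a minimal usage sketch of how I picture these modules being used; the constructor arguments mirror nn.Embedding, and the exact loading and quantization behavior is my assumption rather than documented API:

```python
import torch
import bitsandbytes as bnb

# Minimal sketch (assumed call pattern, not documented API): load fp16 embedding
# weights into the 4-bit module through the state-dict path, then move it to GPU.
vocab, hidden = 32_000, 4_096
emb_fp16 = torch.nn.Embedding(vocab, hidden)

emb_4bit = bnb.nn.Embedding4bit(vocab, hidden)
emb_4bit.load_state_dict(emb_fp16.state_dict())  # the _load_from_state_dict path from this PR
emb_4bit = emb_4bit.cuda()                        # assumption: quantization happens on the device move

tokens = torch.randint(0, vocab, (1, 8), device="cuda")
hidden_states = emb_4bit(tokens)                  # embedding lookup on the quantized weights
```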

After that, I plan to integrate this functionality into the transformers library by extending HfQuantizer.

What do you think?


There is also one thing I want to implement before going back to the transformers library: support for shared weights in 8-bit quantization.

While the 4-bit quantized linear layer does not seem to modify its self.weight parameter during the forward pass, the 8-bit quantized linear layer changes it dramatically in its init_8bit_state method.

So, while a 4-bit embedding and a 4-bit linear layer can share the same Params4bit parameter, the same is not true for 8-bit.
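To make the sharing idea concrete, here is a hypothetical tying sketch for the 4-bit case (analogous to the usual embedding / lm_head weight tying; not code from this PR):

```python
import bitsandbytes as bnb

# Hypothetical 4-bit weight-tying sketch (not code from this PR). The embedding
# weight has shape (vocab, hidden) and the LM head weight has shape (vocab, hidden),
# so in principle both modules can point at the same Params4bit tensor.
vocab, hidden = 32_000, 4_096
emb = bnb.nn.Embedding4bit(vocab, hidden)
lm_head = bnb.nn.Linear4bit(hidden, vocab, bias=False)

# Tying only stays valid if neither module mutates the shared weight;
# the 8-bit path breaks this because init_8bit_state rewrites self.weight.
lm_head.weight = emb.weight
```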

I think that this patch should fix the problem, but this part of the code is very tightly coupled to everything around it, so I would appreciate your advice: do you think it could break something important that I'm not seeing?

@galqiwi (Contributor, Author) commented Jul 30, 2024

Bump

P.S. The part about shared embeddings can be discussed later in a separate PR or issue.

@galqiwi force-pushed the embedding_quantization branch from 67d546e to 35fd05c on July 30, 2024 10:53
@matthewdouglas self-assigned this on Jul 30, 2024
@matthewdouglas self-requested a review on July 30, 2024 14:32
@matthewdouglas (Member) commented:

Hi @galqiwi! Thank you for the PR! I think this would be a very useful addition and will review this week. I agree that the shared embeddings can be deferred to follow up discussion/PRs.

@galqiwi force-pushed the embedding_quantization branch from 35fd05c to 811aa6c on August 5, 2024 14:19
@matthewdouglas added the enhancement (New feature or request) label on Aug 5, 2024
@matthewdouglas (Member) commented:

Thanks @galqiwi! Overall, this looks great! I just left a few minor nits, but otherwise happy to merge!

@galqiwi (Contributor, Author) commented Aug 6, 2024

Thank you for reviewing my PR, @matthewdouglas! I've fixed all the typos you found.

@matthewdouglas (Member) commented:

Thanks @galqiwi! This is a great contribution, and the unit tests here are really appreciated!

@matthewdouglas merged commit 6d714a5 into bitsandbytes-foundation:main on Aug 6, 2024
28 checks passed
@galqiwi (Contributor, Author) commented Sep 11, 2024

Hi again! Are you planning on publishing a new release of bnb?

matthewdouglas added a commit to matthewdouglas/bitsandbytes that referenced this pull request on Oct 28, 2024:

Embedding4bit and Embedding8bit implementation (bitsandbytes-foundation#1292)

* Embedding4bit and Embedding8bit implementation

* lint

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <[email protected]>

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <[email protected]>

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <[email protected]>

* saving -> Saving

---------

Co-authored-by: Matthew Douglas <[email protected]>