Embedding4bit and Embedding8bit implementation #1292
Conversation
Force-pushed from ef087bc to 67d546e
Bump. P.S. The part about shared embeddings can be discussed later in another PR or issue.
Force-pushed from 67d546e to 35fd05c
Hi @galqiwi! Thank you for the PR! I think this would be a very useful addition and will review this week. I agree that the shared embeddings can be deferred to a follow-up discussion/PR.
Force-pushed from 35fd05c to 811aa6c
Thanks @galqiwi! Overall, this looks great! I just left a few minor nits, but otherwise happy to merge!
Thank you for reviewing my PR, @matthewdouglas! I've fixed all the typos you found.
Thanks @galqiwi! This is a great contribution, and the unit tests here are really appreciated!
Hi again! Are you planning on publishing a new release of bnb?
Embedding4bit and Embedding8bit implementation (#1292)

* Embedding4bit and Embedding8bit implementation
* lint
* Update bitsandbytes/nn/modules.py
  Co-authored-by: Matthew Douglas <[email protected]>
* Update bitsandbytes/nn/modules.py
  Co-authored-by: Matthew Douglas <[email protected]>
* Update bitsandbytes/nn/modules.py
  Co-authored-by: Matthew Douglas <[email protected]>
* saving -> Saving

---------

Co-authored-by: Matthew Douglas <[email protected]>
Hi! I've been researching LLM quantization and found a bottleneck that I think this PR can fix.
When using extreme 1-bit and 2-bit LLM quantization (which has seen many improvements recently [1, 2, 3, 4, 5]), uncompressed embeddings can start to take up too much space (in some cases more than 50% of the model size).
I've documented this bottleneck in a huggingface/transformers issue, and it looks like the bitsandbytes library is a good place to start addressing it.
In this PR, I implement embedding modules for the 4-bit and 8-bit quantization schemes from this library. Currently, they only support the `_load_from_state_dict` API and can't be saved, but I think they can still be useful. After that, I plan to integrate this functionality into the transformers library by extending the `HfQuantizer` functionality.
What do you think?
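For concreteness, here is a minimal usage sketch of what loading into these modules could look like. This is not code from the PR: it assumes the modules are exposed as `bnb.nn.Embedding4bit` / `bnb.nn.Embedding8bit`, mirror `torch.nn.Embedding`'s constructor, accept weights via the standard `load_state_dict` path (which routes through `_load_from_state_dict`), and quantize on the move to GPU, as the 4-bit/8-bit linear layers do; check `bitsandbytes/nn/modules.py` for the actual signatures.

```python
# Hypothetical usage sketch (assumptions noted above, not taken from the PR).
import torch
import torch.nn as nn
import bitsandbytes as bnb

vocab_size, hidden_dim = 32_000, 4_096

# A regular fp16 embedding whose weights we want to reuse.
fp16_emb = nn.Embedding(vocab_size, hidden_dim).half()

# Build the quantized replacement and feed it the original weights through the
# standard state_dict machinery (load_state_dict calls _load_from_state_dict).
q_emb = bnb.nn.Embedding4bit(vocab_size, hidden_dim)  # or bnb.nn.Embedding8bit(...)
q_emb.load_state_dict(fp16_emb.state_dict())
q_emb = q_emb.to("cuda")  # assumed to quantize the weights on device transfer

token_ids = torch.randint(0, vocab_size, (1, 16), device="cuda")
hidden = q_emb(token_ids)  # dequantized lookup, shape (1, 16, hidden_dim)
```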
There is also one thing I want to implement before going back to the transformers library: support for shared weights in 8-bit quantization.
While the 4-bit quantized linear layer does not seem to change its `self.weight` parameter during the forward pass, the 8-bit quantized linear layer changes it dramatically in the `init_8bit_state` method. So, while a 4-bit embedding and a 4-bit linear layer can share the same `Params4bit` parameter, the same is not true for 8-bit. I think that this patch should fix the problem, but this part of the code is very tightly coupled to everything around it, and I need your advice: do you think it could break something important that I don't see?
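To make the sharing concern concrete, here is a rough sketch of the kind of weight tying meant above. It is illustrative only: the tying-by-assignment pattern is an assumption, and the claim about the 8-bit path restates the description above rather than documenting the library's behavior.

```python
# Illustrative weight-tying sketch (hypothetical usage, not code from this PR).
# With the 4-bit modules, an input embedding and an output head could in
# principle reference the same Params4bit object, since Linear4bit's forward
# pass reads self.weight without rewriting it. With 8-bit this would break:
# per the description above, Linear8bitLt rewrites its weight state inside
# init_8bit_state, so a tied embedding would no longer see a usable weight.
import bitsandbytes as bnb

vocab_size, hidden_dim = 32_000, 4_096

emb = bnb.nn.Embedding4bit(vocab_size, hidden_dim)
lm_head = bnb.nn.Linear4bit(hidden_dim, vocab_size, bias=False)

# Hypothetical tying: point both modules at the same quantized parameter.
# Shapes line up because both store a (vocab_size, hidden_dim) weight.
lm_head.weight = emb.weight
assert lm_head.weight is emb.weight
```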