
Enable double quant on Intel CPU and XPU #1472

Conversation

jiqing-feng
Contributor

Enable double quant in the 4-bit implementation for Intel CPU and XPU.
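
For context, a minimal user-facing sketch (not part of the PR itself) of how double quant is requested through the standard bitsandbytes flag via transformers' BitsAndBytesConfig; the model id below is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # the feature this PR enables on CPU/XPU
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Placeholder model id; any causal LM supported by bitsandbytes works.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
)
```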

@jiqing-feng
Contributor Author

jiqing-feng commented Jan 9, 2025

Hi @Titus-von-Koeller. I enabled double quant in the 4-bit implementation for Intel CPU/XPU and checked the results and performance. Where should I add a test for it? Thanks!
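
For reference, a round-trip test of the double-quant path might look roughly like the sketch below. This is a hedged illustration, not the PR's actual test, and the tolerance is a guess; `quantize_4bit`/`dequantize_4bit` are the existing bitsandbytes functional API, with `compress_statistics=True` selecting double quant.

```python
import torch
import bitsandbytes.functional as F

def test_nf4_double_quant_roundtrip(device: str = "cpu"):
    A = torch.randn(1024, 1024, device=device)
    # compress_statistics=True enables double quant (quantized absmax stats).
    qA, state = F.quantize_4bit(A, quant_type="nf4", compress_statistics=True)
    out = F.dequantize_4bit(qA, state)
    # Tolerance is illustrative; blockwise NF4 error on N(0, 1) data is small.
    assert (A - out).abs().mean().item() < 0.1
```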

@jiqing-feng jiqing-feng marked this pull request as draft January 9, 2025 11:49
Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng jiqing-feng marked this pull request as ready for review January 10, 2025 02:32
@Titus-von-Koeller
Collaborator

Thanks @jiqing-feng!

We'll get back to you about this soon.

@jiqing-feng
Contributor Author

jiqing-feng commented Jan 20, 2025

Hi @Titus-von-Koeller. I made some new changes to this PR: they fix the 4-bit data format and align it with CUDA. For more details:

In CUDA, 4-bit values are packed into a uint8 tensor with the first value in the high (left) nibble: [1, 2] packs to 18, because 1 = 0b0001, 2 = 0b0010, and 0b00010010 is 18.
On CPU and XPU, the same pair used to pack to 0b00100001 = 33, because we put the first value into the low (right) nibble to be compatible with our IPEX API.

In this PR, we keep the 4-bit format on CPU/XPU the same as on CUDA, and convert it to the IPEX-compatible format only when initializing the IPEX linear layer, as sketched below.
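
To illustrate the two layouts and the conversion (a minimal sketch, not the PR's actual code; `pack_cuda` and `cuda_to_ipex` are hypothetical helper names):

```python
import torch

def pack_cuda(vals: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values, first value in the high (left) nibble."""
    return ((vals[0::2] << 4) | vals[1::2]).to(torch.uint8)

def cuda_to_ipex(packed: torch.Tensor) -> torch.Tensor:
    """Swap nibbles so the first value sits in the low (right) nibble."""
    return (((packed & 0x0F) << 4) | (packed >> 4)).to(torch.uint8)

vals = torch.tensor([1, 2], dtype=torch.uint8)
packed = pack_cuda(vals)            # 0b00010010 (CUDA layout)
print(packed.item())                # 18
print(cuda_to_ipex(packed).item())  # 33, i.e. 0b00100001 (IPEX layout)
```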

With this change, we can run a pre-quantized model like hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4.
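
For reference, a hedged loading sketch with transformers (standard API; the prompt and generation arguments are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantization config is read from the checkpoint; with this PR the
# NF4 weights can be loaded and run on CPU (or XPU) as well.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```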

Please let me know if I didn't make it clear. Also, the PR passes all tests in transformers. Thanks!

@jiqing-feng jiqing-feng marked this pull request as draft January 20, 2025 07:50
@jiqing-feng
Contributor Author

jiqing-feng commented Jan 20, 2025

XPU has a performance issue; I will figure it out.

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng jiqing-feng marked this pull request as ready for review January 21, 2025 01:45
@jiqing-feng jiqing-feng marked this pull request as draft January 21, 2025 01:48
@jiqing-feng jiqing-feng marked this pull request as ready for review January 21, 2025 01:58
@jiqing-feng
Contributor Author

jiqing-feng commented Jan 21, 2025

> XPU has a performance issue; I will figure it out.

The XPU issue has been fixed.
Hi @Titus-von-Koeller, the PR is ready for review. It passes all transformers tests, and I have also verified it on some generation and LoRA fine-tuning tasks.

@matthewdouglas matthewdouglas self-requested a review January 21, 2025 02:11
@matthewdouglas
Member

Thanks @jiqing-feng! Really appreciate the effort to keep the serialized format compatible :)

I'll take care of reviewing more closely through this week but it looks good at first pass!

Signed-off-by: jiqing-feng <[email protected]>
@matthewdouglas matthewdouglas merged commit f6025bc into bitsandbytes-foundation:multi-backend-refactor Jan 22, 2025
2 checks passed