Enable double quant on Intel CPU and XPU #1472
Hi @Titus-von-Koeller. I enabled double quant in the 4-bit implementation for Intel CPU/XPU and checked both the results and the performance. Where should I add a test for it? Thanks!
Signed-off-by: jiqing-feng <[email protected]>
Thanks @jiqing-feng! We'll get back to you about this soon.
Hi @Titus-von-Koeller. I made some new changes to this PR: it fixes the 4-bit data format and aligns it with CUDA. In more detail: in CUDA, 4-bit values are packed into a uint8 tensor, so [1, 2] is packed to 18, because 1 = 0b0001, 2 = 0b0010, and 0b00010010 is 18. In other words, the first value goes into the left (high) position. In this PR, the 4-bit format on CPU/XPU is kept the same as on CUDA and is converted to the IPEX-compatible format only when initializing the IPEX linear. With this change, we can run a quantized model like hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4. Please let me know if anything is unclear. Besides, all tests in transformers pass. Thanks!
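To illustrate the packing described above, here is a minimal pure-Python sketch. It is not the bitsandbytes or IPEX implementation; the helper names `pack_4bit` and `unpack_4bit` are hypothetical and only reproduce the "first value in the high nibble" layout from the comment.

```python
def pack_4bit(values):
    """Pack pairs of 4-bit values into bytes, first value in the high nibble.

    Hypothetical helper for illustration only; not a bitsandbytes API.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of 4-bit values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("each value must fit in 4 bits (0..15)")
    # The first value of each pair is shifted into the upper 4 bits.
    return [(values[i] << 4) | values[i + 1] for i in range(0, len(values), 2)]


def unpack_4bit(packed):
    """Inverse of pack_4bit: split each byte into (high nibble, low nibble)."""
    out = []
    for byte in packed:
        out.append(byte >> 4)    # high nibble = first value
        out.append(byte & 0x0F)  # low nibble = second value
    return out


packed = pack_4bit([1, 2])
print(packed)               # [18], since 0b0001_0010 == 18
print(unpack_4bit(packed))  # [1, 2]
```

A backend that expects the opposite nibble order would need a conversion step at the point where the weights are handed over, which is the role the comment assigns to the IPEX linear initialization.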
XPU has a performance issue; I will figure it out.
Signed-off-by: jiqing-feng <[email protected]>
The XPU issue has been fixed.
Thanks @jiqing-feng! Really appreciate the effort to keep the serialized format compatible :) I'll take care of reviewing more closely through this week but it looks good at first pass!
Signed-off-by: jiqing-feng <[email protected]>
Merged commit f6025bc into bitsandbytes-foundation:multi-backend-refactor
Enable double quant on 4bit implementation for Intel CPU and XPU.