
Enable dequant+matmul 8bit path for Intel CPU and XPU #1484

Merged

Conversation

jiqing-feng (Contributor) commented Jan 23, 2025

Hi @Titus-von-Koeller @matthewdouglas. This feature enables dequantizing the 8-bit weight and using float matmul. It speeds up LoRA fine-tuning by 3x on XPU and 2x on CPU, measured with the LoRA fine-tuning script on Llama-3-8B via the command `python olora_finetuning.py --base_model alokabhishek/Meta-Llama-3-8B-Instruct-bnb-8bit --init_lora_weights gaussian --seed 42 --torch_dtype bfloat16 --device_map cpu`.

All tests in transformers have passed; please review this PR. Thanks!
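The dequant+matmul path described above can be sketched in plain PyTorch: dequantize the row-wise int8 weight back to the activation dtype, then run an ordinary float GEMM instead of an int8 matmul kernel. This is a minimal illustration under assumed row-wise absmax quantization, not the PR's actual CPU/XPU implementation; the function names here are hypothetical.

```python
import torch

def quantize_rowwise(w: torch.Tensor):
    # Per-output-row absmax scaling to int8 (assumed quantization scheme).
    absmax = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    q = torch.round(w / absmax * 127).to(torch.int8)
    return q, absmax

def dequant_matmul(x: torch.Tensor, q: torch.Tensor, absmax: torch.Tensor):
    # Dequantize the int8 weight to the activation dtype, then use a
    # plain float GEMM, which is fast on CPU/XPU.
    w = q.to(x.dtype) * (absmax.to(x.dtype) / 127)
    return x @ w.t()

torch.manual_seed(0)
w = torch.randn(64, 32)      # (out_features, in_features)
x = torch.randn(4, 32)       # small activation batch
q, scales = quantize_rowwise(w)
out = dequant_matmul(x, q, scales)
ref = x @ w.t()
```

The float GEMM here stands in for whatever fused kernel the backend dispatches to; the output matches the full-precision matmul up to int8 quantization error.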

Signed-off-by: jiqing-feng <[email protected]>

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

matthewdouglas (Member) commented

On the mainline branch, with the int8 refactoring done for v0.45.0, Linear8bitLt was simplified a bit. One of the optimizations made there was to do only the row-wise quantization for inference, while still doing the "double quant" with row/col stats for training.

That said, the decomposition into separate int8 and fp16 matmuls has some overhead, and I assume this is where the unsafe operations come from, particularly when threshold != 0.
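The threshold decomposition mentioned above can be sketched as follows: input-feature columns whose absmax exceeds the threshold are treated as outliers and multiplied in floating point, while the remaining columns go through the int8 path. This is a pure-PyTorch illustration of the LLM.int8()-style split, not the actual bitsandbytes kernels; the function name and quantization details are assumptions.

```python
import torch

def mixed_int8_matmul(x: torch.Tensor, w: torch.Tensor, threshold: float = 6.0):
    # Find outlier input features: columns of x with large magnitude.
    col_absmax = x.abs().amax(dim=0)
    outliers = col_absmax > threshold

    # fp path: a separate float matmul over the (few) outlier columns.
    out = x[:, outliers] @ w[:, outliers].t()

    # int8 path: row-wise absmax quantization of the remaining columns.
    xi, wi = x[:, ~outliers], w[:, ~outliers]
    sx = xi.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    sw = wi.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    qx = torch.round(xi / sx * 127).to(torch.int8)
    qw = torch.round(wi / sw * 127).to(torch.int8)

    # Integer matmul (int32 accumulation in a real kernel; int64 here for
    # a portable CPU sketch), then rescale back to the activation dtype.
    acc = (qx.to(torch.int64) @ qw.to(torch.int64).t()).to(x.dtype)
    return out + acc * (sx / 127) * (sw.t() / 127)

torch.manual_seed(0)
w = torch.randn(64, 32)
x = torch.randn(4, 32)
x[:, 5] *= 20                  # plant one outlier feature column
out = mixed_int8_matmul(x, w)
ref = x @ w.t()
```

Running two matmuls plus the masking and rescaling is exactly the overhead being discussed: when threshold != 0, every forward pass pays for the split even if few columns are outliers.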

Happy to merge; we can revisit doing int8 computation in the future if needed.

@matthewdouglas matthewdouglas self-requested a review January 28, 2025 16:30
@matthewdouglas matthewdouglas merged commit 307fbd5 into bitsandbytes-foundation:multi-backend-refactor Jan 28, 2025
2 checks passed