4-bit QLoRA, Paged Optimizers, and 8-bit Memory Leak Bugfix
This release brings 4-bit quantization support for QLoRA fine-tuning and a critical fix for a bug that doubled the memory cost of 8-bit models when they were serialized. It also introduces paged optimizers, including 8-bit Lion.
0.39.1
Features:
- 4-bit matrix multiplication for Float4 and NormalFloat4 data types.
- Added 4-bit quantization routines.
- Double quantization routines for 4-bit quantization.
- Paged optimizers for Adam and Lion.
- bfloat16 gradient / weight support for Adam and Lion with 8 or 32-bit states.
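To illustrate the 4-bit features above, here is a minimal NumPy sketch of blockwise absmax 4-bit quantization plus double quantization (quantizing the per-block constants themselves to 8 bits). This is an illustrative toy, not the library's CUDA kernels: it uses a uniform 16-level grid rather than the actual Float4/NormalFloat4 codebooks, and all function names here are hypothetical.

```python
import numpy as np

# Hypothetical uniform 16-level code; the real FP4/NF4 data types use
# non-uniform codebooks, which this sketch does not reproduce.
CODE4 = np.linspace(-1.0, 1.0, 16)

def quantize_4bit(x, block_size=64):
    """Blockwise absmax 4-bit quantization (illustrative sketch)."""
    x = x.reshape(-1, block_size)
    absmax = np.abs(x).max(axis=1, keepdims=True)  # one fp32 constant per block
    scaled = x / absmax                            # values now in [-1, 1]
    # Map each scaled value to the index of the nearest 4-bit code.
    idx = np.abs(scaled[..., None] - CODE4).argmin(axis=-1).astype(np.uint8)
    return idx, absmax

def dequantize_4bit(idx, absmax):
    """Look up the code and rescale by the block's absmax constant."""
    return CODE4[idx] * absmax

def double_quantize_absmax(absmax):
    """Double quantization: compress the fp32 absmax constants to 8-bit,
    further shrinking the per-block memory overhead."""
    scale = absmax.max() / 255.0
    q = np.round(absmax / scale).astype(np.uint8)
    return q, scale

x = np.random.randn(256).astype(np.float32)
idx, absmax = quantize_4bit(x)              # 4-bit codes + fp32 constants
x_hat = dequantize_4bit(idx, absmax).reshape(-1)
q_absmax, s = double_quantize_absmax(absmax)  # 8-bit constants + one fp32 scale
```

The memory win is the point: each weight stores only a 4-bit index, and with double quantization the per-block constants shrink from 32 bits to roughly 8 bits each.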
Bug fixes:
- Fixed a bug where 8-bit models consumed twice the expected memory after serialization (thank you @mryab).
Deprecated:
- Kepler binaries (GTX 700 series and Tesla K40/K80) are no longer provided via pip and need to be compiled from source. Kepler support might be fully removed in the future.