4-bit QLoRA, Paged Optimizers, and 8-bit Memory Leak Bugfix
This release brings 4-bit quantization support for QLoRA fine-tuning and a critical fix for a bug that doubled the memory cost of 8-bit models when they were serialized. It also introduces paged optimizers, including 8-bit Lion.
0.39.1
Features:
- 4-bit matrix multiplication for Float4 and NormalFloat4 data types.
- Added 4-bit quantization routines.
- Double quantization routines for 4-bit quantization.
- Paged optimizers for Adam and Lion.
- bfloat16 gradient / weight support for Adam and Lion with 8 or 32-bit states.
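To illustrate the 4-bit features above, here is a minimal NumPy sketch of blockwise absmax 4-bit quantization plus double quantization (quantizing the per-block constants themselves to 8 bits). This is an illustrative toy, not the library's CUDA kernels: it uses a uniform 16-level grid rather than the actual Float4/NormalFloat4 codebooks, and all function names here are hypothetical.

```python
import numpy as np

# Hypothetical uniform 16-level code; the real FP4/NF4 data types use
# non-uniform codebooks, which this sketch does not reproduce.
CODE4 = np.linspace(-1.0, 1.0, 16)

def quantize_4bit(x, block_size=64):
    """Blockwise absmax 4-bit quantization (illustrative sketch)."""
    x = x.reshape(-1, block_size)
    absmax = np.abs(x).max(axis=1, keepdims=True)  # one fp32 constant per block
    scaled = x / absmax                            # values now in [-1, 1]
    # Map each scaled value to the index of the nearest 4-bit code.
    idx = np.abs(scaled[..., None] - CODE4).argmin(axis=-1).astype(np.uint8)
    return idx, absmax

def dequantize_4bit(idx, absmax):
    """Look up the code and rescale by the block's absmax constant."""
    return CODE4[idx] * absmax

def double_quantize_absmax(absmax):
    """Double quantization: compress the fp32 absmax constants to 8-bit,
    further shrinking the per-block memory overhead."""
    scale = absmax.max() / 255.0
    q = np.round(absmax / scale).astype(np.uint8)
    return q, scale

x = np.random.randn(256).astype(np.float32)
idx, absmax = quantize_4bit(x)              # 4-bit codes + fp32 constants
x_hat = dequantize_4bit(idx, absmax).reshape(-1)
q_absmax, s = double_quantize_absmax(absmax)  # 8-bit constants + one fp32 scale
```

The memory win is the point: each weight stores only a 4-bit index, and with double quantization the per-block constants shrink from 32 bits to roughly 8 bits each.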
Bug fixes:
- Fixed a bug where 8-bit models consumed twice the expected memory after serialization (thank you @mryab).
Deprecated:
- Kepler binaries (GTX 700 series and Tesla K40/K80) are no longer provided via pip and need to be compiled from source. Kepler support might be fully removed in the future.