4-bit QLoRA, Paged Optimizers, and 8-bit Memory Leak Bugfix

@TimDettmers TimDettmers released this 20 Jun 02:50
· 417 commits to main since this release

This release brings 4-bit quantization support for QLoRA fine-tuning, along with a critical fix for a bug that doubled the memory cost of 8-bit models after serialization. It also introduces paged optimizers, including 8-bit Lion.

0.39.1

Features:

  • 4-bit matrix multiplication for the Float4 (FP4) and NormalFloat4 (NF4) data types.
  • Added 4-bit quantization routines.
  • Double quantization routines for 4-bit quantization (the quantization constants are themselves quantized to save memory).
  • Paged optimizers for Adam and Lion.
  • bfloat16 gradient/weight support for Adam and Lion with 8-bit or 32-bit optimizer states.
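As a rough sketch of how these features fit together in a QLoRA-style setup (shown here via the Hugging Face `transformers` integration; the model name and hyperparameters are illustrative, not part of this release):

```python
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit quantization config: NF4 data type with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                 # any causal LM; name is illustrative
    quantization_config=bnb_config,
    device_map="auto",
)

# Paged 8-bit AdamW: optimizer state lives in paged memory that can spill
# to CPU RAM under GPU memory pressure instead of causing an OOM.
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-4)
```

This pairs naturally with LoRA adapters (e.g. via `peft`) so that only the low-rank adapter weights are trained while the 4-bit base model stays frozen.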

Bug fixes:

  • Fixed a bug where 8-bit models consumed twice as much memory as expected after serialization (thank you @mryab).

Deprecated:

  • Kepler binaries (GTX 700 series and Tesla K40/K80) are no longer provided via pip and need to be compiled from source. Kepler support might be removed entirely in the future.