This week's paper is QLoRA: Efficient Finetuning of Quantized LLMs. QLoRA introduces a way to save memory in LoRA training by quantizing the frozen base model to 4-bit while backpropagating gradients into the trainable LoRA adapters. The authors also introduce the Guanaco family of models and run several analyses of fine-tuning data.
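To make the recipe concrete, here is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The model name, LoRA rank, and target modules below are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal QLoRA-style sketch: 4-bit NF4 quantized base model + LoRA adapters.
# Model name and hyperparameters are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4, with double quantization of the
# quantization constants and bfloat16 compute, following the QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative model choice
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach trainable LoRA adapters; only these low-rank matrices receive
# gradients, so optimizer state stays small. r, alpha, and target modules
# are assumptions for this sketch.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only adapter weights train
```

From here the quantized model with adapters can be passed to any standard training loop; only the adapter parameters carry gradients, while the base weights stay frozen in 4-bit.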
Further Reading:
- LoRA: Low-Rank Adaptation of Large Language Models - finetuning with adapters, if you missed it (arxiv.org/abs/2106.09685)
- Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning - another memory-efficient LoRA-style technique, which removes the need to store activations
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models - a better initialization for quantized LoRA finetuning
- Accurate LoRA-Finetuning Quantization of LLMs via Information Retention - improves the accuracy of quantized LLMs fine-tuned with LoRA by preserving information from the original weights