This project combines Delta-LoRA (a variant of LoRA: Low-Rank Adaptation) with LC-checkpoint, a checkpoint-compression scheme. The framework trains deep neural networks while producing compressed checkpoints, so that in the event of a failure (such as a gradient explosion or a division by zero) training can resume from the last saved state instead of restarting, saving time and computational resources.
The code has two main goals:
- To support training deep neural networks with compressed checkpoints, so that fine-tuning can resume from the last saved state rather than from scratch after a failure.
- To guard against data poisoning: if malicious data is detected, training can resume from the last checkpoint trained on clean data.
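As a rough illustration of the compressed-checkpoint idea, the delta between two consecutive checkpoints can be stored far more compactly than the full weights. The sketch below is a toy, not the project's actual LC-checkpoint implementation: the function names and the single shared magnitude are simplifying assumptions (the real scheme uses finer quantization).

```python
import numpy as np

def compress_delta(prev, curr):
    """Toy delta compression: keep only the sign of each weight change
    (1 byte instead of 8) plus one shared mean magnitude."""
    delta = curr - prev
    sign = np.sign(delta).astype(np.int8)
    scale = float(np.abs(delta).mean())
    return sign, scale

def restore(prev, sign, scale):
    """Lossily rebuild the newer checkpoint from the older one."""
    return prev + sign.astype(np.float64) * scale

prev = np.zeros(4)
curr = np.array([1.0, -1.0, 2.0, -2.0])
sign, scale = compress_delta(prev, curr)   # scale == 1.5
approx = restore(prev, sign, scale)        # [1.5, -1.5, 1.5, -1.5]
```

The restored weights are not bit-exact, but they are much closer to the newer checkpoint than the older one was, which is the trade-off delta-compression schemes exploit.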
To train the models, you will first need to download the datasets. Below are the steps to download the Stanford Cars Dataset from Kaggle and set it up in your environment.
Download the Dataset:
- Visit the Stanford Cars Dataset page on Kaggle.
- Decompress the dataset and name the folder `data_stanfordcars`.
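The unpacking step can be sketched in the shell. The archive name below is a placeholder, not the actual file Kaggle serves; adjust it to whatever you downloaded.

```shell
# Hypothetical archive name -- replace with the file Kaggle actually gives you.
archive=stanford-cars-dataset.zip

# Unpack into a folder named data_stanfordcars, as the project expects.
mkdir -p data_stanfordcars
if [ -f "$archive" ]; then
    unzip -q "$archive" -d data_stanfordcars
fi
```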
For a fresh environment setup, follow these steps:
- Install the latest version of Python.
- Install the latest version of Visual Studio Code.
- Install the Python and Jupyter extensions in Visual Studio Code.
- Install Anaconda.
- Open a terminal and run the following commands:
```shell
conda create --name py310 python=3.10
conda activate py310
conda install cudatoolkit -c anaconda -y
nvidia-smi
conda install pytorch-cuda=11.8 -c pytorch -c nvidia -y
conda install pytorch torchvision torchaudio -c pytorch -c nvidia -y
pip install pandas scipy matplotlib pathos wandb
```
- These installation steps target Windows but can be adapted to Linux and macOS with minor changes to the commands.
Alternatively, you can use the predefined environment file to set up your environment more quickly:
- Clone the repository from GitHub.
- Open a terminal in the cloned repository directory.
- Run the following command:
```shell
conda env create -f environment.yml
conda activate py310
```
- This will create a new conda environment named `py310` and install all the necessary packages.
Once installed, you can run the scripts inside the project directory to start the training process and utilize the checkpointing mechanisms.
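The resume-from-checkpoint behavior can be sketched as follows. This is a minimal illustration, not the project's actual API: the file name `last_state.ckpt` and the functions `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical names, and the "training step" is a stand-in.

```python
import os
import pickle

CKPT = "last_state.ckpt"  # hypothetical path; the project's scripts define their own

def save_checkpoint(state, path=CKPT):
    # In the real project this would be a compressed model checkpoint.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path=CKPT):
    with open(path, "rb") as f:
        return pickle.load(f)

def train(num_steps):
    # Resume from the last saved state if a checkpoint exists.
    state = load_checkpoint() if os.path.exists(CKPT) else {"step": 0}
    for step in range(state["step"], num_steps):
        state = {"step": step + 1}  # stand-in for one real training step
        save_checkpoint(state)
    return state

first = train(3)    # runs steps 0..2, checkpointing after each
resumed = train(5)  # picks up at step 3 instead of starting over
```

A second call continues from where the first left off, which is exactly what the checkpointing mechanism buys you when a run dies mid-training.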
- Link to the project report in French
- Link to the project report in English
- Access to the slides presentation in French
Note that the code is written to be as flexible as possible; even so, you may need to adapt it to your particular use case and environment.
- bryan [dot] chen [at] etu [dot] toulouse-inp [dot] com / t0934135 [at] u [dot] nus [dot] edu
This project was built with guidance and support from:
- Assoc Prof Ooi Wei Tsang (NUS)
- Asst Prof Axel Carlier (INP-ENSEEIHT)
- PhD Student Yannis Montreuil (UPMC, Sorbonne University)
- Scientist Lai Xing Ng (A*STAR Institute for Infocomm Research)
Special thanks to CNRS@Create for supporting this research project.
We are also grateful to the contributors and maintainers of the works this project builds on:
- Yu Chen, Zhenming Liu, Bin Ren & Xin Jin, *On Efficient Construction of Checkpoints*
- Shuyu Zhang, Donglei Wu, Haoyu Jin, Xiangyu Zou, Wen Xia & Xiaojia Huang, *QD-Compressor: A Quantization-based Delta Compression Framework for Deep Neural Networks*
- Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong & Alexey Tumanov, *DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization*
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang & Weizhu Chen, *LoRA: Low-Rank Adaptation of Large Language Models*
- Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam-Fai Wong & Lei Zhang, *Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices*