This is the code repository for the paper "NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning" (ICML 2023).
We propose to create a lightweight pretrained language model by clustering sub-MLPs into centroids that can be restored as a compressed MLP, which closely approximates the NTK (neural tangent kernel) of the original MLP. We validate our method on both natural language understanding (NLU) and natural language generation (NLG) tasks.

Navigate to "scripts/run_glue.sh", and edit the parameters, including but limited to:
- "model_name_or_path": the model to use.
- "task_name": name of the NLU benchmark (e.g., "sst2", "mnli").
- "distill": whether to enable distillation.
- "max_seq_length": maximum sequence length (inputs are padded/truncated to this length).
- "ffn_mode": chosen reduction/sketching method (e.g., "sketch", "cluster", "mmd").
- "sketch_layers": MLP layers that the sketching/reduction method is applied to.
- "ffn_bn": dimension of the feedforward network's bottleneck layer.
- "mid_dim": intermediate dimension.
- "re_init_layers": layers to re-initialize.
- "seed": random seed.
- "num_train_epochs": number of training epochs.
- "learning_rate": learning rate.
- "metric_for_best_model": metric for the chosen NLU benchmark.
Then, type into the command prompt:
sh run_glue.sh
This runs "run_glue.py" with the modified configuration, validating the chosen reduction/sketching method on the selected NLU benchmark.
For NLG tasks, navigate to "nlg/scripts/run_nlg.sh". Within this file, you can use any of the configurations available in "nlg/configs" by modifying the parameters. To create your own set of configurations, add a new .json file to the "nlg/configs" directory and customize it to your requirements.
Some important parameters in the configurations (a minimal example is sketched after the list):
- "seed": random seed.
- "task": task to be executed.
- "model_name": name of the model.
- "n_epochs": number of training epochs.
- "train_batch_size"/"valid_batch_size": batch size for training/validation.
- "lr": learning rate.
- "ffn_mode": chosen reduction/sketching method (e.g., "sketch", "cluster", "mmd").
- "sketch_layers": MLP layers that the sketching/reduction method is applied to.
- "ffn_bn": dimension of the feedforward network's bottleneck layer.
- "mid_dim": intermediate dimension.
After adding your configuration, execute the task by typing into the command prompt:
sh run_nlg.sh
This runs "train.py" and "evaluate.py" with the configurations assigned to training and validation.
For Switch Transformers, please refer to "./switch" and execute the code similarly through "sft_run.sh".
Please cite our paper if you find the code or paper useful.
@InProceedings{pmlr-v202-wei23b,
title = {NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning},
author = {Wei, Tianxin and Guo, Zeming and Chen, Yifan and He, Jingrui},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
pages = {36821--36838},
year = {2023},
editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
volume = {202},
series = {Proceedings of Machine Learning Research},
month = {23--29 Jul},
publisher = {PMLR}
}