Update dependency peft to v0.18.0 #61
This PR contains the following updates:
peft: ==0.3.0 -> ==0.18.0
Warning
Some dependencies could not be looked up. Check the warning logs for more information.
Release Notes
huggingface/peft (peft)
v0.18.0: RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, and more (Compare Source)
Highlights
New Methods
RoAd
@ppetrushkov added RoAd: 2D Rotary Adaptation to PEFT in #2678. RoAd learns 2D rotation matrices that are applied using only element-wise multiplication, thus promising very fast inference with adapters in unmerged state.
Remarkably, besides LoRA, RoAd is the only PEFT method that supports mixed adapter batches. This means that when you have loaded a model with multiple RoAd adapters, you can use all of them for different samples in the same batch, which is much more efficient than switching adapters between batches.
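As a rough sketch (the base model, adapter paths, and adapter names below are placeholders), mixed adapter batches can look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder base model and adapter checkpoints.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
tokenizer.padding_side = "left"  # decoder-only models generate from the right

model = PeftModel.from_pretrained(base, "path/to/road-adapter-a", adapter_name="adapter_a")
model.load_adapter("path/to/road-adapter-b", adapter_name="adapter_b")

prompts = ["Translate to French: cheese", "Summarize: PEFT supports many adapters.", "Hello there,"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# One adapter name per sample in the batch; "__base__" routes a sample to the base model.
with torch.no_grad():
    out = model.generate(
        **inputs,
        adapter_names=["adapter_a", "adapter_b", "__base__"],
        max_new_tokens=20,
    )
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```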
ALoRA
Activated LoRA is a technique added by @kgreenewald in #2609 for causal language models. It allows LoRA adapters to be enabled selectively, depending on a specific token invocation sequence in the input. This has the major benefit that most of the KV cache can be re-used during inference when the adapter is only used to generate part of the response, after which the base model takes over again.
Arrow & GenKnowSub
@TheTahaaa contributed not only support for Arrow (#2644), a dynamic routing algorithm between multiple loaded LoRAs, but also GenKnowSub, a technique built on top of Arrow in which the 'library' of LoRAs available to Arrow is first modified by subtracting general-knowledge adapters (e.g., trained on subsets of Wikipedia) to enhance task-specific performance.
WaveFT
Thanks to @Bilican, Wavelet Fine-Tuning (WaveFT) was added to PEFT in #2560. This method trains sparse updates in the wavelet domain of residual matrices, which is especially parameter efficient. It is very interesting for image generation, as it promises to generate diverse outputs while preserving subject fidelity.
DeLoRA
Decoupled Low-rank Adaptation (DeLoRA) was added by @mwbini in #2780. This new PEFT method is similar to DoRA insofar as it decouples the angle and magnitude of the learned adapter weights. However, DeLoRA implements this in a way that promises to better prevent divergence. Moreover, it constrains the deviation of the learned weight by imposing an upper limit on its norm, which can be adjusted via the delora_lambda parameter.
OSF
Orthogonal Fine-Tuning (OSF) was added by @NikhilNayak-debug in #2685. By freezing the high-rank subspace of the targeted weight matrices and projecting gradient updates to a low-rank subspace, OSF achieves good performance on continual learning tasks. While it is a bit memory intensive for standard fine-tuning processes, it is definitely worth checking out on tasks where performance degradation of previously learned tasks is a concern.
Enhancements
Text generation benchmark
In #2525, @ved1beta added the text generation benchmark to PEFT. This is a framework to determine and compare metrics with regard to text generation of different PEFT methods, e.g. runtime and memory usage. Right now, this benchmark is still lacking experimental settings and a visualization, analogous to what we have in the MetaMathQA benchmark. If this is something that interests you, we encourage you to let us know or, even better, contribute to this benchmark.
Reliable interface for integrations
PEFT has integrations with other libraries like Transformers and Diffusers. To facilitate this integration, PEFT now provides a stable interface of functions that should be used if applicable. For example, the set_adapter function can be used to switch between PEFT adapters on the model, even if the model is not a PeftModel instance. We commit to keeping these functions backwards compatible, so it's safe for other libraries to build on top of them.
Handling of weight tying
Some Transformers models can have tied weights. This is especially prevalent when it comes to the embedding and the LM head. Currently, the way that this is handled in PEFT is not obvious. We thus drafted an issue to illustrate the intended behavior in #2864. This shows what our goal is, although not everything is implemented yet.
In #2803, @romitjain added the ensure_weight_tying argument to LoraConfig. This argument, if set to True, enforces weight tying of the modules targeted with modules_to_save. Thus, if the embedding and LM head are tied, they will share weights, which is important to allow, for instance, weight merging. Therefore, for most users, we recommend enabling this setting if they want to fully fine-tune the embedding and LM head. For backwards compatibility, the setting is off by default.
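A minimal sketch of enabling this setting is shown below; the model and the module names (q_proj, v_proj, embed_tokens, lm_head) are illustrative and architecture-dependent:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# A model whose input embedding and LM head share weights (tie_word_embeddings=True).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],          # regular LoRA targets
    modules_to_save=["embed_tokens", "lm_head"],  # fully fine-tune the tied modules
    ensure_weight_tying=True,                     # keep them tied; off by default
)
peft_model = get_peft_model(model, config)
```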
Note that, in accordance with #2864, the functionality of ensure_weight_tying=True will be expanded to also include trainable tokens (#2870) and LoRA (tbd.) in the future.
Support Conv1d and 1x1 Conv2d layers in LoHa and LoKr
@grewalsk extended LoHa and LoKr to support nn.Conv1d layers, as well as nn.Conv2d layers with 1x1 kernels, in #2515.
New prompt tuning initialization
Thanks to @macmacmacmac, we now have a new initialization option for prompt tuning, random discrete initialization (#2815). This option should generally work better than random initialization, as corroborated on our PEFT method comparison suite. Give it a try if you use prompt tuning.
Combining LoRA adapters with negative weights
If you use multiple LoRA adapters, you can merge them into a single adapter using model.add_weighted_adapter. So far, however, this only worked with positive weights per adapter. Thanks to @sambhavnoobcoder and @valteu, it is now possible to pass negative weights too, as sketched below.
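As a rough sketch (base model, adapter paths, and adapter names are placeholders), a negative weight subtracts that adapter's contribution from the merge:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Negative weight: adapter_b's delta is subtracted instead of added.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[1.0, -0.5],
    adapter_name="merged",
    combination_type="linear",  # assumes both adapters use the same rank
)
model.set_adapter("merged")
```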
Changes
Transformers compatibility
At the time of writing, the Transformers v5 release is imminent. This Transformers version will be incompatible with PEFT < 0.18.0. If you plan to use Transformers v5 with PEFT, please upgrade PEFT to 0.18.0+.
Python version
This PEFT version no longer supports Python 3.9, which has reached its end of life. Please use Python 3.10+.
Updates to OFT
The OFT method was updated in #2805 to make it slightly faster and to stabilize its numerics. This means, however, that existing checkpoints may give slightly different results after upgrading to PEFT 0.18.0. Therefore, if you use OFT, we recommend retraining the adapter.
All Changes
- hub_online_once in trainable token tests by @githubnemo in #2701
- … issue for 8-bit model by @yao-matrix in #2797
- trainable_token_indices for lm_head by @aflueckiger in #2863
- max_length to replace max_seq_length; correct README for … by @kaixuanliu in #2862
New Contributors
Full Changelog: huggingface/peft@v0.17.1...v0.18.0
v0.17.1 (Compare Source)
This patch release contains a few fixes (via #2710) for the newly introduced target_parameters feature, which allows LoRA to target nn.Parameters directly (useful for mixture-of-experts layers). Most notably, using model.add_adapter or model.load_adapter did not work correctly; since a solution is not trivial, PEFT now raises an error to prevent this situation.
v0.17.0: SHiRA, MiSS, LoRA for MoE, and more (Compare Source)
Highlights
New Methods
SHiRA
@kkb-code contributed Sparse High Rank Adapters (SHiRA, paper), which promise a potential gain in performance over LoRA; in particular, the concept loss observed when using multiple adapters is improved. Since the adapters only train on 1-2% of the weights and are inherently sparse, switching between adapters may be cheaper than with LoRA. (#2584)
MiSS
@JL-er added a new PEFT method, MiSS (Matrix Shard Sharing) in #2604. This method is an evolution of Bone, which, according to our PEFT method comparison benchmark, gives excellent results when it comes to performance and memory efficiency. If you haven't tried it, you should do so now.
At the same time, Bone will be deprecated in favor of MiSS and will be removed in PEFT v0.19.0. If you already have a Bone checkpoint, you can use scripts/convert-bone-to-miss.py to convert it into a MiSS checkpoint and proceed with training using MiSS.
Enhancements
LoRA for nn.Parameter
LoRA is now able to target nn.Parameter directly (#2638, #2665)! Ever had a complicated nn.Module with promising parameters inside, but it was too custom to be supported by your favorite fine-tuning library? No worries: you can now target nn.Parameters directly using the target_parameters config attribute, which works similarly to target_modules.
This option can be especially useful for models with Mixture of Experts (MoE) layers, as those often use nn.Parameters directly and cannot be targeted with target_modules. For example, for the Llama4 family of models, a config along the lines of the sketch below can target the MoE weights.
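A rough sketch follows; the parameter names are assumptions for a Llama4-style MoE block and will differ between models:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    target_parameters=[
        # Assumed names of the MoE expert weights (nn.Parameters) in a Llama4-style block.
        "feed_forward.experts.gate_up_proj",
        "feed_forward.experts.down_proj",
    ],
)
```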
Note that this feature is still experimental, as it comes with a few caveats, and might therefore change in the future. Also, MoE weights with many experts can be quite large, so expect higher memory usage compared to targeting normal nn.Linear layers.
Injecting adapters based on a state_dict
Sometimes there is a PEFT adapter checkpoint, but the corresponding PEFT config is not known for whatever reason. To inject the PEFT layers for this checkpoint, you would usually have to reverse-engineer the corresponding PEFT config, most notably the target_modules argument, based on the state_dict from the checkpoint. This can be cumbersome and error-prone. To avoid this, it is also possible to call inject_adapter_in_model and pass the loaded state_dict as an argument, as in the sketch below.
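A rough sketch of this workflow (the checkpoint path is a placeholder; check the PEFT docs for the exact usage):

```python
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM
from peft import LoraConfig, inject_adapter_in_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
state_dict = load_file("adapter_model.safetensors")  # adapter checkpoint with unknown config

# The layers to inject are derived from the state_dict keys, so the config
# does not need to spell out target_modules by hand.
model = inject_adapter_in_model(LoraConfig(), model, state_dict=state_dict)
```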
Find more on state_dict-based injection in the docs.
Changes
Compatibility
A bug in prompt learning methods caused modules_to_save to be ignored. Classification tasks are especially affected, since they usually add the classification/score layer to modules_to_save. As a consequence, these layers were neither trained nor stored after training. This has now been corrected. (#2646)
All Changes
New Contributors
Full Changelog: huggingface/peft@v0.16.0...v0.17.0
v0.16.0: LoRA-FA, RandLoRA, C³A, and much more (Compare Source)
Highlights
New Methods
LoRA-FA
In #2468, @AaronZLT added the LoRA-FA optimizer to PEFT. This optimizer is based on AdamW and it increases the memory efficiency of LoRA training. This means that you can train LoRA with less memory or, with the same memory budget, use higher LoRA ranks, potentially getting better results.
RandLoRA
Thanks to @PaulAlbert31, a new PEFT method called RandLoRA was added to PEFT (#2464). Similarly to VeRA, it uses non-learnable random low-rank matrices that are combined through learnable matrices. This way, RandLoRA can approximate full-rank updates of the weights. Training models quantized with bitsandbytes is supported.
C³A
@Phoveran added Circular Convolution Adaptation, C3A, in #2577. This new PEFT method can overcome the low-rank limitation of methods such as LoRA, while still promising to be fast and memory efficient.
Enhancements
- Thanks to @gslama12 and @SP1029, LoRA now supports Conv2d layers with groups != 1. This requires the rank r to be divisible by groups. See #2403 and #2567 for context, and the sketch after this list for an example.
- @dsocek added support for Intel Neural Compressor (INC) quantization to LoRA in #2499.
- DoRA now supports Conv1d layers thanks to @EskildAndersen (#2531).
- Passing init_lora_weights="orthogonal" now enables orthogonal weight initialization for LoRA (#2498).
- @gapsong brought us Quantization-Aware LoRA training in #2571. This can make QLoRA training more efficient; please check the included example. Right now, only GPTQ is supported.
- There has been a big refactor of Orthogonal Finetuning (OFT), thanks to @zqiu24 (#2575). This makes the PEFT method run more quickly and require less memory. It is, however, incompatible with old OFT checkpoints. If you have old OFT checkpoints, either pin the PEFT version to <0.16.0 or retrain them with the new PEFT version.
- Thanks to @keepdying, LoRA hotswapping with compiled models no longer leads to CUDA graph re-records (#2611).
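As referenced above, here is a minimal sketch of LoRA on a grouped Conv2d layer (the toy model is made up for illustration); the rank r must be divisible by groups:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(8, 16, kernel_size=3, groups=4)  # grouped convolution
        self.head = nn.Linear(16, 2)

    def forward(self, x):
        x = self.conv(x).mean(dim=(2, 3))  # global average pool
        return self.head(x)

config = LoraConfig(r=8, target_modules=["conv"])  # r=8 is divisible by groups=4
model = get_peft_model(TinyConvNet(), config)
model.print_trainable_parameters()
```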
Changes
Compatibility
- requires_grad of modules_to_save is now set to True when used directly with inject_adapter. This is relevant for PEFT integrations, e.g. Transformers or Diffusers.
- Due to a model refactor in Transformers, if you previously applied PEFT on vlm.language_model, it will no longer work; please apply it to vlm directly (see #2554 for context). Moreover, the refactor results in different checkpoints. We managed to ensure backwards compatibility in PEFT, i.e. old checkpoints can be loaded successfully. There is, however, no forward compatibility, i.e. loading checkpoints trained after the refactor is not possible with package versions from before the refactor. In this case, you need to upgrade PEFT and Transformers. More context in #2574.
- Alternatively, pin PEFT and Transformers to older versions (<0.16.0 and <4.52.0, respectively).
All Changes
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about these updates again.
To execute skipped test pipelines, write the comment /ok-to-test.
Documentation
Find out how to configure dependency updates in MintMaker documentation or see all available configuration options in Renovate documentation.