Hi. First, some context so you can judge my experience level :D — I started experimenting with LoRA training about 2 months ago and have done roughly 50 training runs in total. Not much, but not a total newbie.

Here is my general problem: I cannot get an extracted LoRA from a finetune that captures the subject (I am training a character with multiple outfits). I have also trained a LoRA (without a finetune) on the exact same dataset and captioning, and it turned out well (it's up on CivitAI right now). The training set and captions are the same for all runs: 22 images, manually captioned and masked.

Finetune, General Type 1, with about 15 training runs with mixed settings in between: I tried a higher learning rate, but already at 0.00002 it breaks completely very fast; the highest LR I could use was about 0.000017 before it broke. I also tried training with D-Adapt Lion at LR 1, with the same result: it wasn't able to reproduce the outfits at all.

If you have an idea what the problem might be, please share. I guess I didn't expect finetune training to be that different from LoRA training.
Generally speaking, updating the weights during a finetune is a destructive event. Training at batch size 1 is possible for a LoRA, but it is still incredibly ill advised (the gradients are incredibly noisy; please use a batch size of at least 2) — a LoRA gets away with it only because it does not touch the original model weights. Finetuning should really only be considered if you have a 24GB VRAM card and tens of thousands of images.

If you want to try this at batch size 1, you can add gradient accumulation steps to even out the updates, though this is also not ideal: gradient accumulation gives somewhat worse results than a plain larger batch size and lowers performance. Accumulation does, however, let you use higher learning rates without breaking the model as quickly. For example, with 10 accumulation steps the weights are only updated every 10th step.

There are also new optimizers coming out. The Facebook schedule-free optimizer may help when it's ready; it seemed really good at learning details in my initial testing. The adaptive optimizers do not work well with finetunes under any settings I have found. Since a LoRA starts out empty, it is much easier for the adaptive optimizers to work with in that regard.

You may also need to balance your input set to put more focus on the outfit it is failing to learn.

I tried finetuning once on a 16GB card for SDXL and just went back to LoRAs. I did not see the benefit, it took too long on my card, and you really should use the highest-quality settings when doing a finetune, whereas a LoRA can have decent quality even with FP8 weights.

One suggestion I have seen on the Discord is to train a LoRA at a high network rank (as you have done) and then use the kohya tools to trim it down to a smaller network rank. This can remove a lot of the extra data you may not want and lets the LoRA focus on the key aspects you trained.
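To make the accumulation point concrete, here is a minimal toy sketch (not from any trainer, just a single-weight regression I made up to illustrate the mechanism): gradients from several micro-batches are summed and averaged, and the weight only moves once per group of accumulation steps, which is why each individual update is smoother.

```python
def train_step(w, batches, accum_steps, lr):
    """One effective update: average the gradient over `accum_steps`
    micro-batches, then apply a single weight update.
    Toy model: minimize squared error of w*x against y."""
    grad = 0.0
    for x, y in batches[:accum_steps]:
        # gradient of (w*x - y)^2 with respect to w
        grad += 2 * (w * x - y) * x
    grad /= accum_steps          # average, as most trainers do
    return w - lr * grad         # weights change once per accum_steps batches

# made-up data where the true relationship is y = 3*x
data = [(1.0, 3.0), (2.0, 6.0), (4.0, 12.0), (0.5, 1.5)]
w = 0.0
for _ in range(200):
    w = train_step(w, data, accum_steps=4, lr=0.01)
print(round(w, 3))  # converges toward 3.0
```

The same idea scales up in real trainers: the loss is backpropagated every micro-batch, but `optimizer.step()` only runs every `accum_steps` batches, so the effective batch size (and usable learning rate) goes up without extra VRAM.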
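For the rank-trimming suggestion, the underlying trick is a truncated SVD of each module's low-rank update; this is a standalone sketch of that idea in NumPy (the function and variable names here are illustrative, not the actual kohya `resize_lora` implementation):

```python
import numpy as np

def trim_lora_rank(down, up, new_rank):
    """Reduce one LoRA module (update = up @ down) to a smaller rank
    by keeping only the strongest singular directions."""
    delta = up @ down                         # full low-rank update, (out, in)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    new_up = U[:, :new_rank] * S[:new_rank]   # fold singular values into `up`
    new_down = Vt[:new_rank, :]
    return new_down, new_up

rng = np.random.default_rng(0)
down = rng.standard_normal((32, 64))   # rank-32 LoRA on a hypothetical 64->16 layer
up = rng.standard_normal((16, 32))
d8, u8 = trim_lora_rank(down, up, new_rank=8)
err = np.linalg.norm(up @ down - u8 @ d8) / np.linalg.norm(up @ down)
print(err)  # relative error of the rank-8 approximation
```

The small singular directions that get dropped are exactly the "extra data" mentioned above — weak, noisy components of the update — which is why the trimmed LoRA often keeps the key learned aspects while shedding clutter.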