OOM error while running LoRA #5
Try turning the batch size down. You can also try to lower the other memory-hungry settings, like the LoRA rank and the maximum sequence length. Note that all this will reduce the quality of the fine-tuned model, but you will at least have a baseline and can work from there.
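For concreteness, here is a minimal sketch of what those reductions can look like as module-level settings in a lit-gpt-style `finetune/lora.py`. The variable names (`micro_batch_size`, `lora_r`, `max_seq_length`) are assumptions based on the repository's usual hyperparameters, not values quoted from this thread:

```python
# Hypothetical memory-saving overrides for a lit-gpt-style finetune/lora.py.
# All names here are assumptions -- check them against your copy before editing.
batch_size = 64                 # effective batch size (kept via accumulation)
micro_batch_size = 1            # samples per forward/backward pass
gradient_accumulation_iters = batch_size // micro_batch_size
lora_r = 4                      # smaller adapter rank -> fewer trainable params
max_seq_length = 512            # shorter sequences -> much smaller activations
```

Lowering `micro_batch_size` attacks activation memory directly, while gradient accumulation keeps the effective batch size the same.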
Tried it, still getting the OOM error. I went as low as 8.
I have the same problem. I think 24 GB of memory is not enough for this.
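For rough intuition about why 24 GB is tight: assuming the 7B variant loaded in bf16, the frozen base weights alone take about 7e9 × 2 bytes ≈ 14 GB, leaving only ~10 GB for activations, LoRA gradients, optimizer state, and CUDA overhead, which long sequences can easily exhaust.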
Did you try QLoRA to fine-tune? I guess quantising the base-model weights to 4 bits might help. Another suggestion would be to use the SGD optimiser instead of AdamW. The Adam optimiser maintains two states per trainable parameter, requiring roughly double the memory, so using SGD might help. You can change the optimiser in this line of code: https://github.com/ayulockin/lit-gpt/blob/b6829289f977e65c3588bbb28737986fe38f8ec1/finetune/lora.py#L154
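For reference, the swap looks roughly like this (a sketch only; it assumes the linked line builds the optimizer with `torch.optim.AdamW`, and the `model`/`learning_rate` stand-ins below are placeholders, not code from `lora.py`):

```python
import torch

model = torch.nn.Linear(8, 8)  # placeholder for the LoRA-wrapped model
learning_rate = 3e-4           # placeholder hyperparameter

# AdamW keeps two extra fp32 state tensors (exp_avg, exp_avg_sq) per
# trainable parameter, roughly doubling the memory tied to those params:
# optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Plain SGD keeps no per-parameter state (adding momentum would keep one):
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```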
I have tried both QLoRA and SGD, but no luck. The 3B model runs perfectly.
@ayulockin is your A100 40GB or 80GB?
@bmanikan which 3B model did you use?
Got hold of an A100 with 40 GB of memory. Running into the same issue. I tried everything that @ayulockin suggested. My parameters are:
OOM error message
I have a 40 GB A100. Is your FlashAttention correctly installed?
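One quick way to sanity-check a flash-attn install (my own minimal sketch, assuming the `flash-attn` PyPI package; this is not the command referred to in the next reply):

```python
# If this import fails or crashes with a CUDA/torch version mismatch, the
# flash-attn wheel was likely built against a different environment.
import flash_attn

print(flash_attn.__version__)
```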
@ayulockin I validated the setup with the command you shared and it seemed fine.
My parameters are:
The error trace is:
I went all the way down to a batch_size of 1 and reduced all the other parameters, but I am still getting this OOM error.
I have an RTX 4090 GPU with 24 GB of VRAM.
Can anyone help?