
Fix type error in extract_lora.py : SVD only supports float32 #510

Open
goliaro wants to merge 4 commits into main
Conversation


goliaro commented Feb 8, 2025

Running extract_lora.py on any model that does not use float32 as its default tensor type currently results in a RuntimeError. This is a significant limitation, as many recent models default to bfloat16 or float16 (half precision). The error arises from the use of the torch.linalg.svd API, which only supports torch.float32. This pull request (PR) addresses the issue by converting tensors to float32 (full precision) before executing torch.linalg.svd, and then converting them back to their original data type afterwards.
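A minimal sketch of the approach (illustrative only; the function name and truncation details are assumptions, not necessarily the exact code in this PR): upcast the delta weights to float32 for the decomposition, then cast the resulting LoRA factors back to the model's original dtype.

```python
import torch

def decompose_delta_weight(delta: torch.Tensor, rank: int):
    """Low-rank decomposition of a weight delta via truncated SVD.

    torch.linalg.svd does not accept float16/bfloat16 inputs, so the
    tensor is upcast to float32 for the decomposition and the factors
    are cast back to the original dtype afterwards.
    """
    original_dtype = delta.dtype

    # Upcast to full precision so SVD does not raise a RuntimeError
    # on half-precision models.
    u, s, vh = torch.linalg.svd(delta.to(torch.float32), full_matrices=False)

    # Truncate to the requested rank and fold the singular values in.
    lora_b = u[:, :rank] * s[:rank]   # shape: (out_features, rank)
    lora_a = vh[:rank, :]             # shape: (rank, in_features)

    # Restore the model's original dtype.
    return lora_a.to(original_dtype), lora_b.to(original_dtype)
```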


goliaro (Author) commented Feb 17, 2025

@cg123


David-AU-github commented Feb 20, 2025

@goliaro @cg123

Thank you - this worked perfectly.

Quick update: this works great, but on CPU I get a Blue Screen of Death* after running it (overheating); the compute load is too high.
I have a high-end, multi-core CPU with "eco cores", but it does have drawbacks.
(Note: I can run on GPU with --cuda, and it works fine that way.)

With llama.cpp + quantizing, I can set the maximum number of "cores" to use, which fixes the issue.

It would be great if Mergekit had this option - if it does, please advise. I find slight differences between CPU and GPU "math", and I prefer CPU, even if it takes longer.

This would be especially useful for "mergekit-yaml", where I need to use "--cuda" - otherwise too many cores activate on the CPU and I get the Blue Screen of Death*. (The other option: pause the build ... cool ... continue ... cool ... continue.)
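One possible workaround, while there is no built-in core-limit option, is to cap PyTorch's CPU thread pools before the heavy math starts. This is a sketch only, assuming the CPU path runs through PyTorch; the thread counts are illustrative and these are not mergekit settings:

```python
import torch

# Hypothetical workaround: limit CPU parallelism before any tensor work.
# Call these at the very top of the script (or a small wrapper) so the
# thread pools are sized before the first parallel operation runs.
torch.set_num_threads(8)          # intra-op parallelism (threads per kernel)
torch.set_num_interop_threads(4)  # inter-op parallelism (concurrent ops)
```

Setting the `OMP_NUM_THREADS` environment variable before launching has a similar effect for OpenMP-backed kernels.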

Example: when making MoEs I usually do these in float32, because I have found the MoEs operate better when the source and "master gguf" are both in float32, regardless of the precision of the MoE source models.

Also, are there any docs about LoRA Extract's new options?

I found the "save" option and a few others, but from the tech papers it is really hard to gauge whether I should use or set "--distribute-scale" and "--sv-epsilon FLOAT".

  • I have a detailed hardware monitor which helped me trace down this issue. I know this is likely machine-specific; if there is a way to limit the number of CPU cores that activate, other than per-program settings, please advise.

Thanks - Mergekit is fantastic. 1000+ models built with it and counting...



Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [cg123](https://github.com/cg123)
@goliaro
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
