Add unit test to show cpu compile error from foundation models #554
base: main
Conversation
CI is erroring out because the ruff linter is failing, btw. Anyway, @kavanase could you please try packaging on a CPU-only login node to see if this fixes the problem?
Thanks for the catch. I made some tweaks to the pre-commit hooks to make them consistent with the lint in CI, because I was running into a black autoformatting loop when I initially tried to run pre-commit to fix the lint issue I had missed.
|
Thanks! I completely forgot to update our pre-commit hooks since our migration to ruff. Here's the …
Thanks! Can these more portable models be uploaded to the endpoint for "nequip.net:mir-group/NequIP-OAM-L:0.1"?
I'm only guessing that packaging on CPU-only devices can fix this problem; we'd need to check first before we update the website links, etc.
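For reference, a rough pytest-style sketch of the kind of check this PR's unit test is aiming at; this uses a toy model and plain PyTorch, not the actual foundation-model test in the nequip test suite:

```python
import torch


def test_cpu_load_and_compile(tmp_path):
    # Save a model after casting it to CPU, as packaging on a CPU-only node would.
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.SiLU())
    path = tmp_path / "model.pt"
    torch.save(model.to("cpu").state_dict(), path)

    # Restore and compile on a CPU-only host; this is where the reported
    # compile error would surface if the artifact carried GPU assumptions.
    restored = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.SiLU())
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    compiled = torch.compile(restored)
    out = compiled(torch.randn(2, 8))
    assert out.shape == (2, 8)
```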
There was an issue before with (accelerated) GPU inference when packaged on CPU-only devices though, right? @cw-tan
When there were similar issues with MACE in the past, the solution was to make casting the model to CPU before saving the default, regardless of device. I don't think you need to be on a machine that specifically doesn't have access to a GPU. I'm not sure whether the inductor stage might change or complicate any of this.
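For illustration, a minimal sketch of that "always cast to CPU before saving" default; `save_portable` is a hypothetical helper, not the actual MACE or NequIP code:

```python
import torch


def save_portable(model: torch.nn.Module, path: str) -> None:
    """Cast the model to CPU before serializing so the saved artifact
    carries no CUDA device references, regardless of where it was trained."""
    model = model.to("cpu")
    torch.save(model.state_dict(), path)


# Usage sketch: the model may live on GPU during training,
# but the saved checkpoint is always device-agnostic.
model = torch.nn.Linear(4, 1)
save_portable(model, "model_cpu.pt")
```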
@kavanase Yes, that's resolved. The problem was more for Allegro, see mir-group/allegro@1b1b230
@CompRhys Good point, potentially worth sending all models to CPU before packaging in https://github.com/mir-group/nequip/blob/main/nequip/scripts/package.py. The checkpoint loading is a bit automagical since we just depend on Lightning (…), so hopefully a .to("cpu") in the package script is enough.
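To make the suggestion concrete, a minimal sketch of restoring a Lightning checkpoint onto CPU before packaging; `DemoModule` and `demo.ckpt` are illustrative placeholders, not the actual `package.py` logic:

```python
import torch
import lightning as L


class DemoModule(L.LightningModule):
    """Toy stand-in for the real NequIP LightningModule."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.net(x)


# map_location="cpu" keeps checkpoint restore off CUDA, and the explicit
# .to("cpu") makes sure every parameter and buffer is CPU-resident before
# whatever packaging/compile step follows.
module = DemoModule.load_from_checkpoint("demo.ckpt", map_location="cpu")
module = module.to("cpu").eval()
```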
See #553