
Conversation

CompRhys commented Oct 4, 2025

See #553

cw-tan (Collaborator) commented Oct 7, 2025

CI is erroring out because the ruff linter is failing, by the way. Anyway, @kavanase, could you please try packaging on a login node that is CPU-only, to see whether this fixes the problem?
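(For anyone without a CPU-only node handy, here's a minimal sketch of emulating one, assuming a standard PyTorch setup; hiding GPUs via `CUDA_VISIBLE_DEVICES` is general CUDA behavior, not something specific to this PR:)

```python
import os

# Hide all GPUs from CUDA before torch initializes, so packaging
# behaves as it would on a CPU-only login node.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

assert not torch.cuda.is_available()  # torch now sees a CPU-only machine
```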

CompRhys (Author) commented Oct 7, 2025

Thanks for the catch. I made some tweaks to the pre-commit hooks to make them consistent with the lint step in CI, because I ran into a black autoformatting loop when I first tried running pre-commit to fix the lint issue I had missed.

cw-tan (Collaborator) commented Oct 7, 2025

Thanks! I completely forgot to update our pre-commit hooks after our migration to ruff.

kavanase (Contributor) commented Oct 8, 2025

> Anyway, @kavanase, could you please try packaging on a login node that is CPU-only, to see whether this fixes the problem?
Sorry for the delay! Got held up with travel.

Here's the NequIP package file from a login node without access to a GPU (too large to upload directly here; 72 MB):
https://drive.google.com/file/d/1CR2uxLanZdorZyhYuVUcKuf9Ku1rMPvR/view?usp=sharing

CompRhys (Author) commented Oct 8, 2025

Thanks! Can these more portable models be uploaded to the endpoint for "nequip.net:mir-group/NequIP-OAM-L:0.1"?

cw-tan (Collaborator) commented Oct 8, 2025

I'm only guessing that packaging on CPU-only devices fixes this problem; we'd need to verify that first before updating the website links, etc.

kavanase (Contributor) commented Oct 8, 2025

There was an issue before with (accelerated) GPU inference when models were packaged on CPU-only devices though, right, @cw-tan? Is that avoided now?

CompRhys (Author) commented Oct 8, 2025

When MACE had similar issues in the past, the solution was to make it the default that the model is always cast to CPU before saving, regardless of device. I don't think you need to be on a machine that specifically has no GPU access. I'm not sure whether the inductor stage changes or complicates any of this.
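(To illustrate that cast-to-CPU-before-saving default, here's a minimal sketch with a generic torch module; `save_portable` is a hypothetical helper, not the actual MACE or NequIP packaging code:)

```python
import torch

def save_portable(model: torch.nn.Module, path: str) -> None:
    # Hypothetical helper: always serialize from CPU so no CUDA device
    # gets baked into the saved artifact, whatever device was used.
    device = next(model.parameters()).device
    model.to("cpu")
    torch.save(model.state_dict(), path)
    model.to(device)  # restore the original device for continued use
```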

cw-tan (Collaborator) commented Oct 8, 2025

> There was an issue before with (accelerated) GPU inference when models were packaged on CPU-only devices though, right, @cw-tan? Is that avoided now?

@kavanase Yes, that's resolved. The problem was more for Allegro; see mir-group/allegro@1b1b230.

> When MACE had similar issues in the past, the solution was to make it the default that the model is always cast to CPU before saving, regardless of device. I don't think you need to be on a machine that specifically has no GPU access. I'm not sure whether the inductor stage changes or complicates any of this.

@CompRhys Good point; it's potentially worth sending all models to CPU before packaging in https://github.com/mir-group/nequip/blob/main/nequip/scripts/package.py. The checkpoint loading is a bit automagical, since we just depend on Lightning:

```python
lightning_module = training_module.load_from_checkpoint(checkpoint_path)
```

so I'm unsure whether it does some magic device handling when loading. Regardless, hopefully a manual `.to("cpu")` in the package script is enough.
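(For concreteness, a sketch of what that might look like around the Lightning load above; `map_location` is a standard `load_from_checkpoint` argument, and the explicit `.to("cpu")` afterwards is belt and braces, not a confirmed fix:)

```python
# Map checkpoint tensors to CPU at load time, then make the device
# explicit before packaging so the exported artifact is portable.
lightning_module = training_module.load_from_checkpoint(
    checkpoint_path, map_location="cpu"
)
lightning_module = lightning_module.to("cpu")
```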

