Add TorchNEP: a pure-PyTorch NEP4 training framework by mushroomfire · Pull Request #1574 · brucefan1983/GPUMD

mushroomfire · 2026-06-26T13:21:25Z

Summary

This PR adds TorchNEP, a pure-PyTorch implementation of the NEP4 training framework, as a self-contained torchnep/ subproject under the GPUMD repository. It produces GPUMD-compatible nep.txt potentials and is fully interoperable with GPUMD (a model trained by TorchNEP loads and runs in GPUMD, and vice versa).

Key features:

GPUMD-compatible nep.txt output, bit-for-bit descriptor parity with GPUMD (verified against baked GPUMD references in the test suite)
Two-stage training (force-focused → energy-focused), fine-tuning, and model slimming
Single-GPU/CPU training plus data-sharded multi-GPU/multi-node training via DDP
Optional ZBL repulsion and an ASE calculator interface

Modification

Add the torchnep/ Python package (model, descriptors/ops, training, prediction, ASE calculator, neighbor search).
Add torchnep/tests/ — a pure-pytest suite (GPUMD parity, descriptors, neighbor lists, parsing, ASE), with baked reference fixtures so it runs on CPU without a GPUMD build.
Add a worked example under torchnep/example/PbTe/.
Add packaging (pyproject.toml, README.md, and a GPL-3.0-or-later LICENSE) so TorchNEP can be published to PyPI.
Add two GitHub Actions workflows (scoped to torchnep/** so ordinary GPUMD PRs are unaffected):
torchnep-test.yml — runs the CPU pytest suite, triggered only when TorchNEP .py/pyproject.toml files change.
torchnep-publish.yml — builds and publishes to PyPI on a torchnep-v* tag, via PyPI Trusted Publishing.

Others

The CI tests run on CPU only and need no GPUMD binary (parity is checked against committed reference fixtures).
No changes to any existing GPUMD C++/CUDA sources, build system, or workflows; this PR is purely additive.

Dankomaister · 2026-07-01T14:49:53Z

This is great!

I tested TorchNEP on some of my more challenging systems. The accuracy is much better than with SNES-NEP, and training is significantly faster.
A few things would be nice to have in TorchNEP:

Support for validation data (i.e. the equivalent of test.xyz for SNES-NEP), for use with lr_scheduler and for determining nep_best.txt. Some of my systems show overfitting with SNES-NEP, and it would be nice to be able to detect this in TorchNEP as well.
Unique training runs. As far as I can see, the descriptor parameters and NN weights are initialized randomly for each new run, but the same seed (the epoch number) is used when shuffling structures. Thus, each independent run will see the same order of structures, which is not ideal when training an ensemble for active learning.
A more robust xyz reader. The code crashes when reading some of my xyz datasets because they happen to have a space between the = sign of the energy key and its value, for example energy= 0.2151.
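To illustrate the third point, here is a minimal sketch of whitespace-tolerant parsing for the key=value pairs on an extended-XYZ comment line. The helper name `parse_comment_line` and the regular expression are assumptions made for illustration; this is not TorchNEP's actual reader, and bare flags without an = sign are simply ignored.

```python
import re

# A key, optional spaces around '=', then either a double-quoted value
# or a bare whitespace-free token.
_PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')


def parse_comment_line(line):
    """Return the key=value pairs of an extended-XYZ comment line as strings.

    Hypothetical helper: tolerates 'energy=0.2151', 'energy= 0.2151'
    and 'energy = 0.2151' alike.
    """
    fields = {}
    for key, quoted, bare in _PAIR.findall(line):
        # Exactly one of the two value groups is non-empty for a match
        # (both are empty only for an empty quoted value "").
        fields[key] = quoted if quoted else bare
    return fields


example = (
    'Lattice="6.5 0 0 0 6.5 0 0 0 6.5" energy= 0.2151 '
    'Properties=species:S:1:pos:R:3:forces:R:3'
)
fields = parse_comment_line(example)
energy = float(fields["energy"])
```

Values stay as strings, so the caller decides how to convert them (here `float` for the energy, a split into nine numbers for the lattice).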

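For the second point, one possible approach is to mix a per-run seed into the epoch-based shuffle seed, so that each run stays reproducible while different ensemble members see different structure orderings. The sketch below uses only the Python standard library. The function `epoch_order` and the `run_seed` parameter are hypothetical names, not part of TorchNEP.

```python
import random


def epoch_order(n_structures, epoch, run_seed):
    """Return a shuffled list of structure indices for one epoch.

    Hypothetical helper: the generator is seeded from both the run seed
    and the epoch number, so the order is reproducible for a given run
    but differs between runs and between epochs.
    """
    rng = random.Random(f"{run_seed}-{epoch}")  # string seeds are deterministic
    order = list(range(n_structures))
    rng.shuffle(order)
    return order


# Two ensemble members, same epoch: each gets its own ordering.
order_a = epoch_order(100, epoch=0, run_seed=1)
order_b = epoch_order(100, epoch=0, run_seed=2)
```

With a fixed `run_seed` the behaviour matches the current one (same order in every rerun), so an ensemble only needs a distinct seed per member.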
/Daniel

mushroomfire added 8 commits June 26, 2026 16:04

update torchnep

24b7a9d

torchnep-v1.0.0

6901094

fix typo

1648474

make prediction output same with GPUMD

61557ac

Use the 6-component virial tensor convention; Add a header line (#) to loss.out and remove the gnorm column.

3e9be1a

remove qscalar test

cd925a0

analytical b1; gpumd qscalar

67d2f56

rename to qscaler; change defaults same with gpumd init

71190e2

mushroomfire wants to merge 8 commits into brucefan1983:master from mushroomfire:master
