Skip to content

Conversation

@PythonFZ
Copy link
Contributor

@PythonFZ PythonFZ commented Sep 23, 2025

This PR adds the MLIP/Water Tutorial.
It is spilt into three parts

  1. TIP4P Water by @hidekb
  2. MLIP Setup and evaluation by @PythonFZ
  3. MLIP MD by @hidekb

For testing until merged, you can use

git clone https://github.com/PythonFZ/espresso.git
cd espresso
git checkout mlip-water
cd doc/tutorials/mlip-water/
# use your favorite way of installation, e.g. uv sync / pip install . or pre-installed
dvc pull

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@RudolfWeeber
Copy link
Contributor

Thank you! It looks very good.

A few points from an initial quick read:

Part 1

  • considr using GPU P3M, if a GPu is available.
  • maybe extend description of how the rigid body with Virtual Sites works

Part 2

  • For newcomers, some of the terminology might not be fully clear, e.g., hyperparameter, descriptor...
  • Maybe it can be clarified what "true energy" erfers to, i.e., make more explicit that this refers to the result from qm calcuations on which the model was trained
  • Eventhough the models are pre-trained, can one show some samples from the training data visually?
  • What is the role of zntrack?

Part 3

  • Maybem we can make it clear that this script in essence works for other MLIPs, as long as they provide an ASE calculator
  • Explain thaat ase.integrate() assignes to the particles' ext_force. This means that
    • this attribute is overwritten
    • but otherwise, all ESPResSo interactions can be used in combination with the ASe calculator

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be more readable/less error prone to get the constants from scipy.constants?

And if you use pint and units everywhere, also the lengths should get there units IMHO (and for the charges and masses below). Alternatively (maybe preferred), add the explanation that these constants are defined in order to convert to the unit-less quantities below - as it is now it is not clear intuitively which quantities have or don't have units.


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this cell for using Scipy.constants and added "these constants are defined in order to convert to the unit-less quantities below" below the cell.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #4.    def rigid_constrain(p, C):

Why C? This relates two particle positions, right? Maybe center would be more instructive?


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the argument C to center_mol becuase center is already defined.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typiically the hydrogens have no LJ interaction, but for example in the CHARMM compatible versions they do. To be fair I would simply add the typical in the sentence.


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added typical in front of TIP4P.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.        epsilon=OO_EPSILON, sigma=OO_SIGMA, cutoff=0.85, shift="auto"

This is the original cutoff, and water properties at this small cutoff are very sensitive


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified cutoff from 2.5 * OO_SIGMA to 0.85.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    system.thermostat.set_langevin(kT=1, gamma=1, seed=1)

This will not give the correct dynamics. I see that we have no other choices for the thermostat, but at least we should tune the friction coefficient to match the dynamics of TIP4P/2005, $k_\mathrm{B}T/\gamma = D = 2.08 \cdot 10^9 \mathrm{m^2/s)$?

Would this yield the desired behavior?


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this tutorial, we only discuss the radial distribution function. Then, for fast equilibrium, we set gamma=1.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #1.    # replace singel O particle by molecules

comment makes no sense to me


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #5.    r_max = 0.7  # nm

Why not L/2? Then this looks much more familar...

Correspondingly

rdf_bins = int(L/2 * 100)

Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this tutorial, all RDFs are compared with each other. For the third part, the maximum length for RDF is 7 angstroms since the system size is 14 angstroms. To match that length, we set it to 7 angstroms.

@@ -0,0 +1,583 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #5.    data = np.load("rdf_experiment.npz")

File is missing, so I cannot judge on the quality of the rdf...


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After using $dvc pull, the necessary data is provided.

@schlaicha
Copy link
Contributor

schlaicha commented Sep 25, 2025

I now had a look at the first part, this is looking very impressive to me (see my comments above)!

I just did a slightly longer (100x) run to test the pressure in the simulation, at least with the current friction coefficient (see my comment above) it seems reasonable: $$313 \pm 45$$ bar, which seems very plausible to me...

I will look into parts 2 and 3 tomorrow.

hidekb and others added 11 commits September 25, 2025 21:52
Fixes espressomd#5171, fixes espressomd#5035

- Remove unused/unnecessary code
- Fix zndraw interface not respecting PBC
- Introduce bonds drawing
- Upgrade zndraw to v0.5.11
Add a single-step symplectic Euler integrator to propagate particles with time-dependent external forces.

Co-authored-by: Hideki Kobayashi <[email protected]>
Description of changes:

- remove communication overhead in ParticleSlice setters

Co-authored-by: Jean-Noël Grad <[email protected]>
@@ -0,0 +1,742 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one should add a hint on the unit, rmax is in Angstrom?

Also, concerning the n_radial, I would add the explanation from below: "(...) in message passing, which affects angular resolution"


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the note on units to the top and replaced the description ( it was wrong before, thanks for highlighting!)

@@ -0,0 +1,742 @@
{
Copy link
Contributor

@schlaicha schlaicha Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #12.    ax[1].set_ylabel("Density")

Should also be labeled 'counts' in order not to confuse.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using density for the forces, because there are so many force component but keep the non-normalized histogram for the energies, because the numbers still make sense to the reader.

@PythonFZ
Copy link
Contributor Author

In the ASE interface, we are setting the calculator in a loop.

for _ in tbar:
    atoms = system.ase.get()
    atoms.calc = box.calc

The .calc is a property (https://gitlab.com/ase/ase/-/blob/master/ase/atoms.py#L396) and there might be a performance consideration here.

@PythonFZ
Copy link
Contributor Author

@PicoCentauri we are using the GMNN1 as implemented in APAX2

Footnotes

  1. https://pubs.acs.org/doi/10.1021/acs.jctc.0c00347

  2. https://pubs.acs.org/doi/full/10.1021/acs.jcim.5c01221

@schlaicha
Copy link
Contributor

Hi,

I had another look and all my points seem well adressed. I just pushed quite a few of typo corrections and didactic rephrasings. From my side this is finished and can be used as tutorial!

A few open points:

  • Concerning the CUDA support for P3M in the first parg suggested by @RudolfWeeber , the way it is implemented now always demands CUDA, so the code will fail without CUDA installed - I think this should be a switch checking if CUDA is available and then using it.

  • I don't understand the last paragraph in the third tutorial part,

    Because our MLIP learns all interactions, it can happen that minimum bond distances are not obeyed, (...)

    I guess you are refering to the data $$r_\text{max} = 2\text{\AA}?$$
    Then I would simply conclude that such a small cutoff misses the fundamental interactions and therefore produces nonphysical results, leaving out the rest of the discussion.

    this can be mitigated by including short-range two-body

    why would you want to do that?

@PythonFZ
Copy link
Contributor Author

PythonFZ commented Oct 2, 2025

I can shorten this, but I think it includes some valuable information

Because our MLIP learns all interactions, it can happen that minimum bond distances are not obeyed, leading to unphysical behavior in the simulation that are reflected in the RDF plots.

I think this highlights again, that there are e.g. no springs between particles and if not enough information is given, the system can blow up. Maybe including another sentence to explicitly refer to the 2 A data might help?

Although this can be mitigated by including short-range two-body potential terms, fundamentally, if such behavior occurs, the model parameters or training data should be improved.

Including the Ziegler-Biersack-Littmark (ZBL) short-range correction potential is common practice in many MLIPs but for our model, we have seen that including these correction terms might not necessarily improve the model accuracy.

If this is information is to coarse and doesn't really benefit the model, I'll shorten the section as you've suggested?

@schlaicha
Copy link
Contributor

If this is information is to coarse and doesn't really benefit the model, I'll shorten the section as you've suggested?

Yes, I think either you need to provide more explanations (like the one you provided above) or simplify the section...

@PythonFZ PythonFZ requested a review from schlaicha October 6, 2025 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants