ML/MM toolkit — Toward Accelerated Mechanistic Investigation of Enzymatic Reactions

Overview

Quantum mechanics/molecular mechanics (QM/MM) methods have long been used to analyze enzymatic reaction mechanisms in silico. While treating the active site with QM and the remainder with MM reduces the computational cost, the inherently high cost of the QM calculation remains a major limitation. Replacing QM with Machine Learning (ML) interatomic potentials yields ML/MM approaches that further reduce computational cost while retaining accuracy.

ML/MM toolkit is an open-source command-line toolkit for ML/MM calculations. It streamlines the workflow necessary for enzymatic reaction mechanism analyses — energy minimization, transition-state (TS) search, and vibrational analysis — to calculate the reaction free energy (ΔG) and activation free energy (ΔG^‡) for a given enzyme–substrate complex structure. A link atom boundary treatment is implemented to include amino-acid residues in the ML region, and full-system Hessians are available for TS searches and vibrational analyses. A microiteration scheme separates ML and MM degrees of freedom, enabling efficient TS optimization and Hessian-based methods for systems comprising around 10,000 atoms.

Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.), allowing fine-grained control over each stage. A single command can generate a first-pass enzymatic reaction path with ML/MM accuracy:

mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2"

The full workflow — ML/MM model setup → MEP search → TS optimization → IRC → thermochemistry → single-point DFT — can be run in one command:

mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" --tsopt --thermo --dft

Given (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, mlmm-toolkit automatically:

extracts an active-site pocket around user-defined substrates,
assigns ONIOM regions (ML / Movable MM / Frozen MM) via B-factor encoding,
generates MM parameters (parm7/rst7) using AmberTools,
explores minimum-energy paths (MEPs) with GSM,
optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT.

Important (prerequisites):

Input PDB files must already contain hydrogen atoms.

When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).

A parm7 topology file (AmberTools) is required for MM calculations; use mlmm mm-parm to generate one.

Related tools

Tool	Use case	Repository
pdb2reaction	CLI-based reaction mechanism analysis for cluster models and small molecules	https://github.com/t-0hmura/pdb2reaction
UMA–Pysisyphus Interface	YAML-input-based reaction mechanism analysis for small molecules	https://github.com/t-0hmura/uma_pysis

mlmm-toolkit additionally automates MM force field generation and ML region assignment from a single PDB input — the all workflow requires only PDB files to run the full pipeline.

Both mlmm-toolkit and pdb2reaction include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.

Installation

Quick install

For CUDA 12.6:

conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y

pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install mlmm-toolkit
huggingface-cli login

For CUDA 12.9 (recommended for RTX 50 series):

conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
huggingface-cli login

Previous version: v0.2.0 introduced breaking changes to the CLI interface and Hessian handling. To install the previous stable release (v0.1.1) from GitHub:
pip install git+https://github.com/t-0hmura/mlmm_toolkit.git@v0.1.1

Prerequisites

Requirement	Notes
Python >= 3.11	3.12 also supported
CUDA runtime >= 12.6	CUDA 12.9 recommended for RTX 50 series
Linux / WSL 2	—

Full setup (conda)

conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
plotly_get_chrome -y
huggingface-cli login

If you are on an HPC cluster that uses environment modules, load CUDA before installing PyTorch:

module load cuda/12.6

fairchem-core is a core dependency. aimnet2 and other ML backends (orb-models, mace-torch) are optional extras (e.g. pip install "mlmm-toolkit[aimnet2]").

DFT single-point (`mlmm dft`)

DFT dependencies are not installed by default. To use mlmm dft, install the [dft] extra:

pip install "mlmm-toolkit[dft]"

This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems up to ~500 atoms in the ML region; larger systems will require prohibitive compute time and memory.

Avoid AmberTools conflicts (optional)

If AmberTools is loaded, unload it before installing to prevent conflicts for ParmEd:

module unload amber

For DMF method

conda install -c conda-forge cyipopt -y
pip install pydmf

Authenticate for UMA (Hugging Face)

UMA model weights are on Hugging Face Hub. You need to log in once (see https://github.com/facebookresearch/fairchem):

huggingface-cli login

Supported ML potentials

Potential	Repository	Install extra
UMA (default)	https://github.com/facebookresearch/fairchem	(included)
ORB	https://github.com/orbital-materials/orb-models	`pip install "mlmm-toolkit[orb]"`
MACE	https://github.com/ACEsuit/mace	see note below
AIMNet2	https://github.com/isayevlab/aimnetcentral	`pip install "mlmm-toolkit[aimnet2]"`

Note: MACE and UMA cannot coexist due to conflicting e3nn versions (fairchem-core requires e3nn>=0.5, mace-torch requires e3nn==0.4.4). To use MACE, uninstall fairchem-core first:
pip uninstall fairchem-core
pip install mace-torch
This means UMA will no longer be available in that environment. We recommend using a separate conda environment for MACE.

Preparing an Enzyme–Substrate System

Build a structural model of the complex. Download coordinates from Protein Data Bank. If an experimental structure is not available, use structure prediction programs such as AlphaFold3, docking simulation programs, or GUI software such as PyMOL.
Generate parameter/topology and coordinate files. Create .pdb, .parm7, and .rst7 files of the complex (see the OpenMM tutorial). To mimic aqueous conditions, solvate the complex, then remove water molecules beyond ~6 Å.

Note: elemental information (columns 77–78) is omitted in PDB files generated by tleap. Use mlmm add-elem-info to fix this.
Define the ML region. Use mlmm extract or any molecular viewer to select residues around the substrate:
```
mlmm extract -i complex.pdb -c PRE -r 6.0 -l "PRE:-2" -o ml_region.pdb
```
Important: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)

Quick Examples

Full workflow (multi-structure)

mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" \
    --tsopt --thermo --dft

Scan mode (single structure)

mlmm all -i R.pdb -c PRE -l "PRE:-2" \
    --scan-lists "[('PRE 353 O1\'','PRE 353 C3',1.2)]"

TS optimization only

mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 \
    -q 1 --opt-mode light

Step-by-step workflow

# 1. Generate MM parameters
mlmm mm-parm -i complex.pdb -l "PRE:-2"

# 2. Extract active-site pocket
mlmm extract -i complex.pdb -c '353' -l "PRE:-2" -r 6.0

# 3. Assign 3-layer ONIOM regions
mlmm define-layer -i complex.pdb --model-pdb pocket.pdb --radius-freeze 10.0

# 4. Optimize geometry
mlmm opt -i complex_layered.pdb --parm complex.parm7 -q 1 --opt-mode heavy

# 5. MEP search
mlmm path-search -i R_layered.pdb P_layered.pdb --parm complex.parm7 -q 1

# 6. TS optimization
mlmm tsopt -i hei.pdb --parm complex.parm7 -q 1 --opt-mode light

# 7. Frequency analysis
mlmm freq -i ts_optimized.pdb --parm complex.parm7 -q 1

# 8. IRC
mlmm irc -i ts_optimized.pdb --parm complex.parm7 -q 1

# 9. DFT single-point
mlmm dft -i optimized.pdb --parm complex.parm7 -q 1

Fully working example scripts are provided in the examples/ directory. Start with the minimal examples/toy_system/ example, then explore a realistic enzyme case in examples/methyltransferase/.

CLI Subcommands

Workflow

Subcommand	Role	Documentation
`all`	End-to-end: extraction → MEP → TS → IRC → freq → DFT	docs/all.md

Structure Preparation

Subcommand	Role	Documentation
`mm-parm`	Generate parm7/rst7 topology via AmberTools	docs/mm_parm.md
`extract`	Extract active-site pocket (cluster model)	docs/extract.md
`define-layer`	Assign 3-layer B-factor encoding	docs/define_layer.md
`add-elem-info`	Add/repair PDB element columns (77-78)	docs/add_elem_info.md
`fix-altloc`	Resolve alternate conformations	docs/fix_altloc.md

Optimization & Path Search

Subcommand	Role	Documentation
`opt`	Geometry optimization (L-BFGS or RFO)	docs/opt.md
`tsopt`	TS optimization (Dimer or RS-I-RFO)	docs/tsopt.md
`path-opt`	MEP optimization via GSM or DMF	docs/path_opt.md
`path-search`	Recursive MEP search with refinement	docs/path_search.md
`scan`	1D bond-length driven scan	docs/scan.md
`scan2d`	2D distance grid scan	docs/scan2d.md
`scan3d`	3D distance grid scan	docs/scan3d.md

Analysis

Subcommand	Role	Documentation
`freq`	Vibrational frequency analysis + thermochemistry	docs/freq.md
`irc`	IRC calculation (EulerPC)	docs/irc.md
`dft`	Single-point DFT (GPU4PySCF / PySCF)	docs/dft.md

Visualization & Export

Subcommand	Role	Documentation
`trj2fig`	Energy plot from XYZ trajectory	docs/trj2fig.md
`energy-diagram`	Energy diagram from numeric values	docs/energy_diagram.md
`oniom-export`	Generate Gaussian ONIOM / ORCA QM/MM input	docs/oniom_export.md
`oniom-import`	Import ONIOM input and reconstruct XYZ/layered PDB	docs/oniom_import.md

3-Layer System

mlmm-toolkit uses a 3-layer system encoded in PDB B-factor columns:

Layer	B-factor	Description	Hessian
ML	0.0	Active site (UMA energy/force/Hessian)	Yes
Movable MM	10.0	MM atoms allowed to move	No (by B-factor alone)
Frozen	20.0	Distant protein (coordinates fixed)	No

Covalent bonds cut at the ML/MM boundary are capped with hydrogen link atoms. The ML region can therefore include entire amino-acid side chains when necessary.

Use mlmm define-layer to assign layers automatically based on a model PDB (the ML region).

The MM layer uses an analytical Hessian force field (hessian_ff) by default. Any force field capable of generating Amber parameters (e.g., ff19SB + OPC3 + GAFF2) can be used.

Units

Interface	Energy	Distance	Force	Hessian
Core & ASE	eV	Å	eV Å^-1	eV Å^-2
PySisyphus (CLI)	Hartree	Bohr	Ha Bohr^-1	Ha Bohr^-2

Unit conversions are handled automatically inside the ML/MM calculator.

Python API

See Python API Reference for full documentation.

Three API levels are available: MLMMCore (base engine), MLMMASECalculator (ASE), and mlmm (pysisyphus Calculator).

MLMMCore (Base-Level)

from mlmm import MLMMCore

core = MLMMCore(
    input_pdb    = "complex.pdb",    # Full system PDB (protein + substrate + solvent)
    real_parm7   = "complex.parm7",  # Amber topology for the full system
    model_pdb    = "ml_region.pdb",  # ML region only (trimmed PDB)
    model_charge = -1,               # Formal charge of the ML region including link H atoms
    model_mult   = 1,                # Spin multiplicity of the ML region
)

from ase.io import read
atoms = read("structure.pdb")
results = core.compute(atoms.get_positions(), return_forces=True, return_hessian=True)

energy  = results["energy"]    # float, eV
forces  = results["forces"]    # ndarray (N, 3), eV/Å
hessian = results["hessian"]   # torch.Tensor (3N, 3N), eV/Å^2

ASE Interface

from mlmm import MLMMCore, MLMMASECalculator

core = MLMMCore(
    input_pdb="complex.pdb", real_parm7="complex.parm7",
    model_pdb="ml_region.pdb", model_charge=-1, model_mult=1,
)
calc = MLMMASECalculator(core)

atoms.calc = calc
energy = atoms.get_potential_energy()  # eV

pysisyphus YAML Workflows

mlmm pysis opt.yaml

See pysis documentation for YAML format and examples.

Notes

complex.pdb is the reference PDB used when the Amber parameters were generated, whereas structure.pdb can contain any starting geometry.

If model_charge is omitted, it defaults to 0. Always set it explicitly for charged systems.

v0.1.x parameter names (real_pdb, etc.) and mlmm_ase() are still supported with deprecation warnings.

Getting Help

mlmm --help
mlmm <subcommand> --help
mlmm <subcommand> --help-advanced

--help shows core options. --help-advanced shows the full option list including advanced parameters.

Documentation

Getting Started — Installation and first steps
Concepts — 3-layer system, ONIOM, link atoms
CLI Command Reference
YAML Schema
Troubleshooting — Common errors and fixes
Full command index: docs/index.md

Citation

If you find this work helpful for your research, please cite:

@article{ohmura2025mlmm,
  author       = {Ohmura, Takuto and Inoue, Sei and Terada, Tohru},
  title        = {ML/MM Toolkit -- Towards Accelerated Mechanistic Investigation of Enzymatic Reactions},
  year         = {2025},
  journal      = {ChemRxiv},
  doi          = {10.26434/chemrxiv-2025-jft1k},
  url          = {https://doi.org/10.26434/chemrxiv-2025-jft1k}
}

License

ML/MM toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).

This software is still under development. Please use it at your own risk.

Name		Name	Last commit message	Last commit date
Latest commit History 382 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
hessian_ff		hessian_ff
mlmm		mlmm
pysisyphus		pysisyphus
scripts		scripts
tests		tests
thermoanalysis		thermoanalysis
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
THIRD_PARTY_NOTICES.txt		THIRD_PARTY_NOTICES.txt
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

ML/MM toolkit — Toward Accelerated Mechanistic Investigation of Enzymatic Reactions

Overview

Related tools

Installation

Quick install

Prerequisites

Full setup (conda)

DFT single-point (mlmm dft)

Avoid AmberTools conflicts (optional)

For DMF method

Authenticate for UMA (Hugging Face)

Supported ML potentials

Preparing an Enzyme–Substrate System

Quick Examples

Full workflow (multi-structure)

Scan mode (single structure)

TS optimization only

Step-by-step workflow

CLI Subcommands

Workflow

Structure Preparation

Optimization & Path Search

Analysis

Visualization & Export

3-Layer System

Units

Python API

MLMMCore (Base-Level)

ASE Interface

pysisyphus YAML Workflows

Getting Help

Documentation

Citation

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 8

Contributors

Uh oh!

Languages

DFT single-point (`mlmm dft`)