Quantum mechanics/molecular mechanics (QM/MM) methods have long been used to analyze enzymatic reaction mechanisms in silico. While treating the active site with QM and the remainder with MM reduces the computational cost, the inherently high cost of the QM calculation remains a major limitation. Replacing QM with Machine Learning (ML) interatomic potentials yields ML/MM approaches that further reduce computational cost while retaining accuracy.
ML/MM toolkit is an open-source command-line toolkit for ML/MM calculations. It streamlines the workflow necessary for enzymatic reaction mechanism analyses — energy minimization, transition-state (TS) search, and vibrational analysis — to calculate the reaction free energy (ΔG) and activation free energy (ΔG‡) for a given enzyme–substrate complex structure. A link atom boundary treatment is implemented to include amino-acid residues in the ML region, and full-system Hessians are available for TS searches and vibrational analyses. A microiteration scheme separates ML and MM degrees of freedom, enabling efficient TS optimization and Hessian-based methods for systems comprising around 10,000 atoms.
Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.), allowing fine-grained control over each stage. A single command can generate a first-pass enzymatic reaction path with ML/MM accuracy:
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2"The full workflow — ML/MM model setup → MEP search → TS optimization → IRC → thermochemistry → single-point DFT — can be run in one command:
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" --tsopt --thermo --dftGiven (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, mlmm-toolkit automatically:
- extracts an active-site pocket around user-defined substrates,
- assigns ONIOM regions (ML / Movable MM / Frozen MM) via B-factor encoding,
- generates MM parameters (parm7/rst7) using AmberTools,
- explores minimum-energy paths (MEPs) with GSM,
- optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT.
Important (prerequisites):
- Input PDB files must already contain hydrogen atoms.
- When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).
- A parm7 topology file (AmberTools) is required for MM calculations; use
mlmm mm-parmto generate one.
| Tool | Use case | Repository |
|---|---|---|
| pdb2reaction | CLI-based reaction mechanism analysis for cluster models and small molecules | https://github.com/t-0hmura/pdb2reaction |
| UMA–Pysisyphus Interface | YAML-input-based reaction mechanism analysis for small molecules | https://github.com/t-0hmura/uma_pysis |
mlmm-toolkit additionally automates MM force field generation and ML region assignment from a single PDB input — the all workflow requires only PDB files to run the full pipeline.
Both mlmm-toolkit and pdb2reaction include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.
For CUDA 12.6:
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install mlmm-toolkit
huggingface-cli loginFor CUDA 12.9 (recommended for RTX 50 series):
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
huggingface-cli loginPrevious version: v0.2.0 introduced breaking changes to the CLI interface and Hessian handling. To install the previous stable release (v0.1.1) from GitHub:
pip install git+https://github.com/t-0hmura/mlmm_toolkit.git@v0.1.1
| Requirement | Notes |
|---|---|
| Python >= 3.11 | 3.12 also supported |
| CUDA runtime >= 12.6 | CUDA 12.9 recommended for RTX 50 series |
| Linux / WSL 2 | — |
conda create -n mlmm python=3.11 -y
conda activate mlmm
conda install -c conda-forge ambertools pdbfixer -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install mlmm-toolkit
plotly_get_chrome -y
huggingface-cli loginIf you are on an HPC cluster that uses environment modules, load CUDA before installing PyTorch:
module load cuda/12.6
fairchem-coreis a core dependency.aimnet2and other ML backends (orb-models,mace-torch) are optional extras (e.g.pip install "mlmm-toolkit[aimnet2]").
DFT dependencies are not installed by default. To use mlmm dft, install the [dft] extra:
pip install "mlmm-toolkit[dft]"This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems up to ~500 atoms in the ML region; larger systems will require prohibitive compute time and memory.
If AmberTools is loaded, unload it before installing to prevent conflicts for ParmEd:
module unload amberconda install -c conda-forge cyipopt -y
pip install pydmfUMA model weights are on Hugging Face Hub. You need to log in once (see https://github.com/facebookresearch/fairchem):
huggingface-cli login| Potential | Repository | Install extra |
|---|---|---|
| UMA (default) | https://github.com/facebookresearch/fairchem | (included) |
| ORB | https://github.com/orbital-materials/orb-models | pip install "mlmm-toolkit[orb]" |
| MACE | https://github.com/ACEsuit/mace | see note below |
| AIMNet2 | https://github.com/isayevlab/aimnetcentral | pip install "mlmm-toolkit[aimnet2]" |
Note: MACE and UMA cannot coexist due to conflicting
e3nnversions (fairchem-corerequirese3nn>=0.5,mace-torchrequirese3nn==0.4.4). To use MACE, uninstallfairchem-corefirst:pip uninstall fairchem-core pip install mace-torchThis means UMA will no longer be available in that environment. We recommend using a separate conda environment for MACE.
-
Build a structural model of the complex. Download coordinates from Protein Data Bank. If an experimental structure is not available, use structure prediction programs such as AlphaFold3, docking simulation programs, or GUI software such as PyMOL.
-
Generate parameter/topology and coordinate files. Create
.pdb,.parm7, and.rst7files of the complex (see the OpenMM tutorial). To mimic aqueous conditions, solvate the complex, then remove water molecules beyond ~6 Å.Note: elemental information (columns 77–78) is omitted in PDB files generated by tleap. Use
mlmm add-elem-infoto fix this. -
Define the ML region. Use
mlmm extractor any molecular viewer to select residues around the substrate:mlmm extract -i complex.pdb -c PRE -r 6.0 -l "PRE:-2" -o ml_region.pdbImportant: atom order, residue names, and residue numbers must match between the full PDB and the ML-region PDB. (In PyMOL, tick "Original atom order" when exporting.)
mlmm all -i R.pdb P.pdb -c PRE -l "PRE:-2" \
--tsopt --thermo --dftmlmm all -i R.pdb -c PRE -l "PRE:-2" \
--scan-lists "[('PRE 353 O1\'','PRE 353 C3',1.2)]"mlmm tsopt -i TS_candidate_layered.pdb --parm complex.parm7 \
-q 1 --opt-mode light# 1. Generate MM parameters
mlmm mm-parm -i complex.pdb -l "PRE:-2"
# 2. Extract active-site pocket
mlmm extract -i complex.pdb -c '353' -l "PRE:-2" -r 6.0
# 3. Assign 3-layer ONIOM regions
mlmm define-layer -i complex.pdb --model-pdb pocket.pdb --radius-freeze 10.0
# 4. Optimize geometry
mlmm opt -i complex_layered.pdb --parm complex.parm7 -q 1 --opt-mode heavy
# 5. MEP search
mlmm path-search -i R_layered.pdb P_layered.pdb --parm complex.parm7 -q 1
# 6. TS optimization
mlmm tsopt -i hei.pdb --parm complex.parm7 -q 1 --opt-mode light
# 7. Frequency analysis
mlmm freq -i ts_optimized.pdb --parm complex.parm7 -q 1
# 8. IRC
mlmm irc -i ts_optimized.pdb --parm complex.parm7 -q 1
# 9. DFT single-point
mlmm dft -i optimized.pdb --parm complex.parm7 -q 1Fully working example scripts are provided in the examples/ directory.
Start with the minimal examples/toy_system/ example, then explore a realistic enzyme case in examples/methyltransferase/.
| Subcommand | Role | Documentation |
|---|---|---|
all |
End-to-end: extraction → MEP → TS → IRC → freq → DFT | docs/all.md |
| Subcommand | Role | Documentation |
|---|---|---|
mm-parm |
Generate parm7/rst7 topology via AmberTools | docs/mm_parm.md |
extract |
Extract active-site pocket (cluster model) | docs/extract.md |
define-layer |
Assign 3-layer B-factor encoding | docs/define_layer.md |
add-elem-info |
Add/repair PDB element columns (77-78) | docs/add_elem_info.md |
fix-altloc |
Resolve alternate conformations | docs/fix_altloc.md |
| Subcommand | Role | Documentation |
|---|---|---|
opt |
Geometry optimization (L-BFGS or RFO) | docs/opt.md |
tsopt |
TS optimization (Dimer or RS-I-RFO) | docs/tsopt.md |
path-opt |
MEP optimization via GSM or DMF | docs/path_opt.md |
path-search |
Recursive MEP search with refinement | docs/path_search.md |
scan |
1D bond-length driven scan | docs/scan.md |
scan2d |
2D distance grid scan | docs/scan2d.md |
scan3d |
3D distance grid scan | docs/scan3d.md |
| Subcommand | Role | Documentation |
|---|---|---|
freq |
Vibrational frequency analysis + thermochemistry | docs/freq.md |
irc |
IRC calculation (EulerPC) | docs/irc.md |
dft |
Single-point DFT (GPU4PySCF / PySCF) | docs/dft.md |
| Subcommand | Role | Documentation |
|---|---|---|
trj2fig |
Energy plot from XYZ trajectory | docs/trj2fig.md |
energy-diagram |
Energy diagram from numeric values | docs/energy_diagram.md |
oniom-export |
Generate Gaussian ONIOM / ORCA QM/MM input | docs/oniom_export.md |
oniom-import |
Import ONIOM input and reconstruct XYZ/layered PDB | docs/oniom_import.md |
mlmm-toolkit uses a 3-layer system encoded in PDB B-factor columns:
| Layer | B-factor | Description | Hessian |
|---|---|---|---|
| ML | 0.0 | Active site (UMA energy/force/Hessian) | Yes |
| Movable MM | 10.0 | MM atoms allowed to move | No (by B-factor alone) |
| Frozen | 20.0 | Distant protein (coordinates fixed) | No |
Covalent bonds cut at the ML/MM boundary are capped with hydrogen link atoms. The ML region can therefore include entire amino-acid side chains when necessary.
Use mlmm define-layer to assign layers automatically based on a model PDB (the ML region).
The MM layer uses an analytical Hessian force field (hessian_ff) by default. Any force field capable of generating Amber parameters (e.g., ff19SB + OPC3 + GAFF2) can be used.
| Interface | Energy | Distance | Force | Hessian |
|---|---|---|---|---|
| Core & ASE | eV | Å | eV Å-1 | eV Å-2 |
| PySisyphus (CLI) | Hartree | Bohr | Ha Bohr-1 | Ha Bohr-2 |
Unit conversions are handled automatically inside the ML/MM calculator.
See Python API Reference for full documentation.
Three API levels are available: MLMMCore (base engine), MLMMASECalculator (ASE), and mlmm (pysisyphus Calculator).
from mlmm import MLMMCore
core = MLMMCore(
input_pdb = "complex.pdb", # Full system PDB (protein + substrate + solvent)
real_parm7 = "complex.parm7", # Amber topology for the full system
model_pdb = "ml_region.pdb", # ML region only (trimmed PDB)
model_charge = -1, # Formal charge of the ML region including link H atoms
model_mult = 1, # Spin multiplicity of the ML region
)
from ase.io import read
atoms = read("structure.pdb")
results = core.compute(atoms.get_positions(), return_forces=True, return_hessian=True)
energy = results["energy"] # float, eV
forces = results["forces"] # ndarray (N, 3), eV/Å
hessian = results["hessian"] # torch.Tensor (3N, 3N), eV/Å^2from mlmm import MLMMCore, MLMMASECalculator
core = MLMMCore(
input_pdb="complex.pdb", real_parm7="complex.parm7",
model_pdb="ml_region.pdb", model_charge=-1, model_mult=1,
)
calc = MLMMASECalculator(core)
atoms.calc = calc
energy = atoms.get_potential_energy() # eVmlmm pysis opt.yamlSee pysis documentation for YAML format and examples.
Notes
complex.pdbis the reference PDB used when the Amber parameters were generated, whereasstructure.pdbcan contain any starting geometry.- If
model_chargeis omitted, it defaults to 0. Always set it explicitly for charged systems.- v0.1.x parameter names (
real_pdb, etc.) andmlmm_ase()are still supported with deprecation warnings.
mlmm --help
mlmm <subcommand> --help
mlmm <subcommand> --help-advanced--help shows core options. --help-advanced shows the full option list including advanced parameters.
- Getting Started — Installation and first steps
- Concepts — 3-layer system, ONIOM, link atoms
- CLI Command Reference
- YAML Schema
- Troubleshooting — Common errors and fixes
- Full command index: docs/index.md
If you find this work helpful for your research, please cite:
@article{ohmura2025mlmm,
author = {Ohmura, Takuto and Inoue, Sei and Terada, Tohru},
title = {ML/MM Toolkit -- Towards Accelerated Mechanistic Investigation of Enzymatic Reactions},
year = {2025},
journal = {ChemRxiv},
doi = {10.26434/chemrxiv-2025-jft1k},
url = {https://doi.org/10.26434/chemrxiv-2025-jft1k}
}ML/MM toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).
This software is still under development. Please use it at your own risk.
