This page collects common failure modes and practical fixes. Search this page for the error message you encounter. If you want a symptom-first entrypoint, start with Common Error Recipes and then return here for details.
Before a long run, verify:
- You can run `mlmm -h` and see the CLI help.
- MLIP model weights can be downloaded (for the default UMA backend, a Hugging Face login/token must be available on the machine; other backends may download from different sources).
- For enzyme workflows: your input PDB(s) contain hydrogens and element symbols.
- When you provide multiple PDBs: they have the same atoms in the same order (only coordinates differ).
- AmberTools is correctly installed via conda channels (or built from source) and `tleap` is available (required for `mm-parm`).
- The `hessian_ff` C++ native extension is correctly built (if the automatic build fails, run `cd hessian_ff/native && make`).
Typical message:
Element symbols are missing in '...'.
Please run `mlmm add-elem-info -i...` to populate element columns before running extract.
Fix:
- Run `mlmm add-elem-info -i input.pdb -o input_with_elem.pdb`.
- Then re-run `extract`/`all` using the updated PDB.
Why it happens:
- PDBs sometimes do not populate the element column consistently; `extract` requires element symbols for reliable atom typing.
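If you want to confirm the element column yourself before re-running, a minimal Python sketch (not part of mlmm; it only inspects PDB columns 77-78, the standard element field) could look like:

```python
# Hypothetical helper, not part of mlmm: report ATOM/HETATM records
# whose element field (PDB columns 77-78) is empty.
def missing_element_lines(pdb_text):
    """Return 1-based line numbers of ATOM/HETATM records with a blank element field."""
    missing = []
    for i, line in enumerate(pdb_text.splitlines(), start=1):
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip()  # columns 77-78 in 1-based PDB numbering
            if not element:
                missing.append(i)
    return missing
```

An empty result means the element column is populated; a non-empty result means you should run `mlmm add-elem-info` before `extract`.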
Typical messages:
[multi] Atom count mismatch between input #1 and input #2:...
[multi] Atom order mismatch between input #1 and input #2.
Fix:
- Regenerate all structures with the same preparation workflow (same protonation tool, same settings).
- If you add hydrogens, do it in a way that produces consistent ordering across all frames.
Tip:
- For ensembles generated by MD, prefer extracting frames from the same trajectory/topology rather than mixing PDBs produced by different tools.
Alternative:
- If you cannot prepare matching multi-structure inputs, use the single-structure scan workflow instead: provide one PDB with `--scan-lists` to generate endpoints via distance scans.
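To see where two PDBs diverge before handing them to the multi-structure workflow, a rough Python sketch (not part of mlmm; the column slices follow the standard PDB format) might be:

```python
# Hypothetical helpers, not mlmm's implementation: compare atom identity
# and order between two PDB texts, mirroring the [multi] checks above.
def atom_records(pdb_text):
    """Extract (atom name, residue name, chain, residue number) per ATOM/HETATM record."""
    records = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            records.append((line[12:16].strip(),   # atom name
                            line[17:20].strip(),   # residue name
                            line[21:22].strip(),   # chain ID
                            line[22:26].strip()))  # residue number
    return records

def check_match(pdb_a, pdb_b):
    a, b = atom_records(pdb_a), atom_records(pdb_b)
    if len(a) != len(b):
        return "atom count mismatch: {} vs {}".format(len(a), len(b))
    for i, (ra, rb) in enumerate(zip(a, b)):
        if ra != rb:
            return "atom order mismatch at index {}: {} vs {}".format(i, ra, rb)
    return "ok"
```

Running this on each pair of inputs points you at the first diverging atom, which usually identifies the preparation step that reordered atoms.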
Symptoms:
- The extracted pocket is unexpectedly small.
- Key catalytic residues are missing.
Fixes to try:
- Increase `--radius` (e.g., 2.6 -> 3.5 Angstrom).
- Use `--selected-resn` to force-include residues (e.g., `--selected-resn 'A:123,B:456'`).
- Alternatively, manually create the ML region PDB in a molecular viewer (e.g., PyMOL) by selecting the active-site atoms and exporting them, then supply this PDB via `--model-pdb`.
Symptoms:
- Calculated energies or reaction barriers seem unreasonable.
- Results change significantly when the model size is increased.
Fix:
- If the extracted pocket is too small, calculated energies and barriers may be unreliable. Increase the extraction radius (e.g., `-r 4.0` or higher) to include more of the protein environment: `mlmm extract -i complex.pdb -c 'SUB' -o pocket.pdb -r 4.0`
- If the extracted pocket contains modified amino acid residues (e.g., phosphoserine, methylated lysine, D-amino acids) with non-standard three-letter codes, backbone truncation and link-hydrogen placement will not be applied to them by default. Use `--modified-residue` to register them: `mlmm extract -i complex.pdb -c PRE --modified-residue "SEP,TPO,MLY" -o pocket.pdb`. The same flag is available on the `all` command and is forwarded to the extraction stage.
- If `--modified-residue` is insufficient (e.g., the residue has an unusual backbone topology), construct the pocket model manually with appropriate link hydrogens, and pass the pocket PDB and parm7 files directly to downstream commands (`opt`, `tsopt`, `path-opt`, etc.) via `--parm` and `--model-pdb`.
Calculation subcommands require explicit `-q/--charge` and `-m/--multiplicity`.
In `all`, charge is resolved in order: `-q/--charge` override -> extraction summary -> `--ligand-charge` fallback when extraction is skipped.
Fix:
- Provide charge and multiplicity explicitly: `mlmm path-search -i R.pdb P.pdb --parm real.parm7 --model-pdb model.pdb -q 0 -m 1`
- Or, when using extraction, provide a residue-name mapping and run through `all`: `mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'`
Typical message:
FileNotFoundError: tleap not found on PATH
or
mm-parm requires AmberTools (tleap, antechamber, parmchk2).
Fix:
- Install AmberTools via conda: `conda install -c conda-forge ambertools -y`
- Or build from source (https://ambermd.org/AmberTools.php), or load the appropriate module on HPC: `module load ambertools`
- Verify availability: `which tleap`, `which antechamber`, `which parmchk2`
- Note: without AmberTools, you can still run `opt`, `tsopt`, `path-search`, etc. if you supply `--parm` manually.
Symptoms:
- `mm-parm` fails during ligand parameterization.
- Errors about atom type assignment or charge calculation.
Fixes to try:
- Check that the ligand has correct element symbols and bond connectivity in the PDB.
- Ensure `--ligand-charge` is specified correctly: `-l 'GPP:-3,SAM:1'`.
- Use `--keep-temp` to preserve intermediate files and inspect `<resname>.antechamber.log`: `mlmm mm-parm -i input.pdb -l 'LIG:-1' --keep-temp`
- Check that hydrogen atoms are correctly added and TER records are appropriate.
- Ensure `--ligand-mult` is specified for non-singlet ligands (e.g., `--ligand-mult 'HEM:1,NO:2'`). The default spin multiplicity is 1 (singlet).
- Try running antechamber manually on the extracted ligand PDB to diagnose the issue: `antechamber -i ligand.pdb -fi pdb -o ligand.mol2 -fo mol2 -c bcc -nc -3 -at gaff2`
- For higher-accuracy partial charges, consider computing RESP charges from an HF/6-31G* calculation and providing custom `frcmod`/`lib` files instead of relying on AM1-BCC.
Typical messages:
Atom count in parm7 (...) does not match input PDB (...)
or
RuntimeError: parm7 topology does not match the input structure
or
Coordinate shape mismatch for... got (N, 3), expected (M, 3)
Fix:
- The parm7 file must correspond to exactly the same atoms (in the same order) as the input PDB.
- Re-run `mm-parm` to regenerate the parm7 from the current PDB.
- Do not edit or reorder PDB atoms after running `mm-parm`.
- When re-running `mm-parm`, use the output PDB (`<prefix>.pdb`) as the input for subsequent calculations, since tleap may add or remove hydrogens.
Symptoms:
- `oniom-export` reports "Element sequence mismatch at atom index...".
Fix:
- Use `--no-element-check` to disable the element check (verify results manually).
- The correct fix is to use the same PDB for `-i` that was used when generating the parm7.
Typical symptoms:
- `make` in `hessian_ff/native/` produces compilation errors.
- `ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'`
- `RuntimeError: hessian_ff build attempts failed: ...`
Fixes to try:
- Ensure you have a C++ compiler (g++ >= 9) installed: `g++ --version`
- Ensure PyTorch headers are available: `python -c "import torch; print(torch.utils.cmake_prefix_path)"`
- On HPC, load a compiler module: `module load gcc/11`
- Clean and rebuild: `conda install -c conda-forge ninja -y`, then `cd hessian_ff/native && make clean && make`
Typical message:
ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'
or:
RuntimeError: hessian_ff build attempts failed: ...
To rebuild hessian_ff native extensions in this environment:
conda install -c conda-forge ninja -y
cd $(python -c "import hessian_ff; print(hessian_ff.__path__[0])")/native && make clean && make
Fix:
- The C++ native extension needs to be built first: `cd hessian_ff/native && make`
- Ensure the `hessian_ff` package is on your Python path (it should be if you installed mlmm-toolkit with `pip install -e .`).
Symptoms:
- Atoms are assigned to unexpected layers.
- ML region is too small or too large.
Fixes to try:
- B-factor encoding: ML = 0.0, Movable-MM = 10.0, Frozen-MM = 20.0.
- Inspect the layer-assigned PDB visually (color by B-factor in your molecular viewer).
- Check that `--model-pdb` correctly defines the ML region atoms.
- Adjust the distance cutoffs in `define-layer`: `--radius-freeze` (default 8.0 Angstrom) controls the Movable-MM/Frozen boundary.
- If needed, control the Hessian-target MM region separately in the calc options (`hess_cutoff`, `hess_mm_atoms`).
- If using `use_bfactor_layers: true` in YAML, verify that the B-factor values match the expected encoding (0.0, 10.0, 20.0 with tolerance 1.0).
Typical symptoms:
- Calculator treats all atoms as frozen or all as ML.
- B-factor values are not one of {0.0, 10.0, 20.0}.
Fix:
- Re-run `define-layer` to ensure correct B-factor encoding.
- A tolerance of 1.0 is applied: B-factors near 0/10/20 map to ML/Movable/Frozen.
- Do not manually edit B-factors to arbitrary values.
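The mapping described above can be sketched in Python (hypothetical helper, not mlmm's implementation; it assumes the tolerance is applied as an absolute difference):

```python
# Hypothetical helper, not part of mlmm: decode the documented
# B-factor layer encoding (ML=0.0, Movable-MM=10.0, Frozen-MM=20.0)
# with the stated tolerance of 1.0.
LAYER_BFACTORS = {"ML": 0.0, "Movable-MM": 10.0, "Frozen-MM": 20.0}

def layer_from_bfactor(b, tol=1.0):
    for layer, ref in LAYER_BFACTORS.items():
        if abs(b - ref) <= tol:
            return layer
    raise ValueError(f"B-factor {b} does not match any layer encoding")
```

Anything that raises here (e.g., a B-factor of 5.0) is exactly the kind of value the calculator cannot interpret, which is why arbitrary manual edits break layer detection.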
Symptoms:
- Automatic layer detection from B-factors produces unexpected ML/Movable/Frozen splits.
- Running with `--detect-layer` without `--model-pdb` fails.
Fixes to try:
- Ensure the input is a PDB (or an XYZ with `--ref-pdb`).
- Re-run `define-layer` to explicitly assign B-factors, then use the generated PDB.
- For distance-based control, specify `hess_cutoff`/`movable_cutoff` and switch to `--no-detect-layer` if needed.
- Note that supplying `--movable-cutoff` disables `--detect-layer`.
Symptoms:
- Errors about being unable to download model weights or missing authentication. For the default UMA backend, this typically means a missing Hugging Face login/token.
Fix:
- Log in once per environment/machine: `huggingface-cli login`
- On HPC, ensure your home directory (or HF cache directory) is writable from compute nodes.
Symptoms:
- `torch.cuda.is_available()` is false even though you have a GPU.
- CUDA runtime errors at import time.
Fixes:
- Install a PyTorch build matching your cluster CUDA runtime.
- Confirm GPU visibility: `nvidia-smi`, then `python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"`
If you use DMF (--mep-mode dmf) and see errors importing IPOPT/cyipopt:
Fix:
- Install `cyipopt` from conda-forge (recommended) before installing `mlmm`: `conda install -c conda-forge cyipopt`
If figure export fails and you see Plotly/Chrome-related errors:
Fix:
- Install a headless Chrome once: `plotly_get_chrome -y`
Symptoms:
- `torch.cuda.OutOfMemoryError: CUDA out of memory`
- System hangs or crashes during Hessian calculation.
ML/MM systems are typically larger than pure MLIP calculations, so VRAM pressure is higher.
Fixes to try (in order of likelihood):
- Verify the Frozen-MM layer: check that `define-layer` has correctly assigned Frozen-MM atoms (B=20.0). If the Frozen-MM region is too small, the Movable-MM region (and thus the Hessian) becomes unnecessarily large. Decrease `--radius-freeze` to expand the Frozen region.
- Reduce the ML region size: use a smaller extraction radius (`--radius` in `extract`) or manually define a smaller ML region PDB via `--model-pdb`.
- Use a FiniteDifference ML Hessian: set `--hessian-calc-mode FiniteDifference` (uses less VRAM but is slower).
- Pre-define layers with `define-layer` and use `use_bfactor_layers: true`.
- Use a GPU with more VRAM: 24 GB+ recommended for systems with 500+ ML atoms; 48 GB+ for 1000+ ML atoms.
Symptoms:
- TS optimization runs for many cycles without converging.
- Multiple imaginary frequencies remain after optimization.
Fixes to try:
- Switch optimizer modes: `--opt-mode grad` (Dimer) or `--opt-mode hess` (RS-I-RFO).
- Enable flattening of extra imaginary modes: `--flatten`.
- Increase max cycles: `--max-cycles 20000`.
- Use tighter convergence: `--thresh baker` or `--thresh gau_tight`.
- Adjust `hess_cutoff` to expand the range of atoms included in the Hessian calculation.
Symptoms:
- `opt`/`tsopt` keeps running but the reported energy has been flat for many cycles (every few dozen steps shows `|dE| < 1e-4` au).
- Max/RMS forces sit just above the `gau`/`baker` thresholds and never drop further, even after thousands of cycles.
- The summary log eventually reports convergence via the energy-plateau fallback rather than the gradient preset.
Why it happens:
- MLIPs have a finite numerical precision (a force "noise floor"). For large ML/MM systems, that noise floor can exceed the standard gradient-based convergence thresholds (`gau`, `baker`, ...), so the forces never drop below the preset even though the geometry is effectively stationary.
What to do:
- Since v0.2.8, this is handled automatically: the shared `opt` block enables `energy_plateau: true` by default. When the energy range over the last 50 steps falls below `1.0e-4` au (~0.06 kcal/mol), the optimizer declares convergence and exits cleanly. No action is needed in the common case.
- If the plateau fallback triggers too early on a system that is still clearly moving, tighten the thresholds in YAML:
  ```yaml
  opt:
    energy_plateau_thresh: 1.0e-05  # stricter plateau tolerance (au)
    energy_plateau_window: 100      # require a longer flat stretch
  ```
- To disable the fallback entirely (e.g., for benchmarking convergence behavior), set `opt.energy_plateau: false`; the optimizer will then rely solely on the `thresh` preset.
- The plateau check is automatically skipped for chain-of-states (COS) optimizers (the GS/DMF string optimizers), so `path-opt`/`path-search` are unaffected.
Symptoms:
- IRC stops before reaching a clear minimum.
- Energy oscillates or gradient remains high.
Fixes to try:
- Reduce the step size: `--step-size 0.05` (default is 0.10).
- Increase max cycles: `--max-cycles 200`.
--max-cycles 200. - Check if the TS candidate has only one imaginary frequency before running IRC.
Symptoms:
- Path search terminates with no valid MEP.
- Bond changes are not detected correctly.
Fixes to try:
- Increase `--max-nodes` (e.g., 15 or 20) for complex reactions.
- Enable endpoint pre-optimization: `--preopt`.
- Try the alternative MEP method: `--mep-mode dmf` (if GSM fails) or vice versa.
- Adjust bond detection parameters in YAML (`bond.bond_factor`, `bond.delta_fraction`).
- Out of memory (VRAM): reduce the ML region size, reduce the Hessian-target MM region, reduce nodes (`--max-nodes`), or use lighter optimizer settings (`--opt-mode grad`).
- Analytical ML Hessian is slow or OOM: use `--hessian-calc-mode FiniteDifference` for the ML region. Only use `Analytical` if you have ample VRAM (24 GB+ recommended for 300+ ML atoms).
- MM Hessian: `mm_fd: true` (default) uses finite differences for the MM Hessian. An analytical MM Hessian (`mm_fd: false`) is faster for small systems but may require more memory.
- MM Hessian is slow: set `hess_cutoff` to limit the number of Hessian-target MM atoms.
- Large systems (2000+ atoms): ensure frozen atoms are properly set (Frozen layer, B=20) to reduce the movable DOF count. Use `define-layer` with appropriate cutoffs.
- Multi-GPU: place ML on one GPU (`ml_cuda_idx: 0`) and MM on another (`mm_device: cuda`, `mm_cuda_idx: 1`) if available.
- ML and MM parallel execution: by default, ML (GPU) and MM (CPU) run in parallel. Tune the CPU thread count with `mm_threads`.
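Several of the YAML knobs above can be combined in one block. A hedged sketch — the key names are taken from this section, but the enclosing `calc:` block name and the YAML spelling `hessian_calc_mode` are assumptions, and the numeric values are illustrative only:

```yaml
calc:
  hessian_calc_mode: FiniteDifference  # lower-VRAM ML Hessian (assumed YAML spelling of --hessian-calc-mode)
  hess_cutoff: 6.0          # limit Hessian-target MM atoms (illustrative value)
  mm_fd: true               # finite-difference MM Hessian (the default)
  ml_cuda_idx: 0            # ML model on GPU 0
  mm_device: cuda
  mm_cuda_idx: 1            # MM on GPU 1, if available
  mm_threads: 8             # CPU threads for the MM side (illustrative value)
```

Check these spellings against your installed version's documentation before relying on them.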
Symptom: `ImportError: orb-models is required for the ORB backend`
Fix: Install the optional dependency for the chosen backend:
```
pip install "mlmm-toolkit[orb]"      # ORB backend
pip install "mlmm-toolkit[aimnet]"   # AIMNet2 backend
pip install --no-deps mace-torch     # MACE backend
```
Symptom: `RuntimeError: CUDA out of memory` when using ORB, MACE, or AIMNet2.
Fix: Non-UMA backends use finite-difference Hessians, which require more VRAM. Options:
- Use `--hessian-calc-mode FiniteDifference` explicitly with a smaller `hess_cutoff`.
- Use `ml_device: cpu` in YAML (slower but avoids VRAM limits).
Symptom: XTBEmbedError: xTB command not found
Fix: Install xTB and ensure it's on $PATH:
`conda install -c conda-forge xtb`

When asking for help, include:
- The exact command line you ran
- `summary.log` (or console output)
- The smallest input files that reproduce the problem (if possible)
- Your environment: OS, Python, CUDA, PyTorch versions
- Whether AmberTools and hessian_ff are properly installed