pdb2reaction: automated reaction-path modeling directly from PDB structures

Overview

pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways with machine-learning interatomic potentials (MLIPs). Each workflow step is also available as an individual subcommand (opt, scan, scan2d, path-search, tsopt, freq, irc, dft, energy-diagram, etc.) for fine-grained control.

A single command can generate a first-pass enzymatic reaction path:

# bezA (GPP C6-methyltransferase): methyl transfer (SAM→GPP C6) + proton abstraction (E170)
pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'
# Scan mode (single structure → staged bond scans → MEP)
pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
                 '[("GPP 321 H11","GLU 186 OE2",0.90)]'

The full workflow — MEP search → TS optimization → IRC → thermochemistry → single-point DFT — can be run in one command:

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt --thermo --dft

Working examples are provided in the examples/ directory, including complete scripts for the all workflow that cover both the multi-structure MEP and the scan-based pipelines. The example system is GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217), which catalyzes a two-step reaction: (1) electrophilic methyl transfer from SAM to the C6 position of GPP via a C7 carbocation intermediate, and (2) proton abstraction from C6 by the catalytic base E170 to yield 6-methylgeranyl pyrophosphate (6MGPP).


Given (i) two or more PDB files (R → ... → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt, pdb2reaction automatically:

  • extracts an active-site model around user-defined substrates to build a cluster model,
  • explores minimum-energy paths (MEPs) with GSM or DMF,
  • optionally optimizes transition states, runs vibrational analysis, IRC, and single-point DFT,

using machine-learning interatomic potentials (MLIPs).

Related tools

| Tool | Use case | Repository |
| --- | --- | --- |
| mlmm-toolkit | ML/MM (ONIOM) with full protein environment; automates MM parameter generation and ML region assignment from a single PDB input | https://github.com/t-0hmura/mlmm_toolkit |
| UMA–Pysisyphus Interface | YAML-input-based reaction mechanism analysis for small molecules | https://github.com/t-0hmura/uma_pysis |

Both pdb2reaction and mlmm-toolkit include a custom GPU-optimized pysisyphus fork for geometry optimization, TS search, and IRC. This bundled fork is not compatible with the upstream pysisyphus package; do not install them side by side.

Important (prerequisites):

  • Input PDB files must already contain hydrogen atoms.
  • When providing multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ).
  • Boolean CLI options accept both --flag / --no-flag and value style --flag True/False (yes/no, 1/0 are also accepted). Prefer toggle style in new scripts.
  • The workflow also works for small-molecule systems. If you omit --center/-c and --ligand-charge, you can use .xyz or .gjf inputs as well.
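
The first two prerequisites can be checked before launching a run. The following is a minimal sketch, not part of pdb2reaction: the function names are hypothetical, it reads the standard fixed-width PDB columns, and the hydrogen test falls back to a rough atom-name heuristic when the element columns (77–78) are empty.

```python
def atom_keys(pdb_text: str) -> list[tuple[str, str, str, str]]:
    """Return (atom name, residue name, chain, residue number) per ATOM/HETATM record."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns: name 13-16, resName 18-20, chain 22, resSeq 23-26.
            keys.append((line[12:16].strip(), line[17:20].strip(),
                         line[21:22].strip(), line[22:26].strip()))
    return keys


def has_hydrogens(pdb_text: str) -> bool:
    """True if any atom looks like hydrogen (element columns first, atom name as fallback)."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()
            name = line[12:16].strip().upper()
            # Heuristic fallback: names such as "H1" or "1HB" when no element is recorded.
            if element == "H" or (not element and name.lstrip("0123456789").startswith("H")):
                return True
    return False


def check_inputs(pdb_texts: list[str]) -> list[str]:
    """Collect human-readable problems; an empty list means both checks passed."""
    if not pdb_texts:
        return []
    problems = []
    reference = atom_keys(pdb_texts[0])
    for i, text in enumerate(pdb_texts):
        if not has_hydrogens(text):
            problems.append(f"structure {i}: no hydrogen atoms found")
        if atom_keys(text) != reference:
            problems.append(f"structure {i}: atom names or order differ from structure 0")
    return problems
```

For example, `check_inputs([open(p).read() for p in ("1.R.pdb", "3.P.pdb")])` returns an empty list when both files pass.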

Documentation

This software is still under development. Please use it at your own risk.


Installation

pdb2reaction requires Linux with a CUDA-capable GPU.

Prerequisites

  • Python >= 3.11
  • CUDA 12.x

Minimal setup (CUDA 12.9, torch 2.8.0)

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install pdb2reaction
plotly_get_chrome -y
huggingface-cli login

For the DMF method

conda create -n pdb2reaction python=3.11 -y
conda activate pdb2reaction
conda install -c conda-forge cyipopt -y
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install pdb2reaction
plotly_get_chrome -y

DFT single-point (pdb2reaction dft)

DFT dependencies are not installed by default. To use pdb2reaction dft, install the [dft] extra:

pip install "pdb2reaction[dft]"

This installs PySCF, GPU4PySCF (x86_64 only), and related CUDA libraries. Note that DFT single-point calculations are practical only for systems up to ~500 atoms; larger systems will require prohibitive compute time and memory.

For detailed installation instructions, see Installation.

Supported ML potentials

| Potential | Repository | Install extra |
| --- | --- | --- |
| UMA (default) | https://github.com/facebookresearch/fairchem | (included) |
| ORB | https://github.com/orbital-materials/orb-models | pip install "pdb2reaction[orb]" |
| MACE | https://github.com/ACEsuit/mace | See below |
| AIMNet2 | https://github.com/isayevlab/aimnetcentral | pip install "pdb2reaction[aimnet]" |

MACE installation: MACE requires e3nn==0.4.4, which conflicts with fairchem-core (the package that provides UMA). To use MACE, first uninstall fairchem-core, then install MACE:

pip uninstall fairchem-core
pip install mace-torch

UMA and MACE cannot coexist in the same environment. Use separate conda environments if you need both.


Quick Examples

The examples below use GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217) — a two-step mechanism: electrophilic methyl transfer from SAM to GPP C6 (via C7 carbocation), then proton abstraction by E170. Complete working scripts are in examples/.

Full workflow (multi-structure MEP)

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt --thermo --out-dir result_mep
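
The -l value in these commands is a comma-separated list of residue-name:charge pairs. The sketch below reads that format as it appears in the examples; it is an illustration with a hypothetical function name, not the parser pdb2reaction itself uses, and the sum covers only the listed ligands, not the full cluster model.

```python
def parse_ligand_charges(spec: str) -> dict[str, int]:
    """Parse a string such as 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges = {}
    for item in spec.split(","):
        name, sep, value = item.strip().partition(":")
        if not sep or not name:
            raise ValueError(f"expected NAME:CHARGE, got {item!r}")
        charges[name] = int(value)  # int() accepts signed values such as "-3"
    return charges


charges = parse_ligand_charges("SAM:1,GPP:-3")
print(charges)                # {'SAM': 1, 'GPP': -3}
print(sum(charges.values()))  # -2
```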

Scan mode (single structure → staged bond scans → MEP)

pdb2reaction -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --scan-lists '[("CS1 SAM 320","GPP 321 C7",1.60)]' \
                 '[("GPP 321 H11","GLU 186 OE2",0.90)]' \
    --tsopt --thermo --out-dir result_scan
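
Each --scan-lists argument above is written as a Python literal: a list of (atom, atom, target distance) tuples, one list per scan stage. A stage string can be sanity-checked before submitting a job with a sketch like the following. The function name is hypothetical, the checks are assumptions drawn from the examples, and the atom selectors are left as opaque strings because their exact grammar is defined by pdb2reaction.

```python
import ast


def parse_scan_stage(literal: str) -> list[tuple[str, str, float]]:
    """Parse one scan stage, e.g. '[("CS1 SAM 320","GPP 321 C7",1.60)]'."""
    stage = ast.literal_eval(literal)  # evaluates literals only, never arbitrary code
    if not isinstance(stage, list):
        raise ValueError("a scan stage must be a list of tuples")
    parsed = []
    for entry in stage:
        if not (isinstance(entry, tuple) and len(entry) == 3):
            raise ValueError(f"expected (atom, atom, distance), got {entry!r}")
        atom_a, atom_b, target = entry
        if not (isinstance(atom_a, str) and isinstance(atom_b, str)):
            raise ValueError(f"atom selectors must be strings: {entry!r}")
        target = float(target)
        if target <= 0:
            raise ValueError(f"target distance must be positive: {entry!r}")
        parsed.append((atom_a, atom_b, target))
    return parsed


stages = [
    parse_scan_stage('[("CS1 SAM 320","GPP 321 C7",1.60)]'),
    parse_scan_stage('[("GPP 321 H11","GLU 186 OE2",0.90)]'),
]
```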

TS optimization only

pdb2reaction -i TS_candidate.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
    --tsopt

Step-by-step workflow

1. Extract active-site model (cluster model): extract

pdb2reaction extract -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' -r 6.0

2. Optimize geometry: opt

pdb2reaction opt -i model.pdb -l 'SAM:1,GPP:-3'

3. MEP search: path-opt

pdb2reaction path-opt -i R_model.pdb P_model.pdb -l 'SAM:1,GPP:-3'

4. TS optimization: tsopt

pdb2reaction tsopt -i hei.pdb -l 'SAM:1,GPP:-3'

5. Frequency analysis: freq

pdb2reaction freq -i ts_optimized.pdb -l 'SAM:1,GPP:-3'
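
A frequency analysis on a TS geometry is usually read by counting imaginary modes: a genuine transition state is a first-order saddle point with exactly one. The sketch below illustrates that check. It assumes the frequencies are available as a list in cm^-1 with imaginary modes written as negative numbers, a common convention. It does not parse pdb2reaction output, and the function name is hypothetical.

```python
def classify_stationary_point(frequencies_cm1: list[float], noise_cutoff: float = 0.0) -> str:
    """Classify a stationary point by its number of imaginary (negative) frequencies.

    noise_cutoff (cm^-1) lets small negative values from numerical noise be ignored.
    """
    n_imag = sum(1 for f in frequencies_cm1 if f < -abs(noise_cutoff))
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "first-order saddle point"
    return f"higher-order saddle point ({n_imag} imaginary modes)"


print(classify_stationary_point([-512.3, 45.0, 120.7]))  # first-order saddle point
```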

6. IRC: irc

pdb2reaction irc -i ts_optimized.pdb -l 'SAM:1,GPP:-3'

7. DFT single-point: dft

pdb2reaction dft -i optimized.pdb -l 'SAM:1,GPP:-3'

CLI Subcommands

Workflow

| Subcommand | Role | Documentation |
| --- | --- | --- |
| all | End-to-end: extraction → MEP → TS → IRC → freq → DFT | docs/all.md |

Structure Preparation

| Subcommand | Role | Documentation |
| --- | --- | --- |
| extract | Extract active-site model (cluster model) | docs/extract.md |
| fix-altloc | Resolve alternate conformations in PDB files | docs/fix_altloc.md |
| add-elem-info | Add/repair PDB element columns (77–78) | docs/add_elem_info.md |
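
For reference, the element symbol in a PDB ATOM/HETATM record occupies columns 77–78, right-justified. The sketch below reads and writes that field on a single record. It is only an illustration of the column layout with hypothetical function names, not the logic add-elem-info uses to infer elements.

```python
def get_element(line: str) -> str:
    """Return the element symbol stored in columns 77-78 ('' if missing)."""
    return line[76:78].strip()


def set_element(line: str, element: str) -> str:
    """Return the record with the element written right-justified into columns 77-78."""
    if not 1 <= len(element) <= 2:
        raise ValueError(f"element symbol must be 1 or 2 characters: {element!r}")
    padded = line.rstrip("\n").ljust(80)  # pad short records to the full 80 columns
    return padded[:76] + element.upper().rjust(2) + padded[78:]
```

A record truncated before column 77 yields `''` from `get_element`, and `set_element(record, "C")` fills the field with `" C"`.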

Optimization & Path Search

| Subcommand | Role | Documentation |
| --- | --- | --- |
| opt | Geometry optimization (L-BFGS or RFO) | docs/opt.md |
| tsopt | TS optimization (Dimer or RS-I-RFO) | docs/tsopt.md |
| path-opt | MEP optimization via GSM or DMF | docs/path_opt.md |
| path-search | Recursive MEP search with refinement | docs/path_search.md |
| scan | 1D bond-length driven scan | docs/scan.md |
| scan2d | 2D distance grid scan | docs/scan2d.md |
| scan3d | 3D distance grid scan | docs/scan3d.md |

Analysis

| Subcommand | Role | Documentation |
| --- | --- | --- |
| freq | Vibrational frequency analysis + thermochemistry | docs/freq.md |
| irc | IRC calculation (EulerPC) | docs/irc.md |
| dft | Single-point DFT (GPU4PySCF / PySCF) | docs/dft.md |
| bond-summary | Compare structures and report bond changes | docs/bond-summary.md |

Visualization

| Subcommand | Role | Documentation |
| --- | --- | --- |
| trj2fig | Energy plot from XYZ trajectory | docs/trj2fig.md |
| energy-diagram | Energy diagram from numeric values | docs/energy_diagram.md |
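
Numeric values for an energy diagram are often prepared as energies relative to the first state. The sketch below assumes the raw energies are in Hartree, as is typical for electronic-structure output, and converts them to kcal/mol. The units that energy-diagram expects are not stated here, so check docs/energy_diagram.md before using the numbers. The function name is hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard conversion factor


def relative_energies_kcal(energies_hartree: list[float]) -> list[float]:
    """Return energies relative to the first entry, converted from Hartree to kcal/mol."""
    if not energies_hartree:
        return []
    reference = energies_hartree[0]
    return [(e - reference) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Illustrative numbers only (reactant, TS, product), not results from pdb2reaction.
for label, value in zip(("R", "TS", "P"), relative_energies_kcal([-1.00, -0.98, -1.01])):
    print(f"{label}: {value:.2f} kcal/mol")
```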

Tip: In tsopt, freq, and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.


HPC / Multi-GPU

On HPC clusters or multi-GPU workstations, pdb2reaction can parallelize UMA inference across nodes. Set workers and workers_per_node to enable parallel inference; see docs/uma_pysis.md for details.


Getting Help

pdb2reaction --help
pdb2reaction <subcommand> --help
pdb2reaction <subcommand> --help-advanced
pdb2reaction all --help-advanced
# Shorthand alias (equivalent to pdb2reaction)
p2r --help
# Equivalent module invocation
python -m pdb2reaction --help

pdb2reaction all --help shows the core options; pdb2reaction all --help-advanced shows the full option list. The same progressive-help pattern (--help for core options, --help-advanced for the full list) applies to scan, scan2d, scan3d, the calculation commands (opt, path-opt, path-search, tsopt, freq, irc, dft), add-elem-info, trj2fig, energy-diagram, extract, and fix-altloc.

If you encounter any issues, please open an issue at https://github.com/t-0hmura/pdb2reaction/issues.


Citation

A preprint describing pdb2reaction is in preparation. Until it is available, please cite the software itself if you find this work helpful for your research:

@software{ohmura2026pdb2reaction,
  author       = {Ohmura, Takuto},
  title        = {pdb2reaction},
  year         = {2026},
  month        = {3},
  version      = {0.3.2},
  url          = {https://github.com/t-0hmura/pdb2reaction},
  license      = {GPL-3.0},
  doi          = {10.5281/zenodo.19197878}
}

Known limitations

  • MACE and UMA cannot coexist in the same environment due to an e3nn version conflict. Use separate conda environments.
  • DFT single-point (pdb2reaction dft) is practical up to ~500 atoms; larger systems may require fragmentation.
  • ORB backend has a higher failure rate on multi-step reactions (SVD failures in path optimization).
  • CPU-only execution is supported but 10-100x slower than GPU.

License

pdb2reaction is distributed under the GNU General Public License version 3 (GPL-3.0).
