This repository was archived by the owner on Aug 29, 2022. It is now read-only.

Notes on typical setup pipelines

ppxasjsm edited this page Feb 16, 2017 · 4 revisions

Based on an initial conversation at MolSSI 8 Oct 2016.

Contributors

  • John Chodera
  • Julien Michel
  • Peter Kasson
  • Oliver Beckstein

Before starting, Julien frequently does some manual inspection in a GUI:

  • missing loops
  • incomplete residues
  • cofactors: keep or discard? where to get parameters?
  • ions: keep or substitute?
  • crystal waters: keep or throw away?
  • crystal contacts, domain swapping
  • Read PDB paper to ensure that this is the protein structure that he wants to use
  • Note that assay conditions may differ from crystallographic conditions
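
Some of the checks in this list can be pre-screened before opening a GUI. A minimal sketch, assuming a standard fixed-column PDB file: `inspect_pdb` is a hypothetical helper (not part of any tool named in these notes) that reports residue-numbering gaps per chain and counts waters and other HETATM groups. A numbering gap is only a hint of a missing loop, since numbering can be discontinuous by design, so the output is a prompt for manual inspection and not a replacement for it.

```python
from collections import defaultdict


def inspect_pdb(pdb_text):
    """Summarise residue-numbering gaps and HETATM groups in PDB-format text."""
    residues = defaultdict(set)  # chain ID -> residue numbers seen in ATOM records
    hetero = defaultdict(set)    # residue name -> {(chain, resnum)} from HETATM records

    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        # Fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26 (1-based).
        resname = line[17:20].strip()
        chain = line[21]
        resnum = int(line[22:26])
        if record == "ATOM":
            residues[chain].add(resnum)
        else:
            hetero[resname].add((chain, resnum))

    gaps = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        # Each gap is reported as (first missing number, last missing number).
        chain_gaps = [(a + 1, b - 1) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
        if chain_gaps:
            gaps[chain] = chain_gaps

    waters = len(hetero.pop("HOH", set()))
    return {
        "gaps": gaps,                                       # candidate missing loops
        "waters": waters,                                   # crystal waters
        "hetero": {name: len(v) for name, v in hetero.items()},  # ions, cofactors, ligands
    }
```

Usage would be along the lines of `inspect_pdb(open("structure.pdb").read())`, where the file name is a placeholder. Incomplete residues, insertion codes, and alternate locations are deliberately not handled here.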

What is our "typical" biomolecular preparation pipeline?

  • Decide which structure you want to use
  • Decide which chain to use if multiple copies
  • Revert mutations or simulate a different construct
  • Model disulfide bonds if not in a reducing environment
  • Address PTMs
  • Julien typically uses the Maestro Protein Prep Wizard to:
    • add missing loops (up to a certain length)
    • add N/C-termini? Most people omit these
    • assign protonation states for desired pH
    • keep crystal waters; add hydrogens
    • interactively check histidine
  • Structural metal ions (e.g. Zn2+, Ca2+):
    • decide whether to retain
    • substitute with multisite models (alternatives: covalently bonded (harmonically restrained); single-site LJ)
  • Ligands and cofactors:
    • pick protonation state / tautomer
    • find or create parameters
    • covalently bound cofactors?
    • Consult Uppsala EDS to verify that ligand density justifies binding mode
    • model in rest of ligand or replace the ligand with another one (CCSD? swap from other PDB file? OpenEye)
  • Protonation states? (PROPKA? 3.1 can do ligands; MCCE2?)
  • Counterions and solvent:
    • can do in either order
    • how big should the box be? what shape? what buffer should be used? (Peter Kasson uses a 20 Å buffer; Julien uses 12 Å; Oliver uses 15 Å)
    • for membrane proteins, at least 3-4 layers of lipids sideways; z-axis is very tricky
    • ionic strength
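
To make the box-size and ionic-strength questions concrete, here is a rough arithmetic sketch. It assumes a cubic box and a monovalent salt, and both function names are hypothetical. The three buffer values are the ones quoted in the list; the 50 Å solute extent and the 0.15 M target are illustrative assumptions only.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITRES_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def cubic_box_edge(solute_extent, buffer):
    """Edge length (Å) of a cubic box with `buffer` Å of padding on every side."""
    return solute_extent + 2.0 * buffer


def ion_pairs_for_concentration(box_volume, molar):
    """Approximate number of monovalent salt pairs for a target molarity.

    Uses the full box volume (Å^3), so it ignores the volume excluded by the
    solute and any counterions added purely for neutralisation.
    """
    return round(molar * AVOGADRO * LITRES_PER_CUBIC_ANGSTROM * box_volume)


# Compare the three buffer choices for an assumed 50 Å solute at an assumed 0.15 M.
for who, buffer in [("Julien", 12.0), ("Oliver", 15.0), ("Peter", 20.0)]:
    edge = cubic_box_edge(50.0, buffer)
    print(who, edge, ion_pairs_for_concentration(edge ** 3, 0.15))
```

Because volume grows with the cube of the edge length, moving from a 12 Å to a 20 Å buffer raises the solvent and ion count substantially, which is part of why the choice is worth recording explicitly.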

Challenges not yet addressed:

  • Membrane proteins
  • Proteins at surfaces/interfaces

Discussion @ Chodera Lab 16/2/2017

Participants:

  • J. Chodera
  • T. Mey
  • Chodera Lab members

Priorities for dependencies (most preferred first):

  1. Open source software
  2. Academic licence software
  3. Proprietary software
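
One way this ordering could be encoded, as a sketch only: the `License` enum and `pick_tool` function are hypothetical, and the tool names in the usage note are placeholders, not statements about any real package's licence.

```python
from enum import IntEnum


class License(IntEnum):
    """Licence tiers in order of preference (lower value = preferred)."""
    OPEN_SOURCE = 1
    ACADEMIC = 2
    PROPRIETARY = 3


def pick_tool(candidates, available):
    """Return the available tool with the most preferred licence tier, or None.

    `candidates` maps tool name -> License; `available` is the set of tool
    names actually installed on this machine.
    """
    usable = [(tier, name) for name, tier in candidates.items() if name in available]
    return min(usable)[1] if usable else None
```

For example, with `candidates = {"open_tool": License.OPEN_SOURCE, "academic_tool": License.ACADEMIC, "commercial_tool": License.PROPRIETARY}` and only the last two installed, `pick_tool` falls back to `"academic_tool"`.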

Workflow

  • Decide on source structure
    Data input: sequence(s) / biological units / ligands / assay conditions
    • Important factors: resolution, missing loops, bound ligands, sequence identity, conformation/diversity, structural biology techniques
    • Solution idea: construct explorer: UniProt + domains + splice mutants + more domain knowledge, e.g. Python dictionary: {'PTMS 3 letter code': 1 protein, 'c1cccccc1': 1 ligand, 'nacl': 20 mM, 'Tris': 20 mM, 'pH': 8.0}

Additional information needed would be, e.g., ligand expd tlc, protonation states/tautomers, or any new chemistry.

Input should be generated automatically and could take the form of a topology-like object or nested lists.
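
A minimal sketch of what such an input object might look like, assuming Python dataclasses that convert to nested dictionaries and lists. The class and field names (`Component`, `SystemSpec`, `copies`, `millimolar`) and the default pH of 7.0 are assumptions made for illustration. The identifiers and concentrations are copied from the dictionary example in the workflow list and are not validated.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Component:
    identifier: str           # three-letter code, sequence, SMILES, or buffer name
    kind: str                 # "protein", "ligand", or "buffer"
    copies: int = 0           # number of molecules, for explicit solutes
    millimolar: float = 0.0   # concentration, for buffer and salt components


@dataclass
class SystemSpec:
    components: list = field(default_factory=list)
    pH: float = 7.0           # assumed default, to be overridden by assay conditions

    def of_kind(self, kind):
        """Return all components of one kind, e.g. every buffer species."""
        return [c for c in self.components if c.kind == kind]


spec = SystemSpec(
    components=[
        Component("<three-letter code>", "protein", copies=1),  # placeholder identifier
        Component("c1cccccc1", "ligand", copies=1),             # string as given in the notes
        Component("NaCl", "buffer", millimolar=20.0),
        Component("Tris", "buffer", millimolar=20.0),
    ],
    pH=8.0,
)

# asdict() yields plain nested dicts and lists, which are easy to serialise.
nested = asdict(spec)
```

Keeping counts and concentrations in separate, typed fields avoids the ambiguity of the flat dictionary, where "1 protein" and "20 mM" share one value slot.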

Building blocks:

  • Clean API
  • Best practices, i.e. a fully automated pipeline, e.g. using XML-style input.
  • Questions: Should decisions based on best practices allow for interactive intervention? Can a default choice be modified after running through the automated setup? What kinds of warnings and override hints should be allowed? The design should be modular, allowing different entry points along the workflow.
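
The questions above suggest a design in which defaults encode best practice, users may override them, every override is logged, and the workflow can be entered at any step. A hedged sketch of that idea, using only the standard library: all names here (`DEFAULTS`, `resolve_settings`, `run_pipeline`) are hypothetical, and the default values are illustrative, not agreed choices.

```python
import logging

logger = logging.getLogger("setup_pipeline")

# Illustrative defaults only; the real best-practice values are still to be decided.
DEFAULTS = {"buffer_angstrom": 12.0, "keep_crystal_waters": True, "pH": 7.0}


def resolve_settings(defaults, overrides=None):
    """Merge user overrides into the defaults, logging each departure."""
    settings = dict(defaults)
    for key, value in (overrides or {}).items():
        if key not in defaults:
            raise KeyError(f"unknown setting: {key}")
        if value != defaults[key]:
            logger.warning("override %s: default %r -> %r", key, defaults[key], value)
        settings[key] = value
    return settings


def run_pipeline(steps, state, start_at=None):
    """Run (name, function) steps in order, optionally entering at a later step.

    Each function takes the current state and returns the new state, so a
    user with a partially prepared system can skip the earlier steps.
    """
    names = [name for name, _ in steps]
    first = names.index(start_at) if start_at else 0
    for name, func in steps[first:]:
        logger.info("running step %s", name)
        state = func(state)
    return state
```

A caller would pass a list such as `[("select_structure", f1), ("protonate", f2), ("solvate", f3)]`, where the step functions are placeholders, and could resume with `start_at="protonate"` after manual edits. Rejecting unknown setting names is a deliberate choice so that typos in overrides do not pass silently.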

Tools we have

Tools we need

Ideas

  • Logging issues
  • Overrides/suggestions from users

Stuff that is out there that we don't yet know how to fit in:

H++ (http://biophysics.cs.vt.edu)
SIMULAID (http://inka.mssm.edu/~mezei/simulaid/)
WHATIF (http://swift.cmbi.ru.nl/whatif/)
CHARMM-GUI (http://www.charmm-gui.org)
PROPKA (https://github.com/jensengroup/propka-3.1)
MCCE2 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735604/)
HTMD (https://www.htmd.org)