Refactor CovalentBuilder, Enable Custom Tether Atoms within Receptor, and Export Modified Polymer to PDB #277
in CLI script, when there are multiple connect_patterns or target res
mk_prepare_ligand.py
This is mostly done, but since the receptor tether atoms are no longer restricted to CA and CB, I would like to take some time to test some unusual setups. There are new options available, but most of the syntax hasn't changed. I’ll be adding a new usage note soon.
Usage Note 1: Custom receptor tether atoms (C-O) by SMARTS: Docking Penicillin G (a bad enantiomer?)
Input:
Input:
Input:
Output:
Usage Note 2: Custom receptor tether atoms by input atom names: In RNA & Targeting a Nonstandard Residue
This example needs 9391c01 in #259
Input:
Input:
Input:
Output:
a molsetup.atom
Just one more comment: We could modularize the prepare covalent ligand function similar to the original CovalentBuilder. However, it’s not necessary for the ligand’s anchored atoms to be linked to actual atoms. Similarly, the receptor tether atoms may not originate from the same monomer (especially if metals are involved) or share the identical atom types with the ligand tether atoms. Given the wide variety of possible configurations, it seems much more convenient to just set up the tether atoms by their indices and coordinates. This can be supplied from Python and via the |
Thank you for the great work! I think we want the covalent ligand to be a monomer because it is covalently bonded to the receptor and thus it is the same molecule as the receptor. Future features that require this design are:
We discussed creating a new command line script for preparing covalent modification. Do you have more thoughts on that? I'm not sure it's a good or a bad idea, I'm bringing it up because it feels strange to have |
Hi @diogomart
This makes sense and can be very useful. But to avoid overhead I want to defer it, and not include the "parameterization of the modified polymer" in docking preparation or basic export. In this PR, I put very primitive code into This part of the code uses the two-monomer representation of target residue and covalent ligand, which is more flexible. This is useful for processing a large library of covalent ligands. Therefore, for post-processing of covalent docking and parameterization, a "monomer-fusion" function could be helpful. It needs more work, but I can open an issue or a new PR for this. Does it sound like a good plan?
With the two-monomer representation or even more fragmentation, we need to think about how to reconstruct padded_mol for individual fragments. This can also be a very useful thing to consider in the future (for residues that are too big). But I will not include it in this PR.
This is OK before and after this PR. This PR gives the freedom to define the attachment points, in case the residue has nonstandard backbone or is a nucleic acid or for any reason the user does not want to flexibilize the entire sidechain. But again, it's always possible to flexibilize the sidechain from CA-CB.
I can draft a command-line script. I think it might be a good idea so I can revert the changes to the current command-line scripts. Let me prepare a draft, update this PR and see if you like it.
I understand this too. But I also like the current way that we can re-use mk_prepare_receptor.py for multiple purposes, given the wide range of options. Thanks again for your kind response. Let me know what you think!
I wouldn't draft a new command line script just yet. The change I'm thinking about, storing the covalent ligand inside the polymer from the start, not just after the docking, might affect how we want the scripts to look. In a way, preparing a covalent docking includes inserting a new monomer into an existing polymer.
I was just giving an example. I was not suggesting to use atom names. What I'm thinking is to enable keeping the original residue, e.g. cysteine, as a monomer, the covalent ligand, e.g. an acrylamide, as a second monomer, and still be able to make both the cysteine and acrylamide flexible, or just the acrylamide flexible as in the examples you provided.
Great point, this is the crux of it. Is it too crazy to have the user specify a padding reaction?
To clarify, I think we should stick to the two-monomer representation of target residue and covalent ligand that you already implemented. I think that's the best design. The only change is to add the covalent ligand to the polymer during preparation (before docking). The positions will clash and will be reasonable only after docking.
Hi @diogomart
This is the overhead I was trying to avoid for docking. I don't want to store the covalent ligand in the polymer if we're only at the stage of docking. The modified polymer can be constructed after processing the covalent docking results (not for all covalent ligands), unless it's required by the docking engine; I don't know.
In this PR, the original residue (target residue) is kept as is. It just has some atoms ignored so that they won't appear in the (rigid) receptor PDBQT file. The ignored atoms can be up to CA-CB or even backbone atoms, which is how this PR makes the target residue flexible.
Both are possible with this PR (
This is ok (currently possible?) but requires user knowledge of the RDKit reaction language. I really appreciate the time and your thoughts. I will think more about this!
I can draft that too and see if we like it! ^^ I really don't have a strong preference.
Just one more clarification for this PR, on making both the cysteine (original residue) and acrylamide (covalent ligand) flexible: This is supported in this PR, but the flexible branches taken from the target residue need to be part of the covalent ligand. This way, the flexibility model is constructed over the covalent ligand Monomer. Currently there's no quick way to construct and store one flexibility model over two monomers (??). This is why I was thinking about making a "monomer-fusion" function.
Can you elaborate on what you mean by overhead and on the disadvantages of including the covalent ligand in the polymer?
I do like the idea of including the covalent ligand in the Polymer (I already wrote the implementation in the code block, but commented it out because it's not essential for exporting to PDB). I just wish to defer this process until the post-processing of docking, perhaps after some filtering. This is because I want to avoid modifying the polymer for each individual covalent ligand in the docking preparation stage. I feel like Polymer is a very heavy data structure, so I want to avoid copying it (making an editable copy) or modifying it. If we do want to store a polymer-like structure for the target residue and the covalent ligand, I might want to implement a "di-mer" editable copy so we're not reading/writing the whole Polymer over and over again.
The implementation in this PR does not modify the polymer for each covalent ligand, so the receptor only needs to be processed once in the docking preparation, like in basic docking. The biggest advantage of the current approach is that the receptor is only prepared once and it's good for any covalent ligand (just like before this PR). The modified polymer can be exported from I will think about this >.< I can draft a version that modifies Polymer in the preparation. Actually I'm not sure how much "overhead" I'm talking about, maybe I'm just afraid of modifying Polymer... We can talk tomorrow or sometime in the near future about this. I get your points about including the covalent ligand in the Polymer and the importance of it. Thanks again for your time and kind response.
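The "di-mer" idea above can be pictured with a small sketch. This is only an illustration under stated assumptions: ToyPolymer, make_dimer and the dictionary layout are hypothetical stand-ins and are not Meeko's Polymer class or API. The point is that the receptor is built once, and only the target residue is copied for each covalent ligand.

```python
import copy


class ToyPolymer:
    """Hypothetical stand-in for a heavy receptor object (not Meeko's Polymer)."""

    def __init__(self, n_residues):
        self.monomers = {
            f"A:{i}": {"atoms": [f"atom{j}" for j in range(10)]}
            for i in range(1, n_residues + 1)
        }


def make_dimer(polymer, target_id, ligand_atoms):
    """Copy only the target residue and pair it with one covalent ligand."""
    return {
        "target_id": target_id,
        "target": copy.deepcopy(polymer.monomers[target_id]),
        "ligand": {"atoms": list(ligand_atoms)},
    }


receptor = ToyPolymer(n_residues=300)  # prepared once, shared by all ligands
ligand_library = [["C1", "C2", "N1"], ["C1", "O1"]]
dimers = [make_dimer(receptor, "A:42", lig) for lig in ligand_library]

# Editing one dimer does not touch the shared receptor.
dimers[0]["target"]["atoms"].append("tether")
```

The per-ligand cost here scales with the size of one residue rather than the whole receptor, which is the trade-off being weighed in this thread.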
This makes perfect sense! And it's far superior for large screening. Thanks for the explanation. I was too focused on the single ligand case. Let's keep this as you implemented: ligand not inserted in polymer before docking. One other concern I have is that the Let's discuss tomorrow and then decide if we want to modify this PR or not.
Thanks! Let's meet tomorrow and discuss.
Yes, that's a great point I didn't consider. Maybe I'm not doing the right thing. is_ignore is the hacky way I use to suppress writing atoms to the receptor PDBQT.
Commenting for record purposes - recap of today's discussion:
With that, we will convert it to a draft for now.
I will close this PR as it has become too complex to move forward without significant changes to other parts of meeko. Smaller but more realistic objectives like being able to parse covalently modified polymers, handling of generalized padding, cross-linking / multiple bonds between pairs of residues, template fusion, etc., are required to have the blunt-headed and properly padded target residue for covalent docking. But at the moment, I feel it's very difficult to continue this PR without making these changes to many other parts. It might cause confusion to include all of these changes in this PR. Even with the drawback we discussed above, I still believe the current state of this PR is an efficient solution with a reasonable procedure for docking preparation. Therefore, I would like to leave this PR as a workaround for those who need to perform covalent docking with custom tether atoms, preferably with the current docking engines like AutoDock GPU.
This was discussed a while back among several of us. The major objective of this PR is to bring covalent ligands into the polymer world as monomers as suggested by @mattholc with improvements on how covalent docking can be set up and generalized in Meeko, with and without ProDy.
Three major tasks were included in this PR:
mk_prepare_ligand.py: Improved Receptor Parsing
Class CovalentBuilder and the dependency on ProDy in covalentbuilder.py were removed (refactored). Reusable functions (find_smarts and transform) were isolated.
Supports parsing of [json, pdb] (and [mmcif] with ProDy) to define receptor tether atoms. Some options (--add_templates, --wanted_altloc, --default_altloc) were exposed the same way as in mk_prepare_receptor.py because they're potentially important options to determine the coordinates of the receptor tether atoms.
Sets receptor tether atoms by --rec_residue, --rec_tether_names or alternatively --rec_tether_smarts and --rec_smarts_indices.
d706720 is a working checkpoint for covalent ligand preparation. No more CovalentBuilder and ProDy is optional.
mk_prepare_ligand.py -> JSON -> mk_prepare_receptor.py: A Specialized Receptor Preparation Routine and Custom Setup of Receptor Tether Atoms
393372b is a working checkpoint to write covalent receptor JSON from mk_prepare_ligand.py at the same time as the ligands are prepared. One can re-use this JSON to write the rigid part to the receptor PDBQT for covalent docking.
mk_export.py: Fix Up Excess (padding) hydrogens in the exported PDB from mk_export.py #280 and Revert the inefficient implementation I made to mk_export.py
63747dd is a working checkpoint that has the optimized mk_export.py. Previously, I implemented a deepcopy of Polymer, which is very inefficient. Now, the Polymer is modified outside of the pose iteration. The additional PDB block is simply written by Chem.MolToPDBBlock. It is also possible, although not implemented, to export the updated Polymer JSON with the covalent ligand as an extra Monomer.
Additional changes:
Fix: Log (logger.warning) lone hydrogens that are ignored in template matching
This can be quite common when hydrogens present in PDB have an elongated bond length. Previously, these lone hydrogens were ignored silently. This PR includes a warning without directly disrupting the execution of template matching:
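As a rough illustration of that behaviour (not the code in this PR), the sketch below assumes atoms are given as (name, element, xyz) tuples and uses a hypothetical distance cutoff to decide whether a hydrogen is attached to any heavy atom. Hydrogens with no heavy atom within the cutoff are reported through logger.warning and set aside, and execution continues.

```python
import logging
import math

logger = logging.getLogger("lone_hydrogen_sketch")

# Hypothetical cutoff in angstroms; a real implementation may perceive bonds differently.
MAX_H_BOND_LENGTH = 1.3


def split_lone_hydrogens(atoms, cutoff=MAX_H_BOND_LENGTH):
    """Return (kept, lone) for atoms given as (name, element, (x, y, z)) tuples."""
    heavy = [a for a in atoms if a[1] != "H"]
    kept, lone = list(heavy), []
    for atom in atoms:
        if atom[1] != "H":
            continue
        # A hydrogen is "lone" if no heavy atom lies within the cutoff distance.
        bonded = any(math.dist(atom[2], h[2]) <= cutoff for h in heavy)
        (kept if bonded else lone).append(atom)
    if lone:
        names = ", ".join(a[0] for a in lone)
        # Warn instead of raising, so the caller can carry on.
        logger.warning("Ignoring %d lone hydrogen(s): %s", len(lone), names)
    return kept, lone


atoms = [
    ("CA", "C", (0.0, 0.0, 0.0)),
    ("HA", "H", (1.09, 0.0, 0.0)),  # typical C-H bond length
    ("HB", "H", (1.80, 0.0, 0.0)),  # elongated: no heavy atom within the cutoff
]
kept, lone = split_lone_hydrogens(atoms)
```

In this toy input, HB ends up in the lone list and triggers a single warning, while CA and HA are kept.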
mk_prepare_ligand.py outputs (overwrites) prepared covalent ligands at different residues to the same file #278
Fix: Disambiguate output filenames for prepared covalent ligands with different connect patterns and/or target residues
Changing the Format of the output filename to:
Examples:
Relocated parse_cmdline_res and parse_cmdline_res_assign to meeko.utils.utils to be re-used in mk_prepare_ligand.py
Added option to write updated receptor JSON files from mk_prepare_ligand.py to be re-used by mk_prepare_receptor.py
mk_prepare_receptor.py
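The JSON hand-off between the two scripts can be pictured with a toy round trip. This is a sketch under loud assumptions: the schema below is hypothetical and much simpler than the receptor JSON Meeko writes, and rigid_atom_names is an illustrative helper, not a Meeko function. It only shows the idea of flagging atoms with is_ignore on the ligand-preparation side and then reading the file back to keep the rigid part.

```python
import json
import os
import tempfile

# Hypothetical, simplified schema; the real receptor JSON differs.
receptor = {
    "residues": {
        "A:42": {
            "atoms": [
                {"name": "N", "is_ignore": False},
                {"name": "CA", "is_ignore": False},
                {"name": "CB", "is_ignore": True},  # moved to the flexible part
                {"name": "SG", "is_ignore": True},
            ]
        }
    }
}


def rigid_atom_names(receptor_dict):
    """Collect (residue_id, atom_name) for atoms that stay in the rigid receptor."""
    return [
        (res_id, atom["name"])
        for res_id, res in receptor_dict["residues"].items()
        for atom in res["atoms"]
        if not atom["is_ignore"]
    ]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "covalent_receptor.json")
    # Step 1 (ligand preparation side): write the updated receptor description.
    with open(path, "w") as fh:
        json.dump(receptor, fh)
    # Step 2 (receptor preparation side): read it back and reuse it.
    with open(path) as fh:
        reloaded = json.load(fh)

rigid = rigid_atom_names(reloaded)
```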
In summary, this PR extends covalent docking capabilities, making receptor tether atom setups more flexible and improving both the efficiency and continuity of the ligand and receptor preparation workflows. Together with optional alternate charge models for receptor and better support for metals, this work would be useful for nucleic acid docking.