Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust positions of hydrogen isotopes as if they were regular Hs #296

Merged
merged 16 commits into from
Jan 10, 2025

Conversation

rwxayheee
Copy link
Contributor

@rwxayheee rwxayheee commented Dec 26, 2024

This is for #295 and #298.

Added a function set_h_isotope_atom_coords to replace SetTerminalAtomCoords for the assignment of H isotope coordinates with AddHs, and with preserved chirality if from input molecular graph.

More description in comments.

@rwxayheee rwxayheee changed the base branch from release to develop December 26, 2024 02:53
@rwxayheee rwxayheee changed the title Update pos of h isotopes Adjust positions of hydrogen isotopes as if they were regular Hs Dec 26, 2024
@rwxayheee rwxayheee marked this pull request as ready for review December 26, 2024 11:02
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Dec 27, 2024

I tried to implement the embed-align method which is a different approach to this PR. The main advantage of doing that is it has an almost seamless way to handle isotopes and isomers. But the re-generated mol conformation can still look different (even the RMSD is low, like 0.5) from the docked pose, which sometimes possesses a strange torsion. With that I don't think it's applicable at the moment. Embedding the isotope-containing fragment is another way, but again if the docked pose is distorted, there's inevitably a shift of the export ligand. Since we also need to recover the exact position of polar hydrogens, we don't want the ligand atom positions to change.

Tomorrow I will go back to the Add-RemoveHs method and continue this (for the chirality rebuild, or check, i'm not sure..). I had many misconceptions about RDKit Smiles handling before this. A lot of what I previously commented on #297 and #295 was wrong. It cost me so much time to realize the way I wrote Smiles has many problems ..

@rwxayheee
Copy link
Contributor Author

rwxayheee commented Dec 28, 2024

Documenting for future reference:

I tried many different ways to call SetTerminalAtomCoords (on complete mol, on mol with partially assigned conf, etc.) and in different versions. It reproduces AddHs geometry for single atom, when the other atoms' coordinates are reasonable.

However, when multiple atoms (isotopes) are missing a valid coordinates (or just at (0,0,0)), the positions from SetTerminalAtomCoords can go wrong even with an ideal geometry of the heavy atoms. See an example at the end of this comment.

I feel like this is because it's a function intended for single coordinate assignment, assuming the surrounding geometry is complete. This behavior has led to lots of confusion I had these days, but now I think the undesired positions have very little to do with docking poses being distorted. The unwanted geometry results from how the SetTerminalAtomCoords function is designed.

Thus, the real problem might be the usage of SetTerminalAtomCoords and the condition of the molecule conformation when this function is being called. This PR will implement a replacement that's specific for H-isotope (this issue). But in the future, I will keep looking and see if there are other replacements in RDKit.

It should be noted that AddHs does not necessarily return the best geometry, either - there can be close contacts between hydrogens.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point3D

"""
rdkit.__version__
'2024.09.3'
"""

# SMILES in Issue 295
smi = "[2H]C([2H])([2H])[C@H](O)C([2H])([2H])CCCn1c(=O)c2c(ncn2C)n(C)c1=O"
mol = Chem.MolFromSmiles(smi)
AllChem.EmbedMolecule(mol, AllChem.ETKDG())

mol_with_h = Chem.AddHs(mol, addCoords=True)

# Ideal, as reference
writer = Chem.SDWriter("mol_0.sdf")
writer.write(mol_with_h)
writer.close()

for atom in mol_with_h.GetAtoms():
    idx = atom.GetIdx()
    if atom.GetIsotope()>0:
        neigh = atom.GetNeighbors()
        AllChem.SetTerminalAtomCoords(mol_with_h, idx, neigh[0].GetIdx())

# OK, if isotopes are set one at a time? 
writer = Chem.SDWriter("mol_1.sdf")
writer.write(mol_with_h)
writer.close()

conformer = mol_with_h.GetConformer()
for atom in mol_with_h.GetAtoms():
    idx = atom.GetIdx()
    if atom.GetIsotope()>0:
        conformer.SetAtomPosition(idx, Point3D(0,0,0))

for atom in mol_with_h.GetAtoms():
    idx = atom.GetIdx()
    if atom.GetIsotope()>0:
        neigh = atom.GetNeighbors()
        AllChem.SetTerminalAtomCoords(mol_with_h, idx, neigh[0].GetIdx())

# Not OK - has distorted angels
writer = Chem.SDWriter("mol_2.sdf")
writer.write(mol_with_h)
writer.close()

@diogomart
Copy link
Contributor

I agree: the problem is how I'm using SetTerminalAtomCoords. Based on its source in RDKit's AddHs.cpp, it uses the positions of all atoms in the molecule. We are assigning positions to heavy atoms but leaving deuterium undefined or at (0,0,0). Then, when we call SetTerminalAtomCoords to calculate a position for a given atom, the undefined or zeroed positions of nearby atoms bound to the same heavy atom are taken into account, causing erroneous geometries.

@rwxayheee rwxayheee marked this pull request as draft December 28, 2024 06:45
@rwxayheee
Copy link
Contributor Author

[1a5373b] is a working version. I will do some cleanup, add comments and run more checks tomorrow.

Added a function to replace SetTerminalAtomCoords. It uses AddHs to assign coordinates and tries to correct _CIPCode if the initial assignment does not match with input. This function can be used like a version ofAddHs that includes H isotopes.

Input:
an RDKit mol, it can be under-processing (no regular Hs, new conformer not set yet),
the incomplete conformer,
an optional flag to return fully-processed mol copy (all Hs added, with a complete conformer)

Output:
a dictionary (H isotope atom idx in input mol -> assigned coordinate in Point3D), or a fully-processed mol

Current state:
Can pass existing tests
Output for #295:
DB15122_1a5373b.sdf.txt

The output looks exactly like the output when this PR was at c9435bb.

The major improvement is handling of chirality, and I also refactored and encapsulated the additional process, to insert the minimal amount of codes into rdkit_mol_create.py.

now initialize by Chem.rdCIPLabeler.AssignCIPLabels, and update from 3D by Chem.AssignStereochemistryFrom3D
@rwxayheee rwxayheee marked this pull request as ready for review December 28, 2024 23:26
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Dec 28, 2024

[b1a2835] is a bugfix. I changed the CIP chirality tag assignment methods. Previously, I wasn't using the method that's 3D-aware to re-evaluate the chirality after the initial placement and the isotope swap.

This PR should be ready at this point..! Some comments for future reference:

  1. Efficiency and implementation of the added function

The added function, set_h_isotope_atom_coords, is here as a specific replacement to SetTerminalAtomCoords and is encapsulated to be called at the specific location. But because the function uses AddHs, the isotope coordinate assignment could have been deferred to add_hydrogens. I didn't implement the return_mol option, as I don't want not introduce more than necessary changes to the existing RDKitMolCreate procedure. Overall I hope this is ok a temporary fix. When we have a better replacement in RDKit, we can just remove this function.

  1. Another occurrence of SetTerminalAtomCoord

There's another occurrence of SetTerminalAtomCoord in molsetup.py. But none of the outer function is currently in use.

@rwxayheee
Copy link
Contributor Author

ad3a81a was a change suggested by @diogomart:
In the process of hydrogen conformer generation, we first made copy_mol from original mol, removed all isotopes as if and added all hydrogens back as if they were all regular hydrogens.
This commit adds the re-ordering procedure after the addition of hydrogens to ensure a full restoration of atom ordering, including heavy atoms AND explicit hydrogen isotopes, by moving regular hydrogens as placeholders for assignment. It reproduces the atom ordering in the input Smiles (from PDBQT, may or may not be the same as SDF for ligand preparation).

0906daf was a bug fix trying to address a design mistake I made. Before, I either passed or gave up too early when trying to restore the chirality tag by swapping the highest-value and lowest-value isotopes - I was not using regular hydrogens as candidates for isotope-swapping, but they should be considered lowest-value isotopes.

@diogomart Could you please verify that the original problems were solved? I tested with a few examples I can think of, including glycines, but maybe not enough. I really appreciate the time you took to look into this.

Copy link
Contributor

@diogomart diogomart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added the chiral deutorated glycine test from yesterday, they pass. Thanks for the great work!

@rwxayheee
Copy link
Contributor Author

Thanks for adding the test!

@rwxayheee
Copy link
Contributor Author

Merging to develop

@rwxayheee rwxayheee merged commit 6432c92 into forlilab:develop Jan 10, 2025
1 check passed
@rwxayheee rwxayheee deleted the update_pos_of_H_isotopes branch January 10, 2025 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants