PDB matching is a very useful step in PDB analysis. It would be nice to add it to pdb2sql.
Expected behavior:
INPUT:
- a reference pdb file with multiple chains
- a set of pdb files for the same protein complex but with different numbering and chain IDs
OUTPUT:
- chain ID mapping
- pdb files renumbered to match the reference pdb, with chain IDs also changed to match the reference pdb
Ideally, pdb_matching would be split into two functions (steps):
Step 1. pdb_match_chn_batch.py: match the chain IDs of the pdb files to ref.pdb. Output _newChnID.pdb files.
Note: This step can be skipped if the chain IDs of the model.pdb files already match ref.pdb. It is also error-prone when several chains are highly similar to each other, so a visual check by a human is necessary.
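As a rough illustration of Step 1, here is a minimal sketch that pairs chains by sequence similarity. It is not taken from either existing solution. The function names `chain_sequences` and `match_chains` are hypothetical, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a proper sequence alignment. It assumes fixed-column PDB ATOM records with one CA atom per residue (no alternate locations).

```python
from difflib import SequenceMatcher

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from CA atoms (hypothetical helper)."""
    seqs = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]  # chain ID is column 22 of the PDB format
            # Unknown residue names are recorded as "X".
            seqs[chain] = seqs.get(chain, "") + THREE_TO_ONE.get(line[17:20], "X")
    return seqs


def match_chains(ref_seqs, model_seqs):
    """Greedily map each model chain to the most similar unused reference chain."""
    scored = sorted(
        (
            (SequenceMatcher(None, m_seq, r_seq, autojunk=False).ratio(), m_id, r_id)
            for m_id, m_seq in model_seqs.items()
            for r_id, r_seq in ref_seqs.items()
        ),
        reverse=True,  # best-scoring pairs first
    )
    mapping, used_ref = {}, set()
    for _score, m_id, r_id in scored:
        if m_id not in mapping and r_id not in used_ref:
            mapping[m_id] = r_id
            used_ref.add(r_id)
    return mapping  # {model chain ID: reference chain ID}
```

With near-identical chains (for example homodimers) the similarity scores are almost tied, so the greedy assignment is arbitrary. This is the case where the visual check mentioned above is needed.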
Step 2. pdb_renum_batch.py: align and renumber pdb files to ref.pdb. Output _renum.pdb files.
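Step 2 could look roughly like the sketch below, under the same caveats. `residue_keys` and `renumber_chain` are hypothetical names, `difflib.SequenceMatcher` again stands in for a real alignment, and chain IDs are assumed to match ref.pdb already. It also assumes that residue numbers are unique within a chain of the model. Model residues that do not align to the reference are dropped here, which is only one possible policy.

```python
from difflib import SequenceMatcher

RECORDS = ("ATOM", "HETATM")


def residue_keys(pdb_lines, chain):
    """Ordered (resSeq+iCode field, resName) pairs for one chain, one per residue."""
    keys = []
    for line in pdb_lines:
        if line.startswith(RECORDS) and line[21] == chain:
            key = (line[22:27], line[17:20])  # columns 23-27 and 18-20
            if not keys or keys[-1] != key:
                keys.append(key)
    return keys


def renumber_chain(ref_lines, model_lines, chain):
    """Return model lines with residues of `chain` renumbered after the reference."""
    ref = residue_keys(ref_lines, chain)
    mod = residue_keys(model_lines, chain)
    matcher = SequenceMatcher(
        None, [name for _, name in mod], [name for _, name in ref], autojunk=False
    )
    # Map each aligned model residue field to the reference residue field.
    new_field = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            new_field[mod[block.a + k][0]] = ref[block.b + k][0]

    out = []
    for line in model_lines:
        if line.startswith(RECORDS) and line[21] == chain:
            field = line[22:27]
            if field not in new_field:
                continue  # unaligned residue: dropped in this sketch
            line = line[:22] + new_field[field] + line[27:]
        out.append(line)
    return out
```

Calling `renumber_chain` once per chain and writing the result to a `_renum.pdb` file would give the batch behavior described above.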
There are two existing solutions:
- https://github.com/LilySnow/PDB-matching (python + cpp)
- DeepRank/haddock-tools@ed9beee (python, by the haddock group)
Maybe we could use these solutions as the basis?