PDB format output with numbers as chain ID #43

@fwaibl

Description

Hi. I am using atomium to extract molecules from mmCIF files and write them into PDB format. Generally, this works really well, but I encountered an issue with structures where the chain ID is a number instead of a letter.

Expected behaviour

The chain ID should not be written as part of the residue number, but only in the column reserved for the chain ID.
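For reference, here is a minimal sketch of the fixed-column layout I would expect for a HETATM record, with the chain ID confined to column 22 and the residue number to columns 23-26. The helper `format_hetatm` is hypothetical and is not atomium's implementation; the chain ID "A" in the usage line is a placeholder, and the atom-name alignment assumes a one-letter element symbol.

```python
def format_hetatm(serial, name, res_name, chain_id, res_seq,
                  x, y, z, occupancy, b_factor, element):
    """Build one 80-column HETATM record (hypothetical helper, not atomium code)."""
    # Column 22 holds exactly one character, so "13" cannot be written there.
    if len(chain_id) != 1:
        raise ValueError(f"chain ID {chain_id!r} does not fit column 22")
    # Columns 23-26 hold the residue number (four characters).
    if not -999 <= res_seq <= 9999:
        raise ValueError(f"residue number {res_seq} does not fit columns 23-26")
    # Names shorter than four characters start in column 14
    # (simplification: assumes a one-letter element symbol).
    atom_name = name if len(name) == 4 else f" {name:<3}"
    line = (
        f"HETATM{serial:>5} {atom_name} {res_name:>3} {chain_id}{res_seq:>4}"
        f"    {x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
        f"          {element:>2}"
    )
    # Pad to the full record width (columns 79-80 are the charge field).
    return line.ljust(80)


# Placeholder chain ID "A" with the first atom from the output in this report.
print(format_hetatm(20582, "NB", "KC1", "A", 308,
                    208.930, 314.544, 325.109, 1.00, 90.18, "N"))
```

Raising an error for a multi-character chain ID is only one option; remapping such chains to unused single characters would be another, and which one suits atomium is a design decision for the maintainers.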

Actual behaviour

When the chain ID is a number, it is written into the PDB string twice: once in the chain ID column and once as part of the residue number. The resulting lines are too wide for the PDB specification and are parsed incorrectly by many other programs.

Example code to reproduce

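import atomium

# Download the mmCIF file for entry 6L4T
cif = atomium.fetch("6L4T")
# Select the ligand whose ID combines chain "13" and residue number 308
lig = [l for l in cif.model.ligands() if l.id == "13.308"][0]
# Write the ligand out in PDB format
print(atomium.pdb.structure_to_pdb_string(lig))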

Output (truncated):

HETATM20582  NB  KC1 1313308     208.930 314.544 325.109  1.00 90.18           N  
HETATM20583  ND  KC1 1313308     205.979 312.067 326.352  1.00 90.18           N  
HETATM20584  C1A KC1 1313308     208.131 312.489 328.676  1.00 90.18           C  
HETATM20585  C1B KC1 1313308     209.880 315.122 325.835  1.00 90.18           C  
HETATM20586  C1C KC1 1313308     206.761 314.055 322.987  1.00 90.18           C  
HETATM20587  C1D KC1 1313308     204.767 311.511 325.824  1.00 90.18           C  

Note that the chain ID ("13") is written twice.
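To illustrate why this confuses other programs, the sketch below slices the first 30 columns of the first output line at the column positions defined for HETATM records, the way a column-based reader would. The function `split_residue_fields` is a hypothetical helper written for this report, not part of atomium or any other parser.

```python
# First 30 columns of the first HETATM line from the output above.
observed = "HETATM20582  NB  KC1 1313308  "


def split_residue_fields(line):
    """Slice the residue-related fields by PDB column (hypothetical helper)."""
    return {
        "chain_id": line[21],     # column 22
        "res_seq": line[22:26],   # columns 23-26
        "i_code": line[26],       # column 27
        "unused": line[27:30],    # columns 28-30, blank in a valid record
    }


fields = split_residue_fields(observed)
print(fields)
```

A reader that trusts the columns sees chain "1", residue number 3133 and insertion code "0", with a stray "8" in columns that should be blank, instead of chain "13" and residue 308.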

Python Version/Operating System

I am using atomium 1.0.11 (from conda-forge) with Python 3.10 on Linux.

Thanks in advance for your support, and thanks for publishing atomium as open-source :-)
