Support files formatted in general non-PDBx CIF format #846

@Alex6357

The CIFBlock.deserialize() method does not parse general (non-PDBx) CIF data correctly, so the CIFCategory objects of the block cannot be accessed.

Reproduction example:

import biotite.structure.io.pdbx as pdbx

cifblock = pdbx.CIFBlock.deserialize(
    """
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   3.94513000
_cell_length_b   3.94513000
_cell_length_c   3.94513000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.00000000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   1
_chemical_formula_structural   SrTiO3
_chemical_formula_sum   'Sr1 Ti1 O3'
_cell_volume   61.40220340
_cell_formula_units_Z   1
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Sr  Sr0  1  0.00000000  0.00000000  0.00000000  1.0
  Ti  Ti1  1  0.50000000  0.50000000  0.50000000  1.0
  O  O2  1  0.50000000  0.00000000  0.50000000  1.0
  O  O3  1  0.50000000  0.50000000  0.00000000  1.0
  O  O4  1  0.00000000  0.50000000  0.50000000  1.0
"""
)

print(cifblock._categories)
print(cifblock.get("atom_site"))

Output:

{"symmetry_space_group_name_H-M   'P 1": "_symmetry_space_group_name_H-M   'P 1'\n", 'cell_length_a   3': '_cell_length_a   3.94513000\n', 'cell_length_b   3': '_cell_length_b   3.94513000\n', 'cell_length_c   3': '_cell_length_c   3.94513000\n', 'cell_angle_alpha   90': '_cell_angle_alpha   90.00000000\n', 'cell_angle_beta   90': '_cell_angle_beta   90.00000000\n', 'cell_angle_gamma   90': '_cell_angle_gamma   90.00000000\n', 'symmetry_Int_Tables_number   ': '_symmetry_Int_Tables_number   1\n', 'chemical_formula_structural   SrTiO': '_chemical_formula_structural   SrTiO3\n', "chemical_formula_sum   'Sr1 Ti1 O3": "_chemical_formula_sum   'Sr1 Ti1 O3'\n", 'cell_volume   61': '_cell_volume   61.40220340\n', 'cell_formula_units_Z   ': '_cell_formula_units_Z   1\n', None: 'loop_\n _atom_site_type_symbol\n _atom_site_label\n _atom_site_symmetry_multiplicity\n _atom_site_fract_x\n _atom_site_fract_y\n _atom_site_fract_z\n _atom_site_occupancy\n  Sr  Sr0  1  0.00000000  0.00000000  0.00000000  1.0\n  Ti  Ti1  1  0.50000000  0.50000000  0.50000000  1.0\n  O  O2  1  0.50000000  0.00000000  0.50000000  1.0\n  O  O3  1  0.50000000  0.50000000  0.00000000  1.0\n  O  O4  1  0.00000000  0.50000000  0.50000000  1.0\n'}
None

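From my reading of the source code, _categories should map each CIFCategory name to the raw text lines in which that category is declared. In the output above the category names are parsed incorrectly: each key contains part of the value (for example 'cell_length_a   3'), the atom_site loop is stored under the key None, and the symmetry_equiv_pos loop is missing entirely.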
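
The malformed keys are consistent with a parser that expects PDBx-style names of the form _category.item and takes everything between the leading "_" and the first "." as the category name. The sketch below is a guess at that logic, not the actual biotite implementation; guess_category_key is a hypothetical helper written only to reproduce the keys shown in the output.

```python
def guess_category_key(line):
    """Hypothetical helper (not biotite code): derive a category key
    by slicing from after the leading "_" up to the first "."."""
    if not line.startswith("_"):
        # Indented loop columns such as " _atom_site_label" end up here
        return None
    # str.find() returns -1 when there is no ".", so the slice
    # becomes line[1:-1] and silently drops the last character
    return line[1 : line.find(".")]


examples = [
    "_cell_length_a   3.94513000",             # "." belongs to the value
    "_symmetry_Int_Tables_number   1",         # no "." at all
    "_symmetry_space_group_name_H-M   'P 1'",  # no "." at all
    " _atom_site_type_symbol",                 # indented loop column
    "_atom_site.label_atom_id",                # PDBx-style name
]
for line in examples:
    print(repr(guess_category_key(line)))
```

Under this assumption only the PDBx-style line yields a clean name (atom_site). The other results match the keys in the reported output, which suggests that general CIF data names need to be split differently, or not at all.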
