
Unable to process assembly cif files that have 2 and 3 character auth_asym_ids #96

@danny305


The mmCIF file format allows auth_asym_ids to be up to 3 characters long, while the PDB format limits them to a single character. This fundamental difference seems to require refactoring all the chain logic from char to char *.

I haven't written C/C++ since I last helped you implement the CIF capabilities (almost 3 years ago). I have been trying to implement these changes, but I am in over my head and do not feel comfortable making this change in your C code, especially in all of your logic for processing the PDB files.

As far as I can tell, the problem is at line 612 of cif.cc, which explicitly selects the first char of the chain, and in freesasa_node_atom_chain, which returns a char.
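A minimal sketch of the failure mode described above, written as standalone C++ and not taken from the freesasa source. The helper names (`first_char_chain`, `count_distinct_truncated`, `count_distinct_full`) are hypothetical; they only illustrate what happens when a multi-character auth_asym_id is reduced to its first char.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keeps only the first character of a chain label,
// which is the behaviour the issue attributes to the CIF reader.
char first_char_chain(const std::string &auth_asym_id) {
    return auth_asym_id.empty() ? ' ' : auth_asym_id[0];
}

// Number of distinct chains seen when every label is cut to one char.
std::size_t count_distinct_truncated(const std::vector<std::string> &ids) {
    std::set<char> seen;
    for (const std::string &id : ids) {
        seen.insert(first_char_chain(id));
    }
    return seen.size();
}

// Number of distinct chains seen when the full label is kept.
std::size_t count_distinct_full(const std::vector<std::string> &ids) {
    std::set<std::string> seen(ids.begin(), ids.end());
    return seen.size();
}
```

With the labels "A", "AA" and "AB", the truncated count is 1 while the full count is 3, so three separate chains would be merged into one.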

Here are some PDB codes for which this fails when you use the assembly file from the RCSB:

  • 7nzc
  • 7cma
  • 6ihe (this one fails on a wrong atom name after I process it with ChimeraX to add hydrogens; that is a separate issue)

Example wget command for downloading one of the assembly files:

wget -O 7cma-assembly.cif https://files.rcsb.org/download/7cma-assembly1.cif 

Tell me how you want to proceed. I can refactor the C++/CIF logic from char to char *, but I don't think I can refactor the PDB logic; that's all in C, and you are doing some intense memory management.
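One possible shape for such a refactor, sketched under stated assumptions and not based on the freesasa code: instead of a heap-allocated char *, a small fixed-size buffer holds the label, which avoids extra memory management. The type `ChainLabel`, the constant `CHAIN_LABEL_MAX` and all three functions are hypothetical names, and the buffer size of 4 assumes the 3-character limit stated in this issue plus a terminating '\0'.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: 3 characters (the limit cited in this issue) plus '\0'.
constexpr std::size_t CHAIN_LABEL_MAX = 4;

// Hypothetical plain struct that both C and C++ code could share.
struct ChainLabel {
    char text[CHAIN_LABEL_MAX];
};

// Copy src into out; returns false (leaving out untouched) when an
// argument is null or the label does not fit in the buffer.
bool chain_label_set(ChainLabel *out, const char *src) {
    if (out == nullptr || src == nullptr) return false;
    const std::size_t n = std::strlen(src);
    if (n >= CHAIN_LABEL_MAX) return false;
    std::memcpy(out->text, src, n + 1); // n + 1 copies the '\0' as well
    return true;
}

// Promote a PDB-style single-character chain ID to a label.
ChainLabel chain_label_from_char(char c) {
    ChainLabel label{}; // zero-initialised, so text[1] is already '\0'
    label.text[0] = c;
    return label;
}

// Compare two labels by their full text.
bool chain_label_equal(const ChainLabel &a, const ChainLabel &b) {
    return std::strcmp(a.text, b.text) == 0;
}
```

With a type like this, the PDB path could keep reading one character and wrap it with `chain_label_from_char`, while the CIF path would pass the whole auth_asym_id to `chain_label_set`. Whether this fits the existing data structures is an open question for the maintainer.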
