RATT transfers order( , ) coordinates and loses a parenthesis #12

Open
0xaf1f opened this issue Feb 16, 2023 · 5 comments

Comments

@0xaf1f
Contributor

0xaf1f commented Feb 16, 2023

The reference annotation contains

FT   gene            3593369..3593852
FT                   /locus_tag="Rv3216"
FT                   /pseudogene="unknown"
FT                   /db_xref="GeneID:888845"
FT   misc_feature    order(3593369..3593437,3593439..3593852)
FT                   /locus_tag="Rv3216"
FT                   /note="acetyltransferase (2.3.1.-), contains GNAT domain
FT                   (GCN5-like N-acetyltransferase. See Vetting et al. 2005),
FT                   probably pseudogene as appears frameshifted due to 1bp
FT                   insertion at position 3593438. Frameshift present in all
FT                   sequenced tubercle bacilli. Start changed since first
FT                   submission, extended by 50aa."
FT                   /pseudogene="unknown"
FT                   /db_xref="PSEUDO:CCP46032.1"

which gets transferred to the input assembly as

FT   gene            complement(116773..117256)
FT                   /locus_tag="Rv3216"
FT                   /note="*pseudogene: unknown"
FT                   /db_xref="GeneID:888845"
FT                   /gene="Rv3216"
FT   misc_feature    complement(order(116773..117256)
FT                   /locus_tag="Rv3216"

and then parsing the annotation file fails because the misc_feature coordinate has an unbalanced parenthesis.
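The failure mode can be reproduced without RATT. The following is a minimal Python sketch, not RATT's code or any parser's actual validation logic, that counts parentheses in a feature location string. The function name `paren_balance_error` is hypothetical; the two location strings are copied from the annotations above.

```python
def paren_balance_error(location: str):
    """Return None if parentheses balance, else a short description.

    Hypothetical helper for illustration only; it checks nesting depth
    and nothing else about the location syntax.
    """
    depth = 0
    for i, ch in enumerate(location):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at offset {i}"
    if depth > 0:
        return f"{depth} unclosed '('"
    return None


# Location from the reference annotation (balanced).
reference = "order(3593369..3593437,3593439..3593852)"
# Location written for the transferred misc_feature (one ')' missing).
transferred = "complement(order(116773..117256)"

print(paren_balance_error(reference))    # None
print(paren_balance_error(transferred))  # 1 unclosed '('
```

A check like this only flags the symptom. It does not say whether the intended location kept both segments of the original `order(...)`.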

@ThomasDOtto
Owner

ThomasDOtto commented Feb 17, 2023 via email

@haessar
Collaborator

haessar commented Mar 13, 2023

@0xaf1f Thomas refers to a fix. Are you aware of it?

@0xaf1f
Contributor Author

0xaf1f commented Mar 13, 2023

No, I haven't gotten to it yet since I've been working on my own code. I think RATT would benefit from using BioPerl to read and write EMBL files (it might even take care of #10), but I haven't looked into how disruptive that would be compared with updating a regex. I wouldn't suggest waiting for me when your focus is already here.

@haessar
Collaborator

haessar commented Mar 17, 2023

Using Bio::SeqIO (BioPerl) would allow me to essentially replace main.ratt.pl:300-500 or so with only a few lines of code, if I have it right. Will put it on the to-do list.
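The general argument for a structured reader/writer over regex edits can be sketched independently of BioPerl. The Python example below is an illustration only: it does not use or mirror the Bio::SeqIO API, and the `Location` class with its fields is hypothetical. It keeps the segments, the `order`/`join` operator, and the strand as data, and builds the location string in one place so that every opening parenthesis is emitted together with its close. Dropping the operator when a single segment remains is an assumption of this sketch, not a behaviour agreed on in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Location:
    """Hypothetical structured form of an EMBL feature location."""
    segments: List[Tuple[int, int]]   # (start, end) pairs, 1-based
    operator: str = ""                # "", "join" or "order"
    complement: bool = False

    def render(self) -> str:
        # Each wrapper adds its '(' and ')' in the same expression,
        # so the output cannot end up unbalanced.
        text = ",".join(f"{start}..{end}" for start, end in self.segments)
        if self.operator and len(self.segments) > 1:
            text = f"{self.operator}({text})"
        if self.complement:
            text = f"complement({text})"
        return text


# Segments taken from the reference annotation in this issue.
ref = Location([(3593369, 3593437), (3593439, 3593852)], operator="order")
print(ref.render())

# Same segments on the reverse strand, purely to show the nesting.
rev = Location(ref.segments, operator="order", complement=True)
print(rev.render())

# A single remaining segment, using the transferred coordinates.
single = Location([(116773, 117256)], operator="order", complement=True)
print(single.render())
```

With this shape, coordinate transfer would update `segments` and `complement` and never touch the string form directly, which is the kind of error a regex rewrite of the location text is prone to.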

@ThomasDOtto
Owner

ThomasDOtto commented Mar 17, 2023 via email


3 participants