Get sequences command gives me an error #328
I'm trying to read a few FASTA files with various entries. Some of them contain symbols like J, X or N, and this gives me an error. How can I read this kind of file?

The code I'm using is:

And the error I'm getting is:

During handling of the above exception, another exception occurred:
Traceback (most recent call last):

Thanks in advance.
'J' is currently not a symbol in the amino acid alphabet. Hence neither a `NucleotideSequence` nor a `ProteinSequence` can be created from the sequences in your FASTA file. There are two possible solutions to this issue, both using the low-level API of `FastaFile`, which returns strings instead of `Sequence` objects (in the snippets, `fasta_file` is the `FastaFile` you read):

1. Read the sequences as strings, replace the symbol with an appropriate substitute (here 'L', since 'J' stands for leucine or isoleucine) and create a `ProteinSequence`:

   ```python
   from biotite.sequence import ProteinSequence
   sequences = {header: ProteinSequence(seq_str.replace("J", "L")) for header, seq_str in fasta_file.items()}
   ```

2. Read the sequences as strings and create `GeneralSequence` objects with a custom alphabet:

   ```python
   from biotite.sequence import GeneralSequence, LetterAlphabet, ProteinSequence
   alphabet = LetterAlphabet(list(ProteinSequence.alphabet.get_symbols()) + ["J"])
   sequences = {header: GeneralSequence(alphabet, seq_str) for header, seq_str in fasta_file.items()}
   ```
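Before choosing between the two options, it can help to know exactly which symbols trigger the error. The sketch below is plain Python, not Biotite: `parse_fasta`, `unsupported_symbols` and `substitute` are hypothetical helpers that mimic the header-to-string mapping you get from `FastaFile`, and `PROTEIN_SYMBOLS` is an assumption about the protein alphabet (20 standard amino acids plus B, Z, X and *), so compare it with `ProteinSequence.alphabet.get_symbols()` in your installed version.

```python
# Plain-Python sketch (no Biotite needed) for finding and substituting
# symbols that a fixed alphabet does not accept.
# ASSUMPTION: PROTEIN_SYMBOLS approximates Biotite's ProteinSequence alphabet
# (20 standard amino acids plus B, Z, X and *); verify it against
# ProteinSequence.alphabet.get_symbols() in your installed version.
PROTEIN_SYMBOLS = frozenset("ACDEFGHIKLMNPQRSTVWYBZX*")


def parse_fasta(text):
    """Hypothetical helper: map each FASTA header to its sequence string."""
    entries = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                entries[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        entries[header] = "".join(chunks)
    return entries


def unsupported_symbols(entries, alphabet=PROTEIN_SYMBOLS):
    """Hypothetical helper: {header: sorted symbols missing from alphabet}."""
    report = {}
    for header, seq_str in entries.items():
        missing = sorted(set(seq_str) - alphabet)
        if missing:
            report[header] = missing
    return report


def substitute(entries, mapping):
    """Hypothetical helper: replace symbols, e.g. mapping={"J": "L"}."""
    table = str.maketrans(mapping)
    return {header: seq_str.translate(table) for header, seq_str in entries.items()}


if __name__ == "__main__":
    example = ">seq1\nMKTJ\nLLX\n>seq2\nACDEN\n"
    entries = parse_fasta(example)
    print(unsupported_symbols(entries))  # {'seq1': ['J']}
    fixed = substitute(entries, {"J": "L"})
    print(fixed["seq1"])  # MKTLLLX
    print(unsupported_symbols(fixed))  # {}
```

With Biotite you would skip `parse_fasta` and pass `dict(fasta_file.items())` instead. `substitute` is option 1 in plain-string form, while adding the reported symbols to a custom alphabet corresponds to option 2.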