Sequences with large numbers of Xs or other non-residue characters are not counted as gaps when the sequence coverage threshold is applied, so they can pass the filter even when they contain few actual residues. I'm not sure whether this is something we want to fix or just something we want to note, but it can be a real issue for some viral sequences.
keep_seqs = (1 - ali.count("-", axis="seq")) >= min_cov
EVcouplings/evcouplings/align/protocol.py
Line 911 in 594b45a
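
To make the difference concrete, here is a minimal sketch of a coverage filter that also treats X as missing. It uses plain NumPy on a character matrix rather than the EVcouplings `Alignment` class, so `coverage_mask` and `missing_chars` are hypothetical names for illustration and not part of the existing code. It assumes coverage is the fraction of non-missing positions per sequence, which is what the line above appears to compute for gaps only.

```python
import numpy as np


def coverage_mask(matrix, min_cov, missing_chars=("-", ".", "X")):
    """Return a boolean mask of sequences whose coverage is >= min_cov.

    matrix: 2D array of single characters (n_seqs x n_positions).
    missing_chars: characters treated as "not a residue" (hypothetical
    default; which characters belong here is a design decision).
    """
    matrix = np.asarray(matrix)
    # True wherever a position holds a gap or other non-residue character
    missing = np.isin(matrix, list(missing_chars))
    # Fraction of missing positions per sequence (row)
    frac_missing = missing.mean(axis=1)
    return (1 - frac_missing) >= min_cov


# Toy alignment: the second sequence is mostly X, the third mostly gaps
seqs = ["ACDEFGHIKL", "ACXXXXXXKL", "AC------KL", "ACDEFXHIKL"]
matrix = np.array([list(s) for s in seqs])

# Counting gaps only: the X-rich sequence is kept
gap_only = coverage_mask(matrix, 0.5, missing_chars=("-",))
# Counting X as missing too: the X-rich sequence is dropped
with_x = coverage_mask(matrix, 0.5)
```

If we went this way, the set of characters counted as missing would probably need to be configurable, since X may be acceptable in small numbers for some use cases.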