Hi,
If I understand correctly, the longest sequence in a cluster is selected as the representative sequence. I was wondering whether there could be another way to select the representative. I understand the rationale behind choosing the longest sequence, but in practice I found that these are often the odd ones within a cluster, carrying additional genes that no other virus has. This mainly happens when clustering by ANI and qcov without rcov, since that can lead to a wider length distribution within a cluster. This is not a feature request; I'm just interested in your opinion on this, as I could not come up with a good solution myself.
Thanks already :)
The representativeness of a sequence is indicated by its position in the objects file (the -o input to the cluster step). The align step orders sequences by decreasing length, but you can introduce any criterion of representativeness by reordering the objects file prior to the cluster step.
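As a rough illustration of the reordering idea, here is a minimal Python sketch. It makes several assumptions that are not confirmed in this thread: that the objects file is tab-separated with the sequence identifier in the first column and a single header row, and that you have some per-sequence score of your own (the `quality` dictionary below is purely hypothetical). The helper name `reorder_objects` is also made up for this example and is not part of the tool.

```python
from typing import Callable, List


def reorder_objects(rows: List[List[str]],
                    score: Callable[[List[str]], float],
                    has_header: bool = True) -> List[List[str]]:
    """Return rows sorted by descending score, keeping an optional header first.

    `rows` is the parsed objects file (one list of fields per line).
    The file layout is an assumption; adjust `has_header` and `score` as needed.
    """
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows)
    # sorted() is stable, so sequences with equal scores keep their
    # original relative order (e.g. the length-based order from the align step).
    return header + sorted(body, key=score, reverse=True)


# Hypothetical input: identifiers already ordered by decreasing length.
rows = [["id"], ["virA"], ["virB"], ["virC"]]

# Hypothetical representativeness score, e.g. an external quality estimate.
quality = {"virA": 0.4, "virB": 0.9, "virC": 0.9}

reordered = reorder_objects(rows, lambda r: quality[r[0]])
```

The reordered rows would then be written back out in the same format and passed to the cluster step in place of the original objects file. Which score is appropriate (quality estimates, closeness to a typical genome length, or something else) is a judgment call that depends on the dataset.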