Selection of cluster representatives #19

Open
maltesie opened this issue Jan 28, 2025 · 2 comments

@maltesie

Hi,

If I understand correctly, the longest sequence in a cluster is selected as its representative. I was wondering whether there could be another way to select the representative. I understand the rationale behind choosing the longest sequence, but in practice I have often found these to be the odd ones out within a cluster, carrying additional genes that no other virus has. This mainly happens when clustering by ANI and qcov without rcov, since that can lead to a wider length distribution within a cluster. This is not a feature request; I'm just interested in your opinion on this, as I could not come up with a good solution myself.

Thanks already :)

@agudys
Member

agudys commented Jan 28, 2025

Hello @maltesie,

The representativeness of a sequence is indicated by its position in the objects file (the -o input to the cluster step). The align step orders sequences by decreasing length, but you can introduce any criterion of representativeness by reordering the objects file prior to the cluster step.

Best,
Adam
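
The reordering described above could be sketched roughly as follows. This is only an illustration, not part of the tool: it assumes the objects file is a tab-separated table with a header row, and the `score` column and the `reorder_objects` helper are hypothetical names introduced here. Check the actual column layout of your objects file before adapting it.

```python
import csv
import io


def reorder_objects(tsv_text, score):
    """Return TSV text with data rows sorted by descending score(row).

    Assumption: the objects file is tab-separated with a header line.
    `score` is any user-supplied function mapping a row dict to a number.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    # list.sort is stable, so rows with equal scores keep their original
    # relative order (for example, the length-based order they arrived in).
    rows.sort(key=score, reverse=True)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical input: a "score" column added by the user beforehand.
example = "id\tlength\tscore\nA\t300\t0.5\nB\t200\t0.9\nC\t100\t0.7\n"
reordered = reorder_objects(example, lambda row: float(row["score"]))
print(reordered)
```

The scoring function is passed in so that any criterion (a quality estimate, completeness, a manual ranking) can be tried without changing the reordering logic.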

@maltesie
Author

maltesie commented Feb 1, 2025

Hi @agudys,

Thanks for the info, that is good to know. But wouldn't I need the clustering result before I can decide on representativeness within a cluster?

Best,
Malte
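
One possible way around this ordering problem is to pick representatives after clustering instead of before. The sketch below is a post-processing idea under stated assumptions, not something the tool provides: `cluster_of` and `length_of` are hypothetical dictionaries you would build yourself from the clustering output and the sequence lengths, and the "closest to the cluster's median length" rule is just one candidate criterion for avoiding unusually long outliers. Depending on the clustering algorithm, a representative chosen this way may not satisfy the ANI/qcov thresholds against every member, so that would need to be checked separately.

```python
import statistics
from collections import defaultdict


def median_length_representatives(cluster_of, length_of):
    """Pick, per cluster, the member whose length is closest to the median.

    cluster_of: dict mapping sequence id -> cluster id (hypothetical input)
    length_of:  dict mapping sequence id -> sequence length in bp
    Ties are broken in favour of the longer sequence, then by id.
    """
    members = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        members[cluster_id].append(seq_id)

    representatives = {}
    for cluster_id, ids in members.items():
        median = statistics.median(length_of[i] for i in ids)
        representatives[cluster_id] = min(
            ids,
            key=lambda i: (abs(length_of[i] - median), -length_of[i], i),
        )
    return representatives


# Made-up example: "A" is much longer than the rest of its cluster.
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1}
length_of = {"A": 50000, "B": 41000, "C": 40000, "D": 30000}
reps = median_length_representatives(cluster_of, length_of)
print(reps)
```

This only relabels which member is reported as the representative; cluster membership itself is left as produced by the cluster step.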
