Hi Maizie Zhou Lab,
First, I want to sincerely thank you for open-sourcing such well-structured work with comprehensive tutorials. The project is both substantial in content and meticulously documented, which greatly benefits the community.
I'm writing to suggest a potential enhancement: could you consider releasing precomputed embeddings as supplementary materials? My main interest is clustering, and most clustering is performed on embeddings. Precomputed embeddings would particularly help researchers/clients with limited computational resources (like myself) to:
- Reproduce clustering implementations across different models
- Conduct downstream analysis without requiring heavy computation
- Lower the entry barrier for community participation
Specific Suggestions:
- Provide embeddings in standard formats (.npy/.h5) for major model variants
- Include a minimal example of loading and utilizing these embeddings
- Optionally host them on cloud storage (Google Drive, Hugging Face Hub, etc.) with MD5 checksums
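To make the second and third suggestions concrete, here is a rough sketch of the kind of loading example I have in mind. It is only an illustration under my own assumptions: the helper names `md5_of_file` and `load_embeddings` are hypothetical, and I am assuming each `.npy` file holds a 2-D array of shape `(n_items, embedding_dim)`, which may not match how the project actually stores its embeddings.

```python
import hashlib
from pathlib import Path

import numpy as np


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_embeddings(path, expected_md5=None):
    """Load an embedding matrix from a .npy file (hypothetical helper).

    Assumes a 2-D array of shape (n_items, embedding_dim). If
    expected_md5 is given, the file is verified before loading.
    """
    path = Path(path)
    if expected_md5 is not None:
        actual = md5_of_file(path)
        if actual != expected_md5.lower():
            raise ValueError(f"MD5 mismatch for {path.name}: got {actual}")
    # allow_pickle=False avoids executing pickled objects from a download.
    embeddings = np.load(path, allow_pickle=False)
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    return embeddings
```

The returned array could then be passed directly to a clustering routine of the user's choice, for example scikit-learn's `KMeans`, without rerunning any model.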
This addition could significantly amplify the project's impact by:
✅ Enabling immediate practical applications
✅ Facilitating comparative studies
✅ Encouraging follow-up research from resource-constrained teams
Would this be feasible within your release plan? I'd be happy to help test or document this feature if needed.
Thank you again for your excellent work and open-source contributions!