[Feature Request] Release Precomputed Embeddings to Facilitate Resource-Constrained Research #7

@wrab12

Hi Maizie Zhou Lab,

First, I want to sincerely thank you for open-sourcing such well-structured work with comprehensive tutorials. The project is both substantial in content and meticulously documented, which greatly benefits the community.

I'm writing to suggest a potential enhancement: could you consider releasing precomputed embeddings as supplementary materials? My main interest is clustering, and most clustering methods operate on embeddings. This would particularly help researchers and users with limited computational resources (like myself) to:

- Reproduce clustering implementations across different models
- Conduct downstream analysis without requiring heavy computation
- Lower the entry barrier for community participation

Specific suggestions:

- Provide embeddings in standard formats (.npy/.h5) for major model variants
- Include a minimal example of loading and utilizing these embeddings
- Optionally host them on cloud storage (Google Drive, Hugging Face Hub, etc.) with MD5 checksums

This addition could significantly amplify the project's impact by:
✅ Enabling immediate practical applications
✅ Facilitating comparative studies
✅ Encouraging follow-up research from resource-constrained teams
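
To make the "minimal example" suggestion concrete, here is a rough sketch of what loading and using such a file might look like. It is only an illustration under my own assumptions: the `.npy` file is assumed to hold a 2-D array of shape (n_items, dim), the helper names `md5_of_file`, `load_embeddings`, and `kmeans` are hypothetical and not part of this project, and the small NumPy k-means merely stands in for whatever clustering method is actually of interest.

```python
import hashlib

import numpy as np


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_embeddings(path, expected_md5=None):
    """Load a 2-D (n_items, dim) array from a .npy file.

    Hypothetical helper: if expected_md5 is given, the file is verified
    against it before loading.
    """
    if expected_md5 is not None:
        actual = md5_of_file(path)
        if actual != expected_md5:
            raise ValueError(f"MD5 mismatch for {path}: {actual} != {expected_md5}")
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def kmeans(emb, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means in NumPy; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct rows (fancy indexing makes a copy).
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center: (n, k).
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = emb[labels == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return labels
```

With a published checksum, usage would be to call `load_embeddings` on the downloaded file with `expected_md5` set to that checksum, then pass the returned array to `kmeans` (or any other clustering routine) with the desired number of clusters.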

Would this be feasible within your release plan? I'd be happy to help test or document this feature if needed.

Thank you again for your excellent work and open-source contributions!
