Bulk Download of MSA Files from AlphaFold DB

**Description:**

I'm trying to download Multiple Sequence Alignment (MSA) files for several million proteins from the AlphaFold database (https://alphafold.ebi.ac.uk/).

For individual proteins, I can successfully download MSA files through the "Download files" section on the protein entry pages (e.g., https://alphafold.ebi.ac.uk/files/msa/AF-G1JSI4-F1-msa_v6.a3m). However, I need to download MSA files at scale using a list of protein IDs.

I've explored several options but encountered limitations:

- Direct API-style downloads using the individual protein links: this works for single files, but I'm concerned about rate limiting when scaling to millions of requests. I couldn't find documentation on API rate limits or bulk download policies.
- [Google Cloud bucket](https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?project=gen-lang-client-0423459879): the available data appears to be limited to v4 and doesn't include MSA files.
- EBI FTP server (https://ftp.ebi.ac.uk/pub/databases/alphafold/): the changelog mentions MSA updates, but I couldn't locate the actual MSA files in the directory structure.
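
For reference, here is a minimal sketch of the per-file approach from the first option. The URL pattern is copied from the example link above. Everything else is an assumption for illustration: the helper names (`msa_url`, `fetch`, `download_msas`), the `F1` fragment and `v6` defaults, and especially the one-second delay, which is a guess rather than a documented limit.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

# URL prefix taken from the example link in this issue.
BASE_URL = "https://alphafold.ebi.ac.uk/files/msa"


def msa_url(accession, fragment=1, version=6):
    """Build the per-entry MSA URL (fragment/version defaults mirror the example link)."""
    return f"{BASE_URL}/AF-{accession}-F{fragment}-msa_v{version}.a3m"


def fetch(url, timeout=30):
    """Download one URL and return its bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_msas(accessions, out_dir, fetch_fn=fetch, delay=1.0,
                  max_retries=3, sleep_fn=time.sleep):
    """Download MSAs one at a time and return the accessions that failed.

    Existing files are skipped so an interrupted run can be resumed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for accession in accessions:
        target = out_dir / f"{accession}.a3m"
        if target.exists():
            continue
        for attempt in range(max_retries):
            try:
                target.write_bytes(fetch_fn(msa_url(accession)))
                break
            except (urllib.error.URLError, TimeoutError):
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                sleep_fn(delay * 2 ** attempt)
        else:
            # All retries failed; record the accession and move on.
            failed.append(accession)
        # Fixed pause between entries (assumed value, not a documented limit).
        sleep_fn(delay)
    return failed
```

Calling `download_msas(["G1JSI4"], "msa_out")` would write `msa_out/G1JSI4.a3m`. At one request per second, a million entries would take roughly 11.5 days, which is why I'm hoping there is a bulk option.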

**Questions:**

- What is the recommended approach for bulk downloading MSA files given a list of protein IDs?
- Are there any rate limits or best practices I should follow when making large numbers of requests to the individual download endpoints?

Thank you for your assistance and for maintaining this valuable resource!
