Always delete raw Colabfold folder before processing #39

@qurat-ul-ain95

Hi Team,

This is not an issue per se, but it is a strong suggestion from an end user. An openfold job can fail for any number of reasons, and often this happens before cleanup, i.e. the raw colabfold output folder and stale out.tar.gz files remain in the MSA dir. Subsequent runs then skip the Colabfold API call because of

`if not os.path.isfile(tar_gz_file):` 

in core/data/tools/colabfold_msa_server.py

which leads to the pipeline reusing the old raw colabfold output folder for the new query, because this check only tests whether the file exists, not whether it matches the current query. This results in nondescript downstream errors that are very difficult for the end user to investigate.

This check for tar_gz_file is redundant anyway: if the query runs properly, tar_gz_file is deleted along with the entire raw folder, so on a rerun after a successful run the pipeline always calls the API again.

My suggestion is to always delete the raw colabfold folder before this line runs. What do you think? Thanks!
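
To make the suggestion concrete, here is a minimal sketch of the cleanup I have in mind. The function name `clear_stale_colabfold_output` and the `raw_dir` argument are hypothetical and not taken from the openfold code; only `tar_gz_file` is the name used in the existing check. I am assuming the raw folder and the tarball paths are both known at that point.

```python
import shutil
from pathlib import Path


def clear_stale_colabfold_output(raw_dir: Path, tar_gz_file: Path) -> None:
    """Remove leftovers from a previous (possibly failed) run.

    Hypothetical helper: `raw_dir` is assumed to be the raw colabfold
    output folder, `tar_gz_file` the out.tar.gz path that the existing
    `os.path.isfile(tar_gz_file)` check looks at.
    """
    # Delete the whole raw folder if an earlier run left it behind.
    if raw_dir.is_dir():
        shutil.rmtree(raw_dir)
    # Delete the tarball too, in case it lives outside the raw folder.
    tar_gz_file.unlink(missing_ok=True)
```

If something like this runs right before the `if not os.path.isfile(tar_gz_file):` line, the check is always true and every run queries the API with the current query. `Path.unlink(missing_ok=True)` needs Python 3.8 or newer.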

Metadata

    Labels

    bug (Something isn't working), data preprocessing (Relating to the preprocessing of queries and datasets)
