Description
Hi Team,
This is not an issue per se, but it is a strong suggestion from an end user. An openfold job can fail for any number of reasons, and this often happens before cleanup: the raw colabfold output folder and a stale `out.tar.gz` remain in the MSA directory. Subsequent runs then skip hitting the Colabfold API because of
`if not os.path.isfile(tar_gz_file):` in core/data/tools/colabfold_msa_server.py
which leads the pipeline to reuse the old colabfold raw output folder for the new query, since this check only verifies that the file exists, not that it matches the current query. This results in obscure downstream errors that are very difficult for the end user to diagnose.
This check for `tar_gz_file` is redundant anyway: when a query completes successfully, `tar_gz_file` is deleted along with the entire raw folder, so on a clean rerun the pipeline always calls the API again.
My suggestion is to always delete the raw colabfold folder before this line runs. What do you think? Thanks!
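To make the suggestion concrete, here is a minimal sketch of the proposed ordering: clear any stale ColabFold artifacts before the existence check, so a leftover from a failed run can never be mistaken for a cached result. All names here (`fetch_msa`, `raw_output_dir`, `query_colabfold_api`) are hypothetical illustrations, not the actual identifiers in `core/data/tools/colabfold_msa_server.py`.

```python
import os
import shutil


def query_colabfold_api(tar_gz_file: str) -> None:
    # Placeholder for the real ColabFold API request in the pipeline.
    with open(tar_gz_file, "wb") as fh:
        fh.write(b"")


def fetch_msa(tar_gz_file: str, raw_output_dir: str) -> None:
    """Hypothetical sketch: remove stale state before checking for a
    cached result, so the API is always queried for the current run."""
    # Always clear leftovers from a previous (possibly failed) run.
    if os.path.isdir(raw_output_dir):
        shutil.rmtree(raw_output_dir)
    if os.path.isfile(tar_gz_file):
        os.remove(tar_gz_file)

    # With stale state gone, this guard can no longer match a leftover
    # archive from an earlier, different query.
    if not os.path.isfile(tar_gz_file):
        query_colabfold_api(tar_gz_file)
```

The deletion could also be limited to archives whose query hash differs from the current one, but unconditional cleanup is simpler and matches the observation above that a successful run deletes these files anyway.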