Contributor

@EddieLF EddieLF commented Mar 11, 2024

Automatically moves the tar file to a /completed folder upon successfully extracting its contents.

This means that if the batch has to be re-run on the same bucket path due to one or more job failures, the tarballs that were successfully extracted will no longer be picked up and queued for re-extraction.
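A minimal sketch of the move step described above, for illustration only. It assumes the "completed" folder sits next to the tarball (the PR description does not spell out the exact layout), and the helper names `completed_path` and `move_to_completed` are hypothetical, not taken from this PR.

```python
import subprocess


def completed_path(tarball_uri: str) -> str:
    """Return the assumed destination: a `completed/` folder beside the tarball."""
    parent, _, name = tarball_uri.rpartition('/')
    return f'{parent}/completed/{name}'


def move_to_completed(tarball_uri: str, dry_run: bool = False) -> list[str]:
    """Move an extracted tarball out of the path that re-runs scan.

    Returns the command so callers (and tests) can inspect it.
    """
    cmd = ['gsutil', 'mv', tarball_uri, completed_path(tarball_uri)]
    if not dry_run:
        # check=True raises CalledProcessError if the move fails
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `move_to_completed(uri)` only after extraction succeeds keeps failed tarballs in place, so a re-run on the same bucket path still picks them up.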

@EddieLF EddieLF requested a review from MattWellie March 11, 2024 06:02
# Move the tarball to the "completed" directory
subprocess.run(
[
'gsutil',
Contributor

In the spirit of modernisation, can we bump this to `gcloud storage`?
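For reference, a sketch of what that swap might look like. `gcloud storage mv` takes a source and a destination in the same order as `gsutil mv`; the helper name `build_move_command` and the `use_gcloud` switch are hypothetical and not part of this PR.

```python
def build_move_command(src: str, dst: str, use_gcloud: bool = True) -> list[str]:
    """Return the argv list for moving one object between GCS paths."""
    if use_gcloud:
        # Newer CLI suggested in this review
        return ['gcloud', 'storage', 'mv', src, dst]
    # Legacy CLI currently used in the diff
    return ['gsutil', 'mv', src, dst]


# Intended use (not executed here):
#   subprocess.run(build_move_command(src, dst), check=True)
```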

Contributor

jmarshall commented Mar 11, 2024

Does it need to also look in the …/completed/… path when it's first downloading the tarball, so that it can stop cleanly if this tarball has already been done? At the moment, presumably on re-run it would throw an exception and fail. (ETA: Oh, I guess completed is a signal to the operator to not re-run it on that one! 😄)
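If a clean early exit were wanted, one possible shape is sketched below. It assumes the completed copy lives in a `completed/` folder beside the original, and relies on `gsutil -q stat` exiting with 0 when the object exists. The names `completed_path` and `already_completed` are hypothetical.

```python
import subprocess


def completed_path(tarball_uri: str) -> str:
    """Assumed layout: a `completed/` folder beside the tarball."""
    parent, _, name = tarball_uri.rpartition('/')
    return f'{parent}/completed/{name}'


def already_completed(tarball_uri: str, run=subprocess.run) -> bool:
    """Return True if a copy of the tarball already exists under completed/.

    `run` is injectable so the check can be exercised without touching GCS.
    Note that a non-zero exit can also mean an auth or network problem,
    which this sketch does not distinguish from "not found".
    """
    result = run(['gsutil', '-q', 'stat', completed_path(tarball_uri)])
    return result.returncode == 0
```

The job could call `already_completed(uri)` before downloading and return early instead of raising.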

AIUI renaming — even within the same bucket — is not a free operation and is really a copy+delete. I'm probably overthinking this, but could we instead attach some metadata saying it's been done? (I haven't used this API so don't know if this is even feasible…)
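On feasibility: GCS objects do accept custom metadata, and `gsutil setmeta -h "x-goog-meta-<key>:<value>"` is one way to set it. A rough sketch follows, with a hypothetical key name (`extracted`) and hypothetical helper names; whether this is preferable to the move is a separate question.

```python
import subprocess

# Hypothetical custom metadata key; GCS stores it as `x-goog-meta-extracted`.
DONE_HEADER = 'x-goog-meta-extracted:true'


def build_mark_done_command(tarball_uri: str) -> list[str]:
    """Return the argv list that tags the tarball as extracted, in place."""
    return ['gsutil', 'setmeta', '-h', DONE_HEADER, tarball_uri]


def mark_done(tarball_uri: str) -> None:
    """Tag the tarball instead of moving it; raises if the update fails."""
    subprocess.run(build_mark_done_command(tarball_uri), check=True)
```

The trade-off is that the re-run would then need to read each object's metadata to decide what to skip, rather than relying on the tarball no longer being in the scanned path.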

4 participants