Skip to content
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion scripts/generic_https_transfer.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,10 @@
help='Use filenames defined before each url',
)
@click.option('--presigned-url-file-path')
def main(presigned_url_file_path: str, filenames: bool):
def main(
presigned_url_file_path: str,
filenames: bool
):
"""
Given a list of presigned URLs, download the files and upload them to GCS.
If each signed url is prefixed by a filename and a space, use the --filenames flag
Expand All @@ -38,7 +41,10 @@ def main(presigned_url_file_path: str, filenames: bool):
billing_project = env_config['hail']['billing_project']
dataset = env_config['workflow']['dataset']
output_prefix = env_config['workflow']['output_prefix']
preemptible_vm = env_config['workflow'].get('preemptible_vm', False)

assert all({billing_project, cpg_driver_image, dataset, output_prefix})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now open question @jmarshall and @folded
How do we evaluate the truthiness of a variable which is likely to be False the majority of the time? Is it a good alternative to use the get method (like I have done) with False as the default value to ensure that this variable is populated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, get like that with the default is a good approach, and you'll see it used in a few places in e.g. server/ar.py.

Do you want the default when preemptible_vm is absent from the config to be preemptible spot VMs like this script has previously used, or do you want to change the default to be non-preemptible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want default to be pre-emptible (i.e. preemptible_vm = true) because this is the common use case across the CPG data team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. This means you want the default to be ….get('preemptible_vm', True).

(The natural state of a job is job.spot(True). On a newly created job, only job.spot(False) actually has an effect in changing the state, to non-preemptible. Me, I find this very confusing and have to look it up every time!)


names = None
with AnyPath(presigned_url_file_path).open() as file:
if filenames:
Expand All @@ -65,6 +71,7 @@ def main(presigned_url_file_path: str, filenames: bool):
for idx, url in enumerate(presigned_urls):
filename = names[idx] if names else os.path.basename(url).split('?')[0]
j = batch.new_job(f'URL {idx} ({filename})')
j.spot(is_spot=preemptible_vm)
quoted_url = quote(url)
authenticate_cloud_credentials_in_job(job=j)
# catch errors during the cURL
Expand Down
Loading