Configure generic_https_download for non-preemptible vms #742
@EddieLF especially want feedback on the name of the new My question is whether the naming of this option will make sense to us in a week's time. |
scripts/generic_https_transfer.py
Outdated
for idx, url in enumerate(presigned_urls):
    filename = names[idx] if names else os.path.basename(url).split('?')[0]
    j = batch.new_job(f'URL {idx} ({filename})')
    j.spot(is_spot=non_preemptible_vm)
“Spot” means the same as “preemptible”… so with the variable being true for a non-preemptible VM this would need to be is_spot=not non_preemptible_vm.
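The inversion described in this comment can be sketched as follows. FakeJob is a hypothetical stand-in for a Hail Batch job so the snippet runs without a Batch backend; it only assumes that spot(is_spot=...) exists (as used in the diff above) and that a new job starts out preemptible. configure_job is likewise an illustrative helper, not a function from the script.

```python
class FakeJob:
    """Hypothetical stand-in for a Hail Batch job; records only the spot setting."""

    def __init__(self) -> None:
        # Assumption: a newly created job is spot (preemptible) by default.
        self.is_spot = True

    def spot(self, is_spot: bool) -> "FakeJob":
        self.is_spot = is_spot
        return self


def configure_job(job: FakeJob, non_preemptible_vm: bool) -> FakeJob:
    # "Spot" means "preemptible", so the flag has to be inverted:
    # non_preemptible_vm=True  -> is_spot=False
    # non_preemptible_vm=False -> is_spot=True
    return job.spot(is_spot=not non_preemptible_vm)
```

Passing non_preemptible_vm straight through, as in the outdated diff, would request a spot VM exactly when the user asked for a non-preemptible one.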
|
FYI analysis-runner has a similar option, like this: and is the same as This is implemented as a flag option named |
Drive-by: I also think that it's good style to frame options in the positive, which would avoid --no-non-preemptible-vm being a thing. |
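As a rough illustration of framing the option in the positive, here is a sketch using the standard library's argparse rather than click (the script in this PR uses click, which offers a comparable --flag/--no-flag form). The option name --preemptible-vm and its default of True are assumptions chosen for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Positive-framing sketch")
    # BooleanOptionalAction (Python 3.9+) generates both --preemptible-vm
    # and --no-preemptible-vm from a single positively named option.
    parser.add_argument(
        "--preemptible-vm",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumed default for this example
        help="Run jobs on preemptible (spot) VMs",
    )
    return parser
```

With this framing the negative form reads as --no-preemptible-vm, and a double negative such as --no-non-preemptible-vm never arises.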
|
@jmarshall Fixed to use env_config and removed @folded agreed! |
scripts/generic_https_transfer.py
Outdated
output_prefix = env_config['workflow']['output_prefix']
preemptible_vm = env_config['workflow'].get('preemptible_vm', False)

assert all({billing_project, cpg_driver_image, dataset, output_prefix})
Now open question @jmarshall and @folded
How do we evaluate the truthiness of a variable which is likely to be False the majority of the time? Is it a good alternative to use the get method (like I have done) with False as the default value to ensure that this variable is populated?
Yes, get like that with the default is a good approach, and you'll see it used in a few places in e.g. server/ar.py.
Do you want the default when preemptible_vm is absent from the config to be preemptible spot VMs like this script has previously used, or do you want to change the default to be non-preemptible?
We want the default to be preemptible (i.e. preemptible_vm = true) because this is the common use case across the CPG data team.
Makes sense. This means you want the default to be ….get('preemptible_vm', True).
(The natural state of a job is job.spot(True). On a newly created job, only job.spot(False) actually has an effect in changing the state, to non-preemptible. Me, I find this very confusing and have to look it up every time!)
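A minimal sketch of the agreed behaviour, assuming env_config is a plain nested dict with a 'workflow' table as in the diff above. The helper name wants_preemptible is hypothetical and used only for illustration.

```python
def wants_preemptible(env_config: dict) -> bool:
    # Fall back to True when 'preemptible_vm' is absent, so the script keeps
    # its previous behaviour of running on preemptible (spot) VMs.
    return bool(env_config['workflow'].get('preemptible_vm', True))


# Key absent: default to preemptible.
default_choice = wants_preemptible({'workflow': {}})

# Key explicitly set to False: request a non-preemptible VM.
explicit_choice = wants_preemptible({'workflow': {'preemptible_vm': False}})
```

Because the default lives in the second argument to get, an explicit false in the config still takes effect, and only then would job.spot(False) need to be called.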
|
@EddieLF this is good to go (pending my question on Slack RE the security checks). With @jmarshall , changed the default variable for the preemptible machines to make this |
EddieLF
left a comment
LGTM
jmarshall
left a comment
LGTM too!
This PR adds a non-preemptible-vm flag for generic_https_upload.py to mitigate against jobs being preempted in Hail Batch. This change is required to support download of 10x long read uBAMs (~400GB each), which in a previous download attempt were preempted several times before the job was cancelled.
Changes
click.option --non-preemptible-vm to allow the user to toggle between preemptible and non-preemptible machines. This is automatically set to False.