Skip to content

Conversation

@amyminiter
Copy link
Contributor

This PR adds a non-preemptible-vm flag for generic_https_upload.py to mitigate against jobs being preempted in Hail batch.

This change is required to support download of 10x long read uBAMs (~400GB each), which in a previous attempt to download were preempted several times before the job was cancelled.

Changes

  1. Add a new click.option --non-preemptible-vm to allow the user to toggle between preemptible and non-preemptible machines. This is automatically set to False.
  2. Add a single line of code after the initialisation of each download job to allow jobs to be configured as non-preemtible.

@amyminiter amyminiter requested a review from EddieLF November 19, 2025 01:05
@amyminiter amyminiter requested review from a team as code owners November 19, 2025 01:05
@amyminiter
Copy link
Contributor Author

@EddieLF especially want feedback on the name of the new click.option. 'Pre-emptible' already sounds like it's been negated due to the prefix but then you can't strip that away because 'emptible' isn't a word.

My question is whether the naming of this option will make sense to us in a week's time.

for idx, url in enumerate(presigned_urls):
filename = names[idx] if names else os.path.basename(url).split('?')[0]
j = batch.new_job(f'URL {idx} ({filename})')
j.spot(is_spot=non_preemptible_vm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Spot” means the same as “preemptible”… so with the variable being true for a non-preemptible VM this would need to be is_spot=not non_preemptible_vm.

@jmarshall
Copy link
Contributor

FYI analysis-runner has a similar option, like this:

  --preemptible, --no-preemptible
                        Whether to use a preemptible machine or not.

and is the same as --preemptible by default (it should probably say what the default is in the usage!).

This is implemented as a flag option named preemptible that defaults to True, so you get to write job.spot(preemptible) hopefully (relatively) clearly…

@folded
Copy link

folded commented Nov 19, 2025

FYI analysis-runner has a similar option, like this:

  --preemptible, --no-preemptible
                        Whether to use a preemptible machine or not.

and is the same as --preemptible by default (it should probably say what the default is in the usage!).

This is implemented as a flag option named preemptible that defaults to True, so you get to write job.spot(preemptible) hopefully (relatively) clearly…

Drive-by: I also think that it's good style to frame options in the positive, which would avoid --no-non-preemptible-vm being a thing.

@amyminiter
Copy link
Contributor Author

@jmarshall Fixed to use env_config and removed click (much cleaner and nice to learn there's an established way of doing things)

@folded agreed!

output_prefix = env_config['workflow']['output_prefix']
preemptible_vm = env_config['workflow'].get('preemptible_vm', False)

assert all({billing_project, cpg_driver_image, dataset, output_prefix})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now open question @jmarshall and @folded
How do we evaluate the truthiness of a variable which is likely to be False the majority of the time? Is it a good alternative to use the get method (like I have done) with False as the default value to ensure that this variable is populated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, get like that with the default is a good approach, and you'll see it used in a few places in e.g. server/ar.py.

Do you want the default when preemptible_vm is absent from the config to be preemptible spot VMs like this script has previously used, or do you want to change the default to be non-preemptible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want default to be pre-emptible (i.e. preemptible_vm = true) because this is the common use case across the CPG data team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. This means you want the default to be ….get('preemptible_vm', True).

(The natural state of a job is job.spot(True). On a newly created job, only job.spot(False) actually has an effect in changing the state, to non-preemptible. Me, I find this very confusing and have to look it up every time!)

@amyminiter
Copy link
Contributor Author

@EddieLF this is good to go (pending my question on Slack RE the security checks). With @jmarshall , changed the default variable for the preemptible machines to make this True - the standard use case is for CPG'ers to use the preemptible machine). And as the is_spot variable is configured automatically as True, this is not modified unless the individual submitting the request has asserted otherwise via the config file.

Copy link
Contributor

@EddieLF EddieLF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jmarshall jmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too!

@amyminiter amyminiter merged commit 189f33c into main Nov 20, 2025
7 of 10 checks passed
@amyminiter amyminiter deleted the make_https_download_preemptible branch November 20, 2025 04:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants