Description
Is your feature request related to a problem? Please describe.
Some of my snakemake rules ramp up their RAM usage slowly, and when they fail the pipeline grants them more RAM through the retry system. However, my SLURM environment (and others) has requeue settings that cause a failed job to be re-run, by default 5 times, before the job is killed and retry has a chance to resubmit it with more resources.
Effectively, this causes jobs to be run 4 extra times with failed parameters before snakemake with the slurm executor plugin can do its thing and resubmit the job again with more resources.
Describe the solution you'd like
I would like there to be a --no-requeue option for the slurm executor plugin to override the system or user defaults.
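As a rough sketch of what I have in mind (the function name `add_requeue_flag` and the tri-state `requeue` parameter are hypothetical and not the plugin's actual code; `--requeue` and `--no-requeue` are the existing sbatch flags):

```python
from typing import Optional


def add_requeue_flag(sbatch_call: str, requeue: Optional[bool] = None) -> str:
    """Append an explicit requeue flag to an sbatch command line.

    Hypothetical helper for illustration only:
      requeue=True  -> append --requeue
      requeue=False -> append --no-requeue
      requeue=None  -> leave the call untouched so system/user defaults apply
    """
    if requeue is True:
        return f"{sbatch_call} --requeue"
    if requeue is False:
        return f"{sbatch_call} --no-requeue"
    return sbatch_call


# Example: explicitly disable requeueing for this submission.
print(add_requeue_flag("sbatch --mem=4000", requeue=False))
```

Keeping `None` as the default would preserve today's behavior for anyone who sets neither option, so only users who explicitly ask for no-requeue would see a change.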
Describe alternatives you've considered
This behavior can be achieved with the generic cluster executor for snakemake by passing the --no-requeue option. It could also be done by changing the user config for SLURM, but that isn't very portable and could clobber someone's settings when I only want this behavior for some scripts. I don't know whether this could be set up through the snakemake profile files; even if it could, that may be a bit heavy-handed, since the strict requirements on the path of the profile files reduce portability.
From what I can tell, the options for requeue in the plugin are limited to a flag that turns requeue on (--slurm-requeue). Maybe I've misunderstood something, so thanks if you can clarify!
Additional context
Happy to submit a PR if you have ideas on what no-requeue behavior you think would mesh well with the plugin!