Configuration
First, you can use either .cfg or .json for configuration. Keys and section-names are case-sensitive. (Before April 2018, they were case-insensitive.)
pypeflow-2.0.0 offers a new, more flexible way to configure job-submission via pypeflow.
You should be able to quit, alter any of these, and resume to see the new values take effect. (This was a long-standing request from David Gordon.)
The job.defaults section should have the basics, and any defaults. You have several choices.
[job.defaults]
njobs = 32
[job.step.cns]
njobs = 8
That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.
This is simplest, and the first thing you should ever try:
[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}"
[General]
pwatcher_type = blocking
# Because of a bug, this is needed in the "General" section, but soon
# it will work from the "job.defaults" too, which is preferred.
If you want to separate stderr/stdout into each task-dir, for isolated debugging:
[job.defaults]
pwatcher_type = blocking
submit = /bin/bash -c "${JOB_SCRIPT}" > "${JOB_STDOUT}" 2> "${JOB_STDERR}"
[General]
pwatcher_type = blocking
Note that there is no &; these are foreground processes. Each one will block the thread that calls it.
It is easy to construct such a string for your own job-submission system, as long as the system provides a way to do "blocking" calls. E.g. SGE uses -sync y (and -V to pass the shell environment):
[job.defaults]
pwatcher_type = blocking
submit = qsub -S /bin/bash -sync y -V \
    -q ${JOB_QUEUE}     \
    -N ${JOB_NAME}      \
    -o "${JOB_STDOUT}"  \
    -e "${JOB_STDERR}"  \
    -pe smp ${NPROC}    \
    "${JOB_SCRIPT}"
JOB_QUEUE = myqueue
MB = 4000
NPROC = 4
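The same approach could work with any scheduler that offers a blocking submission call. For example, with Slurm, srun blocks until the job finishes; here is a minimal sketch (untested, and the partition name is a placeholder):
[job.defaults]
# srun blocks until the job completes, so pwatcher_type=blocking fits here.
pwatcher_type = blocking
submit = srun -p ${JOB_QUEUE}   \
    -J ${JOB_NAME}              \
    -o "${JOB_STDOUT}"          \
    -e "${JOB_STDERR}"          \
    -c ${NPROC}                 \
    --mem-per-cpu=${MB}M        \
    bash "${JOB_SCRIPT}"
JOB_QUEUE = mypartition
MB = 4000
NPROC = 4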
By convention, we use JOB_* for most variables. However, NPROC and MB are special; those limit the resources, so the process itself will be informed. Aside from those, we generate the following automatically:
- JOB_STDOUT
- JOB_STDERR
- JOB_SCRIPT
- JOB_NAME

(Some older aliases are also supported.)
You can provide default values for any of the substitution variables. (You can even define your own, but please use all-upper case.) And you can override these in the step-specific sections.
(Btw, we have had trouble with -l h_vmem=${MB}M.)
[job.step.cns]
NPROC = 24
MB = 2000
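As an example of defining your own substitution variable, you could add extra scheduler flags. This is only a sketch: EXTRA_SGE_ARGS is a made-up name (not one we generate), and -l h_rt is an SGE runtime limit used here purely for illustration.
[job.defaults]
# Our own all-upper-case variable, with a default value:
EXTRA_SGE_ARGS = -l h_rt=24:00:00
submit = qsub -S /bin/bash -sync y -V ${EXTRA_SGE_ARGS} -q ${JOB_QUEUE} -N ${JOB_NAME} -o "${JOB_STDOUT}" -e "${JOB_STDERR}" -pe smp ${NPROC} "${JOB_SCRIPT}"
[job.step.cns]
# Override it for the consensus step only:
EXTRA_SGE_ARGS = -l h_rt=96:00:00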
Currently, the falcon "steps" are:
- job.step.dust
- job.step.da
- job.step.la
- job.step.cns
- job.step.pda
- job.step.pla
- job.step.asm (aka job.step.fc)
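Any of those sections can override the job.defaults values. For instance (the queue names are placeholders), you could route the later steps to a different queue:
[job.defaults]
JOB_QUEUE = queueA
[job.step.pda]
JOB_QUEUE = queueB
[job.step.pla]
JOB_QUEUE = queueB
[job.step.asm]
JOB_QUEUE = queueB
NPROC = 24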
For other examples, see pypeFLOW configuration.
The filesystem-based process-watcher (pwatcher_type = fs_based) is fairly normal. We submit jobs somehow, and we poll the filesystem to learn when each job is done.
This is a bit more convenient because we provide useful defaults for various job-submission systems. (We cannot do this generically because each system has a different way of "killing" a job early.)
[job.defaults]
pwatcher_type = fs_based
job_type = sge # choices: local/sge/lsf/pbs/slurm/torque/etc?
JOB_QUEUE = myqueue
[job.defaults]
pwatcher_type = fs_based
job_type = local
This should be used before using sge etc., since it will test your workflow independent of any job-submission problems. It uses & to put simple processes into the background.
If you do not like our submit and kill strings, you can provide your own in [job.defaults].
Variable substitutions are the same as for the blocking pwatcher (above).
[job.defaults]
submit = qsub -S /bin/bash --special-flags -q myqueue -N ${JOB_NAME} "${JOB_SCRIPT}"
kill = qdel -j ${JOB_NAME}
It's tricky. And we don't yet have a dry-run mode. But it lets you do whatever you want.
Note: We do not yet have a way to learn the job-number from the submission command, so job-killing is subject to name-collisions. This is one reason why the "blocking" calls are easier to support.
In the past, you would specify overrides for each section.
[General]
default_concurrent_jobs = 32
cns_concurrent_jobs = 8
That would allow up to 32 simultaneous jobs in most steps, but only 8 during falcon-consensus.
[General]
job_queue = mydefaultqueue
sge_option_da = -pe smp 8 -q queueA
sge_option_la = -pe smp 2 -q queueA
sge_option_cns = -pe smp 8 -q queueA
sge_option_pda = -pe smp 8 -q queueB
sge_option_pla = -pe smp 2 -q queueB
sge_option_fc = -pe smp 24 -q queueB
Because we use Python ConfigParser, you could also do this:
[General]
job_queue = myqueue
sge_option_da = -pe smp 8 -q %(job_queue)s
Those still work. They are substituted into your "submit" string as ${JOB_OPTS} if you do not provide JOB_OPTS yourself. But we recommend using the system above.
Why? Well, for one thing, the job needs to know how many processors were actually reserved for it. Otherwise, it could use whatever it wants. So hard-coded numbers are not helpful.
Also, it is far more flexible. You can set your own submission string, and you can pass-along whatever extra variables you need.
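If you do mix the two styles, the legacy option string is pasted into your submit command wherever ${JOB_OPTS} appears. A sketch (the queue name is a placeholder):
[General]
sge_option_da = -pe smp 8 -q queueA
[job.defaults]
pwatcher_type = blocking
# During the daligner step, ${JOB_OPTS} would expand to the sge_option_da value above.
submit = qsub -S /bin/bash -sync y -V ${JOB_OPTS} -N ${JOB_NAME} -o "${JOB_STDOUT}" -e "${JOB_STDERR}" "${JOB_SCRIPT}"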
See also:
- https://github.com/PacificBiosciences/pypeFLOW/wiki/configuration -- for general pypeflow configuration
- https://github.com/PacificBiosciences/FALCON/wiki/Options-Available -- for Falcon-specific configuration