`tsubame-exec` is a simple script to automate running jobs on the tsubame supercomputer. In theory this should work on any machine supporting Grid Engine-style `qsub`/`qstat` commands, but I have only tested it on tsubame.
Usage:

```
tsubame-exec -c config.toml
```
The format of `config.toml` is as follows:

```toml
[connection]
host = "tsubame" # required, may use .ssh/config alias
username = "username"
password = "pw"

[sync.XXXXXX]
from = "~/local_path/"
to = "/gs/bs/tga-test/remote_path"
excludes = [".*", "__*"]

[exec]
cmd = "echo 'hello world'" # required, may be a list of multiple commands
max_runtime = 23:59:59 # may also be a str
name = "test_job"
group = "tga-test"
extra_options = ["-p PRIORITY"]

[exec.env]
dir = "tsubame_exec" # required, directory from which to run cmd
modules = ["intel", "cuda"]
python_deps = ["numpy", "matplotlib"]
env_vars = {TEST = 123}

[exec.resource]
type = "gpu_1"
count = 1 # default 1
```
In this case, `tsubame-exec` will:

- ssh into the remote machine specified in `connection`
- For every table under `sync`:
  - Call rsync to sync the local directory `sync.XXX.from` to the remote directory `sync.XXX.to`
- cd into `exec.env.dir`
- Generate a script `job.sh` based on the fields in table `exec`
- Run `qsub job.sh` and return the job id
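The steps above roughly correspond to the commands built below. This is a sketch, not the tool's code. It only builds the argument lists and does not run them. It assumes `rsync -a` with one `--exclude` per pattern, and it leaves out both writing `job.sh` and password handling. `build_commands` is a hypothetical name.

```python
import os
import shlex


def build_commands(config: dict) -> list[list[str]]:
    """Build (but do not run) the sync and submit commands for one config."""
    conn = config["connection"]
    # ssh and rsync accept user@host; host may be an alias from ~/.ssh/config.
    target = f"{conn['username']}@{conn['host']}" if "username" in conn else conn["host"]
    commands = []
    # One rsync call per table under [sync].
    for table in config.get("sync", {}).values():
        cmd = ["rsync", "-a"]
        cmd += [f"--exclude={pattern}" for pattern in table.get("excludes", [])]
        # No shell is involved, so expand "~" in the local path here.
        cmd += [os.path.expanduser(table["from"]), f"{target}:{table['to']}"]
        commands.append(cmd)
    # Remote side: cd into exec.env.dir, then submit the generated script.
    remote = f"cd {shlex.quote(config['exec']['env']['dir'])} && qsub job.sh"
    commands.append(["ssh", target, remote])
    return commands
```

Each list could then be passed to `subprocess.run(cmd, check=True)`. `qsub` prints a message containing the new job id, which is where the returned id would come from.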

- You can have as many tables under the `sync` table as you like (as long as they have unique names).
- Python dependencies specified in `exec.env.python_deps` are installed via pip, but this is subject to change.
- `tsubame-exec -c config.toml --tsubame-validation` is equivalent to including `tsubame_validation = true` at the top level of your `config.toml`. Use this option to invoke tsubame-specific safety checks.
- Define global options in `XDG_CONFIG_DIR/tsubame_exec/config.toml`. These options are merged with the config file on every run. Use the global file to define, for instance, connection settings or syncs that should be run for every project.
- Use `tsubame-exec --tail {stdout/stderr}` to wait for the job to be submitted and then follow the chosen stream. There is currently no check for whether the job has finished, so this will block; use Ctrl-C to exit. I suggest printing a message in your code to signal completion.
- Some keys support string templating using `string.Template`. The `exec` table gets flattened (for nested dictionaries, the key names are joined by `_`), so to use the resource type, write `cmd = "echo $resource_type"`. `exec.name` is templated first, followed by `exec.env.dir`, and finally `exec.cmd`. This order is deliberate: it makes it possible to define a hyperparameter in the `exec` table and then automatically change the job name and create a new directory for your run files.
- `exec.max_runtime` translates to `#$ -l h_rt=TIME` in the generated script.
- I suspect that most use cases can be covered with the `exec.extra_options` list. Each entry is simply prefixed with `#$` and inserted into the generated script.
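The equivalence between `--tsubame-validation` and a top-level `tsubame_validation = true` could be wired up with `argparse`, as sketched below. `argparse` already turns the flag `--tsubame-validation` into the attribute `tsubame_validation`, which matches the config key. `parse_args` and `apply_cli_overrides` are hypothetical, and only the options mentioned in this README are defined.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="tsubame-exec")
    parser.add_argument("-c", dest="config", help="path to config.toml")
    # Stored as args.tsubame_validation (dashes become underscores).
    parser.add_argument("--tsubame-validation", action="store_true")
    return parser.parse_args(argv)


def apply_cli_overrides(config: dict, args) -> dict:
    """Return a copy of config with the flag applied at the top level."""
    # The flag only switches validation on; a `true` in the file is never undone.
    if args.tsubame_validation:
        config = {**config, "tsubame_validation": True}
    return config
```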
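The sketch below shows one way the global file could be merged with the project file. It rests on two assumptions the README does not specify. First, tables are merged recursively, so a global `[sync.common]` and a project `[sync.data]` both survive. Second, on conflicting keys the project file wins. `merge` is a hypothetical helper.

```python
def merge(global_cfg: dict, project_cfg: dict) -> dict:
    """Recursively merge two config dicts; project values win on conflicts."""
    merged = dict(global_cfg)  # shallow copy, so global_cfg itself is not changed
    for key, value in project_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a table here: merge their keys.
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```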
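The templating order can be illustrated with `string.Template`. In this sketch, `flatten` and `apply_templates` are hypothetical helpers. The sketch uses `safe_substitute`, so unknown placeholders such as shell variables (`$HOME`) are left untouched; `tsubame-exec` itself may behave differently. It treats `cmd` as a list of commands, which the config format allows.

```python
from string import Template


def flatten(table: dict, prefix: str = "") -> dict:
    """Flatten nested tables, joining key names with '_' (env.dir -> env_dir)."""
    flat = {}
    for key, value in table.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def apply_templates(exec_table: dict) -> dict:
    """Template exec.name, then exec.env.dir, then exec.cmd; each sees earlier results."""
    result = dict(exec_table)
    result["env"] = dict(exec_table.get("env", {}))
    # 1. The job name, e.g. "run_lr$lr".
    result["name"] = Template(result["name"]).safe_substitute(flatten(result))
    # 2. The run directory, which can now refer to the finished $name.
    result["env"]["dir"] = Template(result["env"]["dir"]).safe_substitute(flatten(result))
    # 3. The command(s), which can refer to both $name and $env_dir.
    cmds = result["cmd"] if isinstance(result["cmd"], list) else [result["cmd"]]
    values = flatten(result)
    result["cmd"] = [Template(c).safe_substitute(values) for c in cmds]
    return result
```

For example, with `lr = 0.01`, `name = "run_lr$lr"` and `env.dir = "runs/$name"` in the `exec` table, this sketch names the job `run_lr0.01` and puts its files in `runs/run_lr0.01`.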
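The sketch below covers the two documented mappings into `job.sh`: `max_runtime` becomes `#$ -l h_rt=...`, and each `extra_options` entry is prefixed with `#$`. The `#!/bin/sh` line is an assumption. Handling of `name`, `group`, `resource` and `env` is deliberately left out, because the README does not describe it. `render_job_script` and `format_runtime` are hypothetical names.

```python
import datetime


def format_runtime(value) -> str:
    # TOML parses an unquoted 23:59:59 as a time; a string is passed through as-is.
    if isinstance(value, datetime.time):
        return value.strftime("%H:%M:%S")
    return str(value)


def render_job_script(exec_table: dict) -> str:
    """Render a partial job.sh from the exec table (directives, then commands)."""
    lines = ["#!/bin/sh"]  # assumed interpreter line
    if "max_runtime" in exec_table:
        lines.append(f"#$ -l h_rt={format_runtime(exec_table['max_runtime'])}")
    # Each extra option becomes its own "#$" directive line.
    for option in exec_table.get("extra_options", []):
        lines.append(f"#$ {option}")
    cmds = exec_table["cmd"]
    lines += cmds if isinstance(cmds, list) else [cmds]
    return "\n".join(lines) + "\n"
```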
- Tests