This is a collection of module-centric Julia deployments on HPC systems. A common challenge faced by software deployments on shared systems is interacting with other installed system software. Most HPC systems therefore come with an environment module system -- which is the preferred way to manage HPC software environments. This repo shows how templated environment modules and settings files can be used to install and manage Julia on a shared file system.
The environment modules on HPC systems can be very complex, often presenting users with different choices of toolchains and versions. A common example is that users can select between different MPI implementations and different GPU runtime versions, which can lead to a combinatorial explosion of system dependencies.
Hence the solution presented here is to "tie" the Julia configuration into the environment module system.
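In practice, this means that loading the Julia module also selects the configuration matching the currently loaded toolchain. For example (the module name julia is an assumption here -- actual names will vary by deployment):

```
$ module load julia
$ julia -e 'println(Base.load_path())'   # reflects the JULIA_LOAD_PATH set by the module
```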
Requirements:
- Lua and luaposix
- Lmod or Environment Modules
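A quick way to check whether these are available (the exact output will differ between Lmod and Environment Modules):

```
$ lua -v
$ lua -e 'require("posix"); print("luaposix is available")'
$ module --version
```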
We want to ensure that the installers have predictable behavior, regardless of the execution context (e.g. where scripts are being run from). Bash kinda sucks at this (especially when handling relative paths) -- an imperfect (but acceptable) solution is to use a launcher script: `entrypoint.sh`. For example, to render all Julia environment configuration files, run:

```
$ ./entrypoint.sh nersc/environments/templates/render.sh
```
This can be run from anywhere, because `entrypoint.sh` sets two environment variables, `__PREFIX__` and `__DIR__`:

- `__PREFIX__` contains the location of the project root (i.e. where `entrypoint.sh` is saved)
- `__DIR__` contains the location of the script being run
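For illustration, a minimal sketch of what such a launcher can look like (this is not the actual `entrypoint.sh`, just the general idea):

```bash
#!/usr/bin/env bash
# Hypothetical launcher sketch: run a project script with predictable paths.
set -euo pipefail

# __PREFIX__: the project root, i.e. the directory containing this launcher
export __PREFIX__="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# __DIR__: the directory containing the script being run (given relative to the root)
export __DIR__="$(cd "$__PREFIX__/$(dirname "$1")" && pwd)"

# Run the requested script with any remaining arguments
exec "$__PREFIX__/$1" "${@:2}"
```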
This project uses Simple Templates to render templates. A standalone version of Simple Templates is located at `./opt/bin/simple-templates.ex`, and only requires a reasonably modern version of Lua to run (I didn't check how modern it needs to be, though).
Simple Templates takes template files formatted in the mustache templating language and populates them with values from a settings file (TOML version 0.4). This allows us to adapt Julia settings and modules whenever systems are reconfigured or upgraded.
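As an illustration, suppose a settings file defines the (hypothetical) key `julia_prefix = "/opt/julia"`. A template fragment like this:

```lua
-- mustache placeholder inside a modulefile template (key name is made up for this example)
prepend_path("PATH", "{{{julia_prefix}}}/bin")
```

would then render to `prepend_path("PATH", "/opt/julia/bin")`.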
The objective of this approach is to be as self-contained as possible, depending on nothing beyond Lua and Lmod. More powerful templating engines need other dependencies to be installed (e.g. a modern version of Python + a virtualenv), which are often not present on bare-bones systems. The reason for going with Lua is to not need a lengthy install procedure. Furthermore, more powerful templating languages are just not needed to generate what are basically just a bunch of settings and module files.
This project uses Simple Modules to download and install software, and to render the corresponding module files. A standalone version of Simple Modules is located at `./opt/bin/simple-modules.ex`, and requires a reasonably modern version of luaposix (which is also used by Lmod). Simple Modules is a small utility that lets you easily download, build, and deploy software modules.
We avoid activation and wrapper scripts as much as possible -- Lmod is the best tool for modifying the runtime environment, and it is capable of unwinding those changes when requested.
A consequence of this choice is that all configurations need to be defined when modules are loaded, which can lead to some deployment quirks listed below.
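For example, a modulefile only needs to declare its environment changes -- Lmod automatically reverses them on `module unload`, so no separate deactivation script is needed (the paths below are hypothetical):

```lua
-- Declarative environment changes; Lmod undoes both of these on unload.
-- Paths are hypothetical placeholders.
prepend_path("PATH", "/opt/julia/1.10/bin")
setenv("JULIA_DEPOT_PATH", "/opt/julia/depot")
```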
We use configuration files exclusively. Consequently, every conceivable scenario needs its own set of configuration files. For example, every combination of CUDA version and MPI toolchain needs to be accounted for. The approach we take is to generate settings (using templates -- see above) for every combination (the NERSC deployment is a good example of this). A module will then put the right settings files into the `JULIA_LOAD_PATH` like so:
```lua
-- Derive a per-toolchain environment name from what Lmod has loaded,
-- e.g. something like "gnu.cray-mpich.cuda12" (exact values are system-dependent)
local PE_ENV = os.getenv("PE_ENV"):lower()
local FAMILY_MPI = os.getenv("LMOD_FAMILY_MPI"):lower()
local FAMILY_CUDATOOLKIT = os.getenv("LMOD_FAMILY_CUDATOOLKIT_VERSION"):lower()
local ENV_NAME = PE_ENV .. "." .. FAMILY_MPI .. ".cuda" .. FAMILY_CUDATOOLKIT

-- Append the matching settings directory to the JULIA_LOAD_PATH
local JULIA_LOAD_PATH = ":{{{JULIA_LOAD_PATH_PREFIX}}}/" .. ENV_NAME
append_path("JULIA_LOAD_PATH", JULIA_LOAD_PATH)
```
(Note: the string `{{{JULIA_LOAD_PATH_PREFIX}}}` is a template parameter which is filled out by Simple Modules.)
Our approach is not perfect -- one common issue is that when Lmod changes dependent modules, the values of the environment variables `PE_ENV`, `LMOD_FAMILY_MPI`, and `LMOD_FAMILY_CUDATOOLKIT_VERSION` in the example above might change, making Lmod "lose track" of `JULIA_LOAD_PATH`. An imperfect (but simple) solution is to keep track of the set value of `JULIA_LOAD_PATH` like so:
```lua
local JULIA_LOAD_PATH
if ("unload" == mode()) then
    -- On unload, restore the exact value recorded at load time, even if
    -- PE_ENV / LMOD_FAMILY_* have changed in the meantime
    JULIA_LOAD_PATH = os.getenv("__JULIAUP_MODULE_JULIA_LOAD_PATH")
else
    JULIA_LOAD_PATH = ":{{{JULIA_LOAD_PATH_PREFIX}}}/" .. ENV_NAME
end
-- Record the value that was applied, so a later unload can undo exactly this change
setenv("__JULIAUP_MODULE_JULIA_LOAD_PATH", JULIA_LOAD_PATH)
append_path("JULIA_LOAD_PATH", JULIA_LOAD_PATH)
```
While this does not automagically update `JULIA_LOAD_PATH` when `ENV_NAME` changes, it does allow a module reload to fix the `JULIA_LOAD_PATH`.
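For example, after swapping toolchain modules, reloading restores a consistent state (again assuming the module is named julia):

```
$ module unload julia
$ module load julia
```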
Several components of this install are common across many HPC facilities. You can find a description of these here.
Coming soon