62 changes: 56 additions & 6 deletions README.md
@@ -47,6 +47,7 @@ For more information on the original eDNAFlow pipeline and other software used a
+ [Non-demultiplexed paired-end runs](#non-demultiplexed-paired-end-runs)
+ [For previously-demultiplexed paired-end runs](#for-previously-demultiplexed-paired-end-runs)
* [Contents of output directories](#contents-of-output-directories)
* [Configuration profiles](#configuration-profiles)
* [When things go wrong (interpreting errors)](#when-things-go-wrong-interpreting-errors)
* [A note on globs/wildcards](#a-note-on-globswildcards)
- [Description of run options](#description-of-run-options)
@@ -139,12 +140,7 @@ For manual installation of Nextflow, follow the instructions at [on the nextflow
To install Singularity manually, follow the instructions at [singularity installation](https://sylabs.io/guides/3.5/admin-guide/installation.html). If working on HPC, you may need to contact your HPC helpdesk. The Singularity installation how-to is long and complicated, but if you're on a RedHat or Debian-adjacent distro, there are .deb and .rpm packages that can be found at [https://github.com/sylabs/singularity/releases/latest](https://github.com/sylabs/singularity/releases/latest).

#### Podman
Manual podman installation instructions can be found [here](https://podman.io/docs/installation). rainbow_bridge should work with podman out of the box, but you will have to specify a podman profile for it to function properly. Two profiles are built in: `podman_arm` and `podman_intel`. Both tell nextflow to use podman as its container engine; the suffix specifies your CPU architecture. Most systems will use the `_intel` variant, but if you are on a newer Mac with Apple silicon, you'll want the `_arm` variant. See the section on [configuration profiles](#configuration-profiles) for information on using these named profiles.

### Testing installation

@@ -391,6 +387,57 @@ When the pipeline finishes, output from each step can be found in directories co
| work/ | A bunch of nonsense | All internal and intermediate files processed by nextflow | |
| .nextflow/ | various | Hidden nextflow-generated internal folder | |

## Configuration profiles

Resource availability, container subsystems, and various other aspects vary from computer to computer. To that end, nextflow allows the creation of custom named configuration profiles that can be loaded when running rainbow_bridge to customize various settings. Details about creating these profiles are beyond the scope of this documentation, but can be found in the [nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles). By default, rainbow_bridge loads the `standard` profile, which uses the 'local' nextflow executor and limits maximum CPUs and memory to the system limits or the values passed to the `--max-cpus` and `--max-memory` options (whichever is smaller). It also uses singularity as its default container engine. rainbow_bridge comes with the following built-in configuration profiles:

| Profile name | Description |
| ------------ | ----------- |
| standard (loaded automatically) | Default profile: local executor, cpus/memory set to system limits or `--max-cpus`/`--max-memory`, singularity container engine |
| singularity | Enables the singularity container engine |
| docker | Enables the docker container engine |
| podman_intel | Enables the podman container engine with intel architecture |
| podman_arm | Enables the podman container engine with ARM architecture |

To use any named profile when running rainbow_bridge, simply pass it using the `-profile <profile name>` option (note again the single dash, since it's a nextflow option and not a rainbow_bridge option). Note that specifying any named profile will override the `standard` profile, so container/executor settings may need to be redefined.
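
For example, to run the pipeline using the built-in `podman_intel` profile:

```console
$ rainbow_bridge.nf -profile podman_intel <...further options...>
```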

rainbow_bridge will automatically load profiles found in files matching the pattern `conf/profiles/*.config` within the pipeline's installation directory. To create a custom profile, first define your profile in a file with the `.config` extension and copy it to the `conf/profiles` subdirectory under the location of the rainbow_bridge script file. For example, if you've got a server called `bigiron` with 100 CPUs and 700 GB of memory and you've installed rainbow_bridge to `/opt/pipelines/rainbow_bridge`, you could create a file called `bigiron.config` and save it to `/opt/pipelines/rainbow_bridge/conf/profiles`. The `bigiron.config` file might look something like this:

```
bigiron {
    executor {
        name = 'local'
        cpus = 100
        memory = 700.GB
    }
}
```

As mentioned above, this profile will override the `standard` profile, and since a container system is not specified, nextflow will look for executables on the local filesystem. Fortunately, nextflow supports multiple profiles: just separate the names with a comma. For this example, if we wanted to use the `bigiron` profile with the singularity container system, we could launch rainbow_bridge using `-profile bigiron,singularity`, like this:

```console
$ rainbow_bridge.nf -profile bigiron,singularity <...further options...>
```

If you want to define a profile but don't have write access to the `<rainbow_bridge>/conf/profiles` directory, you can create a custom config file containing your profile, save it anywhere, and pass its filename to rainbow_bridge with the `-c` option (single dash again!). rainbow_bridge will still load any built-in profiles from `conf/profiles`. In this case, you will have to enclose your profile definition in the `profiles {}` scope, like this:

```
profiles {
    bigiron {
        executor {
            name = 'local'
            cpus = 100
            memory = 700.GB
        }
    }
}
```

And (assuming you've named the file `bigiron.config` and saved it in the directory where you're running your analysis), execute the pipeline like this:

```console
$ rainbow_bridge.nf -c bigiron.config -profile bigiron,singularity <...further options...>
```

## When things go wrong (interpreting errors)

Occasionally your pipeline run will encounter something it doesn't know how to handle and it will fail. There are two general failure modes: silent and loud.
@@ -796,6 +843,9 @@ These options allow you to allocate resources (CPUs and memory) to rainbow_bridge
<small>**`--max-memory [mem]`**</small>: Maximum memory available to nextflow processes, e.g., '8.GB' (default: maximum available system memory)
<small>**`--max-cpus [num]`**</small>: Maximum cores available to nextflow processes (default: maximum available system CPUs)
<small>**`--max-time [time]`**</small>: Maximum time allocated to each pipeline process, e.g., '2.h' (default: 10d)
<small>**`--max-retries [num]`**</small>: Maximum number of times (default: 1) rainbow_bridge will attempt to re-execute a process that fails due to resource limitations. Base resource requests are multiplied by the attempt number on each retry (up to the `--max-cpus`/`--max-memory` caps).

Within rainbow_bridge, different processes are allocated different amounts of base resources, depending on how memory- or CPU-intensive they are. However, the pipeline will not exceed the values passed to `--max-cpus` or `--max-memory`. Thus, if a given process is allocated 6 CPUs by default but the user passes `--max-cpus 2`, it will only use 2 CPUs.
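
For example, a run capped at 2 CPUs and 8 GB of memory that allows up to two resource-limited retries might look like this (the values here are illustrative, not recommendations):

```console
$ rainbow_bridge.nf --max-cpus 2 --max-memory '8.GB' --max-retries 2 <...further options...>
```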

### Singularity options

47 changes: 29 additions & 18 deletions conf/base.config
@@ -3,6 +3,19 @@
* but also the base resource usage settings
*/
process {

// set caching option
cache = 'lenient'

// set default cpus and memory
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }

// set error strategy: retry tasks whose exit status suggests a
// resource-related kill (128+signal, plus a few other transient codes);
// otherwise let running tasks finish before stopping
errorStrategy = { task.exitStatus in ((130..145) + 104 + 175) ? 'retry' : 'finish' }
maxRetries = { params.maxRetries }
maxErrors = '-1'

withLabel: 'shell' { container = 'quay.io/nextflow/bash:latest' }
withLabel: 'obitools' { container = 'quay.io/biocontainers/obitools:1.2.13--py27heb79e2c_3' }
withLabel: 'blast' { container = 'quay.io/biocontainers/blast:2.17.0--h66d330f_0' }
@@ -37,34 +50,34 @@ process {
// these labels control various aspects of resource allocation
withLabel:process_single {
cpus = { 1 }
memory = { check_max(6.GB * task.attempt,'memory') }
// time = { 4.h * task.attempt }
}
withLabel:process_low {
cpus = { check_max(2 * task.attempt ,'cpus')}
memory = { check_max(12.GB * task.attempt,'memory') }
// time = { 4.h * task.attempt }
}
withLabel:process_lowish {
cpus = { check_max(4 * task.attempt ,'cpus')}
memory = { check_max(12.GB * task.attempt,'memory') }
// time = { 4.h * task.attempt }
}
withLabel:process_medium {
cpus = { check_max(6 * task.attempt ,'cpus')}
memory = { check_max(36.GB * task.attempt,'memory') }
// time = { 8.h * task.attempt }
}
withLabel:process_high {
cpus = { check_max(12 * task.attempt ,'cpus')}
memory = { check_max(72.GB * task.attempt,'memory') }
// time = { 16.h * task.attempt }
}
withLabel:process_more_memory {
memory = { check_max(10.GB * task.attempt,'memory') }
}
withLabel:process_high_memory {
memory = { check_max(200.GB * task.attempt,'memory') }
}
withLabel:error_ignore {
errorStrategy = 'ignore'
@@ -73,12 +86,10 @@ process {
errorStrategy = 'retry'
maxRetries = 2
}
// allocate full spread of cpus & memory
withLabel:process_full {
cpus = { params.maxCpus }
memory = { params.maxMemory }
}
}
10 changes: 10 additions & 0 deletions conf/profiles/docker.config
@@ -0,0 +1,10 @@
docker {
docker.enabled = true
conda.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
apptainer.enabled = false
docker.runOptions = '-u $(id -u):$(id -g)'
}
8 changes: 8 additions & 0 deletions conf/profiles/podman_arm.config
@@ -0,0 +1,8 @@
/* use podman as container engine assuming ARM architecture */
/* (e.g., for mac with apple silicon) */
podman_arm {
podman {
enabled = true
runOptions = "--platform linux/arm64"
}
}
8 changes: 8 additions & 0 deletions conf/profiles/podman_intel.config
@@ -0,0 +1,8 @@
/* use podman as container engine assuming intel architecture */
/* (e.g., for mac with an intel processor) */
podman_intel {
podman {
enabled = true
runOptions = "--platform linux/amd64"
}
}
20 changes: 20 additions & 0 deletions conf/profiles/singularity.config
@@ -0,0 +1,20 @@
singularity {
singularity.enabled = true
singularity.autoMounts = true
conda.enabled = false
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
apptainer.enabled = false

// construct options for singularity bind directories
if (params.bindDir && params.bindDir != '') {
singularity.runOptions = "-B " + params.bindDir.split().join(" -B ")
}

// set singularity cache directory if specified
if (params.singularityCache && params.singularityCache != "") {
singularity.cacheDir = params.singularityCache
}
}
3 changes: 3 additions & 0 deletions lib/helper.groovy
@@ -272,6 +272,9 @@ class helper {
--max-memory [mem] Maximum memory available to nextflow processes, e.g., '8.GB' (default: ${params.maxMemory})
--max-cpus [num]             Maximum cores available to nextflow processes (default: ${params.maxCpus})
--max-time [time] Maximum time allocated to each pipeline process, e.g., '2.h' (default: ${params.maxTime})
--max-retries [num]          The maximum number of times rainbow_bridge will attempt to re-execute a process
that fails due to resource limitations (with increased resources for each iteration)
(default: 1)

Singularity options:
--bind-dir [dir] Space-separated list of directories to bind within singularity images
96 changes: 71 additions & 25 deletions nextflow.config
@@ -22,9 +22,8 @@ trace {
/*
* Define the pipeline parameters and their default values
* Each of these parameters can be specified at command line (e.g. --barcode 'x.txt'); if none specified the below will be set as default
*/
params {
/*
* these options concern where to find sequence reads and barcode file,
* which can be done several ways
@@ -154,9 +153,10 @@
help = false

/* resource options */
maxMemory = 6.GB
maxCpus = 1
maxTime = 240.h
maxRetries = 1
}
// the parameter case translation hasn't happened yet at this
// point in the config reading/parsing, so we'll have to handle it
@@ -195,21 +195,79 @@ def check_max(obj, type) {
}
}

// presumably platform-independent function to get total memory (in bytes)
// and optionally return it as a MemoryUnit
def mem(mu=true) {
def total = 0
try {
// Try JVM-specific bean (works on most HotSpot / OpenJDK)
def osBean = java.lang.management.ManagementFactory.getOperatingSystemMXBean()
if (osBean.metaClass.hasProperty(osBean, "totalPhysicalMemorySize")) {
total = osBean.totalPhysicalMemorySize
}
} catch (e) { }

// Fall back to OS commands if the JVM bean didn't provide a value
if (total == 0) {
def os = System.getProperty("os.name").toLowerCase()
if (os.contains("linux")) {
// Parse /proc/meminfo
def meminfo = new File("/proc/meminfo").text
total = (meminfo =~ /MemTotal:\s+(\d+)\s+kB/)[0][1].toLong() * 1024L
}
else if (os.contains("mac")) {
// macOS sysctl
def m = "sysctl -n hw.memsize".execute().text.trim().toLong()
total = m
}
else if (os.contains("win")) {
// Windows WMIC (older versions) or PowerShell (newer)
try {
def m = "wmic computersystem get TotalPhysicalMemory".execute().text.find(/\d+/).toLong()
total = m
} catch (ignored) {
// PowerShell fallback
def psCmd = "powershell Get-CimInstance Win32_OperatingSystem | Select-Object TotalVisibleMemorySize,FreePhysicalMemory"
def out = psCmd.execute().text
def numbers = (out =~ /\d+/)*.toLong()
if (numbers.size() >= 2) {
def totalKb = numbers[0]
total = totalKb * 1024L
}
}
}
}

try {
if (mu) {
return "${(int)Math.floor(total/1024L/1024L)}.MB" as nextflow.util.MemoryUnit
} else {
return total
}
} catch (ignored) {
return -1
}
}

/* Load base.config by default for all profiles */
includeConfig 'conf/base.config'

/* define execution profiles */
profiles {

/* standard profile is loaded by default */
standard {

// make default executor local
executor.name = 'local'

// limit max cpus to min of param value and system processors
executor.cpus = Math.min(Runtime.runtime.availableProcessors(),(int)params.maxCpus)

// get system memory in bytes
def sys_mem = mem(false)
// get params.maxMemory as memory unit
def p_mem = params.maxMemory as nextflow.util.MemoryUnit
// limit memory to min of system memory and maxMemory param
executor.memory = Math.min(sys_mem,p_mem.bytes)

singularity {
/* enable singularity and have it do automounts */
@@ -228,21 +286,9 @@ profiles {
}
}

// load external profiles from conf/profiles/*.config
try {
new java.io.File("${projectDir}/conf/profiles")
.eachFileMatch(~/(?i)^.+\.config$/) { file -> includeConfig file.absolutePath }
} catch (java.io.FileNotFoundException ignore) { }
}
6 changes: 3 additions & 3 deletions rainbow_bridge.nf
@@ -417,7 +417,7 @@ process merge_relabeled {
// dereplication, chimera removal, zOTU table generation
process dereplicate {
label 'denoiser'
label 'process_full'

publishDir "${params.outDir}/zotus", mode: params.publishMode

@@ -534,7 +534,7 @@ process dereplicate {
// run blast query
process blast {
label 'blast'
label 'process_full'

publishDir {
def pid = String.format("%d",(Integer)num(params.percentIdentity ))
@@ -693,7 +693,7 @@ process collapse_taxonomy {
// run insect classifier model
process insect {
label 'r'
label 'process_full'

publishDir {
def offs = String.format("%d",(Integer)num(params.insectOffset))