diff --git a/CHANGELOG.md b/CHANGELOG.md index 01a0a76..deeff03 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,14 +9,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - [#117](https://github.com/nf-core/createtaxdb/pull/117) - Updated to nf-core/tools template 3.5.1 (by @jfy133) - [#143](https://github/com/nf-core/createtaxdb/pull/143) - Documented how to resovle KrakenUniq unbound variable jellfish issue (❤️ to @flass for suggesting, added by @jfy133) +- [#140](https://github/com/nf-core/createtaxdb/pull/140) - Added MetaCache database building support (❤️ to @ChillarAnand for suggestion, added by @alxndrdiaz and @jfy133) ### `Fixed` ### `Dependencies` -| Tool | Old Version | New Version | -| ------- | ----------- | ----------- | -| nf-core | 3.4.1 | 3.5.1 | +| Tool | Old Version | New Version | +| --------- | ----------- | ----------- | +| nf-core | 3.4.1 | 3.5.1 | +| MetaCache | | 2.5.0 | ### `Deprecated` diff --git a/CITATIONS.md b/CITATIONS.md index 5b7aadf..98b24be 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -46,6 +46,10 @@ > Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. https://doi.org/10.1038/s41559-017-0446-6 +- [MetaCache](https://doi.org/10.1093/bioinformatics/btx520) + + > Müller, A., Hundt, C., Hildebrandt, A., Hankeln, T., & Schmidt, B. (2017). MetaCache: context-aware classification of metagenomic reads using minhashing. Bioinformatics (Oxford, England), 33(23), 3740–3748. https://doi.org/10.1093/bioinformatics/btx520 + - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 
2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. diff --git a/README.md b/README.md index 22be037..19c1635 100644 --- a/README.md +++ b/README.md @@ -45,6 +45,7 @@ The pipeline is designed to be a companion pipeline to [nf-core/taxprofiler](htt - [MALT](https://doi.org/10.1038/s41559-017-0446-6) - [sourmash](https://doi.org/10.21105/joss.06830) - [sylph](https://doi.org/10.1038/s41587-024-02412-y) + - [MetaCache](https://doi.org/10.1093/bioinformatics/btx520) ## Usage @@ -81,6 +82,7 @@ nextflow run nf-core/createtaxdb \ --ganon_build_options='--kmer-size 45' \ --build_diamond \ --diamond_build_options='--no-parse-seqids' \ + --build_metacache \ --outdir ``` diff --git a/assets/createtaxdb-metromap-diagram-dark.png b/assets/createtaxdb-metromap-diagram-dark.png index 103d1a8..75adfe5 100644 Binary files a/assets/createtaxdb-metromap-diagram-dark.png and b/assets/createtaxdb-metromap-diagram-dark.png differ diff --git a/assets/createtaxdb-metromap-diagram-dark.svg b/assets/createtaxdb-metromap-diagram-dark.svg index 73a007f..6ce53b7 100644 --- a/assets/createtaxdb-metromap-diagram-dark.svg +++ b/assets/createtaxdb-metromap-diagram-dark.svg
[SVG markup hunks not recoverable from this extract. The surviving text shows a new MetaCache label added to the metro map, the y-positions of the MALT, sourmash, sylph, DIAMOND, Kaiju, SeqKit, sequence-type and LEGEND labels shifted, and the diagram version label changed from v2.0 to v2.1. The same hunks repeat for a second SVG (black text fill instead of white) whose file header did not survive.]
/downstream_samplesheets`. @@ -237,6 +238,21 @@ and the k-mer size for which the index was created. The `-sylph.syldb` file can be given to sylph profile itself with `sylph profile -sylph.syldb <...>` etc. +### metacache + +[MetaCache](https://github.com/muellan/metacache) is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin. It uses locality sensitive hashing to quickly identify candidate regions within one or multiple reference genomes. + +
+Output files + +- `metacache/` + - `.meta`: sequence signature database binary file + - `.cache0`: sequence signature database binary file + +
+ +The `metacache/-metacache.meta` file can be given to metacache query itself with `metacache query metacache/-metacache.meta <...>` etc. + ### Downstream samplesheets The pipeline can also generate input files for the following downstream diff --git a/docs/usage/dev.md b/docs/usage/dev.md index cdd5146..1f8be78 100644 --- a/docs/usage/dev.md +++ b/docs/usage/dev.md @@ -12,10 +12,12 @@ Does not have to be in this precise order - [ ] Added `--_build_params` - [ ] Added other profiler-specific parameters (e.g. additional taxonomy files) - [ ] Format with VSCode Nextflow extension +- [ ] Update the `subworkflows/local/preprocessing/main.nf` + - [ ] Update subworkflow's the if/else (contatenation) sections for either DNA or AA FASTA preprocessing + - [ ] Format with VSCode Nextflow extension - [ ] Added tools(s) to `workflow/createtaxdb.nf` - [ ] Added relevant new input files to `take:` block, and pass into from `main.nf` to the `NFCORE_CREATETAXDB` workflow - [ ] Added relevant modules/subworkflows at the top using `include` statement - - [ ] Add the tool into the `PREPROCESSING` subworkflow's PREPARE if/else (contatenation) section - [ ] Added the tool-specific if/else statement in the main `createtaxdb.nf` - [ ] Version and MultiQC (if available) channels mixed - [ ] Include output channel in workflow `emit` statement @@ -27,12 +29,12 @@ Does not have to be in this precise order - [ ] Format with VSCode Nextflow extension - [ ] If necessary, added any profiler-specific parameter validation checks to `utils_nfcore_createtaxdb_pipeline` and possible at the top of `createtaxdb.nf` - [ ] Update tests - - [ ] Include the tool in the `test_minimal.config` (as false), `test.config`, `test_nothing.config`, and `test_full.config` - - [ ] Do a mini test of `test_minimal` to make sure it executes when sole tool + - [ ] Include the tool in the `test_minimal.config` (as false), `test.config` and `test_full.config` (as true), and `test_alternatives.config`, as required. 
+ - [ ] Run a mini test of `test_minimal` to make sure it executes when sole tool - [ ] Format these files with VSCode Nextflow extension - [ ] Include the output object in the `tests/test.nf.test` file - - [ ] Re-run nf-test to update snapshot: `nf-test test --tag test --profile +docker --updateSnapshot` (tip: for assertions, borrow from the modules assertions!) -- [ ] Updated Documentation + - [ ] Re-run nf-test to update snapshot: `nf-test test --tag test --profile +docker --update-snapshot` (tip: for assertions, borrow from the modules assertions!) +- [ ] Update Documentation - [ ] `nf-core pipelines schema build` has been run and updated - [ ] All additional tool specific pipeline parameters have a additional help entry with the `Modifies tool parameter(s)` quote block - [ ] Added citation to `citations.md` (citation style: APA 7th edition) diff --git a/docs/usage/faq.md b/docs/usage/faq.md index 9d3cb23..5a27f61 100644 --- a/docs/usage/faq.md +++ b/docs/usage/faq.md @@ -36,6 +36,10 @@ We provide a list of required or recommended files, and which pipeline parameter - a MEGAN 'mapDB' mapping file (`--malt_mapdb`) - sourmash (no additional files required) - sylph (no additional files required) +- metacache + - taxonomy name dump file (`--namesdmp`) + - taxonomy nodes dump file (`--nodesdmp`) + - custom seqid2taxid file (`--nucl2taxid`) \* _will be automatically downloaded if not supplied. 
You must supply this to the pipeline if on an offline cluster._ @@ -401,6 +405,7 @@ tar czvf kmcp-krakenuniq.tar.gz krakenuniq/database-kmcp-index/ tar czvf -krakenuniq.tar.gz krakenuniq/-krakenuniq/ tar czvf -ganon.tar.gz ganon/ tar czvf -malt.tar.gz malt/malt_index/ +tar czvf -metacache.tar.gz metacache/ ``` ## I get an error about `ConcurrentModificationExeception` diff --git a/modules.json b/modules.json index b6ecbfe..def9bb3 100644 --- a/modules.json +++ b/modules.json @@ -71,6 +71,11 @@ "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46", "installed_by": ["modules"] }, + "metacache/build": { + "branch": "master", + "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726", + "installed_by": ["modules"] + }, "multiqc": { "branch": "master", "git_sha": "af27af1be706e6a2bb8fe454175b0cdf77f47b49", diff --git a/modules/nf-core/metacache/build/environment.yml b/modules/nf-core/metacache/build/environment.yml new file mode 100644 index 0000000..9641538 --- /dev/null +++ b/modules/nf-core/metacache/build/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::metacache=2.5.0 diff --git a/modules/nf-core/metacache/build/main.nf b/modules/nf-core/metacache/build/main.nf new file mode 100644 index 0000000..8171237 --- /dev/null +++ b/modules/nf-core/metacache/build/main.nf @@ -0,0 +1,62 @@ +process METACACHE_BUILD { + tag "$meta.id" + label 'process_long' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/metacache:2.5.0--h077b44d_0': + 'biocontainers/metacache:2.5.0--h077b44d_0' }" + + input: + tuple val(meta), path(genome_files, stageAs: 'genomes/*') + path(taxonomy, stageAs: 'taxonomy/*') // optional. 
Should be [names.dmp, nodes.dmp], plus optionally merged.dmp + path(seq2taxid) // optional + + output: + tuple val(meta), path('*.meta'), path('*.cache*'), emit: db + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + if (task.cpus > 1) { log.warn("'metacache build' cannot be parallelized: ignoring task.cpus > 1") } + def taxonomy_filenames = taxonomy.collect{ f -> f.fileName.name }.sort() + assert !taxonomy || taxonomy_filenames == ['names.dmp', 'nodes.dmp'] || taxonomy_filenames == ['merged.dmp', 'names.dmp', 'nodes.dmp'] + def taxonomy_args = taxonomy ? "-taxonomy taxonomy" : '' + def seq2taxid_args = seq2taxid ? "-taxpostmap '${seq2taxid}'" : '' + """ + metacache \\ + build \\ + ${prefix} \\ + genomes/ \\ + $taxonomy_args \\ + $seq2taxid_args \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + metacache: \$(metacache info |& sed -n 's/^MetaCache version \\+\\([0-9.]\\+\\).*\$/\\1/p') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + if (task.cpus > 1) { log.warn("'metacache build' cannot be parallelized: ignoring task.cpus > 1") } + def taxonomy_filenames = taxonomy.collect{ f -> f.fileName.name }.sort() + assert !taxonomy || taxonomy_filenames == ['names.dmp', 'nodes.dmp'] || taxonomy_filenames == ['merged.dmp', 'names.dmp', 'nodes.dmp'] + def n_outputs = (args ==~ /-parts\s/) ? 
(args.replaceAll(/^.*\s-parts\s+(\S+).*$/, '$1') as Integer) : 1 + assert n_outputs > 0 + """ + touch '${prefix}.meta' + touch '${prefix}.cache'{0..${n_outputs-1}} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + metacache: \$(metacache info |& sed -n 's/^MetaCache version \\+\\([0-9.]\\+\\).*\$/\\1/p') + END_VERSIONS + """ +} diff --git a/modules/nf-core/metacache/build/meta.yml b/modules/nf-core/metacache/build/meta.yml new file mode 100644 index 0000000..ace893a --- /dev/null +++ b/modules/nf-core/metacache/build/meta.yml @@ -0,0 +1,92 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "metacache_build" +description: Taxonomic profiling database building with MetaCache +keywords: + - genomics + - metagenomics + - taxonomy + - short reads + - long reads + - kmer + - k-mer + - metacache + - build + - reference +tools: + - "metacache": + description: | + MetaCache is a classification system for mapping genomic sequences (short reads, long reads, contigs, ...) from metagenomic samples to their most likely taxon of origin. It aims to reduce the memory requirement usually associated with k-mer based methods while retaining their speed. MetaCache uses locality sensitive hashing to quickly identify candidate regions within one or multiple reference genomes. A read is then classified based on the similarity to those regions. + + For an independent comparison to other tools in terms of classification accuracy see the LEMMI benchmarking site. + + The latest version of MetaCache classifies around 60 Million reads (of length 100) per minute against all complete bacterial, viral and archaea genomes from NCBI RefSeq Release 97 running with 88 threads on a workstation with 2 Intel(R) Xeon(R) Gold 6238 CPUs. 
+ homepage: "https://muellan.github.io/metacache" + documentation: "https://github.com/muellan/metacache/tree/master/docs" + tool_dev_url: "https://github.com/muellan/metacache" + licence: ["GPL v3-or-later"] + identifier: "" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1' ]` + + - genome_files: + # Mandatory + type: file + description: "(possibly gzipped) fasta or fastq files of full genomes, for example + from an NCBI assembly" + pattern: "*.{fna,fa,fasta,fnq,fq,fastq}{,.gz}" + ontologies: + - edam: "http://edamontology.org/format_1929" + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_1954" + - edam: "http://edamontology.org/data_2044" + - taxonomy: + # Optional + type: file + description: "NCBI taxonomy formatted files nodes.dmp and names.dmp" + pattern: "{names,nodes,merged}.dmp" + ontologies: + - edam: "http://edamontology.org/data_3028" + - seq2taxid: + # Optional + type: file + description: > + NCBI-style 'accession2taxid' tab-separated file with 3 or 4 columns: + accession, accession_version, taxid, and gid (optional) + pattern: "*" + ontologies: [] +output: + db: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1' ]` + - "*.meta": + type: file + description: "sequence signature database binary file" + pattern: "*.meta" + ontologies: + - edam: "http://edamontology.org/format_2333" # Binary format + - "*.cache*": + type: file + description: "sequence signature database binary files" + pattern: "*.cache+([0-9])" + ontologies: + - edam: "http://edamontology.org/format_2333" # Binary format + versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + + ontologies: + - edam: http://edamontology.org/format_3750 # YAML +authors: + - "@Gullumluvl" +maintainers: + - "@Gullumluvl" diff --git a/modules/nf-core/metacache/build/tests/main.nf.test b/modules/nf-core/metacache/build/tests/main.nf.test new file mode 100644 index 0000000..ae4e1c9 --- /dev/null +++ b/modules/nf-core/metacache/build/tests/main.nf.test @@ -0,0 +1,317 @@ +nextflow_process { + + name "Test Process METACACHE_BUILD" + script "../main.nf" + process "METACACHE_BUILD" + + tag "modules" + tag "modules_nfcore" + tag "metacache" + tag "metacache/build" + + test("one fasta") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true), + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("one fasta -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true), + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert 
file(process.out.db[0][2]).name == 'test.cache0' }, + { assert snapshot(process.out).match() } + ) + } + } + + test("one fasta.gz") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta.gz", checkIfExists: true), + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("one fasta.gz -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta.gz", checkIfExists: true), + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert snapshot(process.out).match() } + ) + } + } + test("two fasta") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + [file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true)] + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("two fasta -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + [file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true), + 
file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/transcriptome.fasta", checkIfExists: true)] + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert snapshot(process.out).match() } + ) + } + } + + test("one fastq") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + ] + input[1] = [] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("one fasta + taxonomy") { + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + ] + input[1] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/names.dmp", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/nodes.dmp", checkIfExists: true) + ] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + } + + test("one fasta + taxonomy -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + ] + input[1] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/names.dmp", checkIfExists: true), + 
file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/nodes.dmp", checkIfExists: true) + ] + input[2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert snapshot(process.out).match() } + ) + } + } + + test("one fasta + taxonomy + seq2taxid") { + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + ] + input[1] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/names.dmp", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/nodes.dmp", checkIfExists: true) + ] + input[2] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/seqid2taxid.map", checkIfExists: true), + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][1]).size() }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert file(process.out.db[0][2]).size() }, + { assert snapshot(process.out).match() } + ) + } + } + + test("one fasta + taxonomy + seq2taxid -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/genome/genome.fasta", checkIfExists: true) + ] + input[1] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/names.dmp", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/nodes.dmp", checkIfExists: true) + ] + input[2] = [ + file(params.modules_testdata_base_path + "genomics/sarscov2/metagenome/seqid2taxid.map", checkIfExists: true), + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert file(process.out.db[0][2]).name == 'test.cache0' }, + { assert snapshot(process.out).match() } + ) + } + } +} diff 
--git a/modules/nf-core/metacache/build/tests/main.nf.test.snap b/modules/nf-core/metacache/build/tests/main.nf.test.snap new file mode 100644 index 0000000..a3902e9 --- /dev/null +++ b/modules/nf-core/metacache/build/tests/main.nf.test.snap @@ -0,0 +1,387 @@ +{ + "one fasta -stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:28.491937812" + }, + "one fasta": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,44bda0e589b01622a6376e410f829907", + "test.cache0:md5,1209d774f809cbde2a630f08d99382ac" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,44bda0e589b01622a6376e410f829907", + "test.cache0:md5,1209d774f809cbde2a630f08d99382ac" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:23.955264819" + }, + "one fasta.gz": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,4cea57b64d29695e824c2c5ab880d2e7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,4cea57b64d29695e824c2c5ab880d2e7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": 
"0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:33.006848723" + }, + "one fastq": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,d27079a160a7be4b7ced25c1c6f33f60", + "test.cache0:md5,2c3aba2e87bb82e66d7e83434fff8c50" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d27079a160a7be4b7ced25c1c6f33f60", + "test.cache0:md5,2c3aba2e87bb82e66d7e83434fff8c50" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:52.011973165" + }, + "two fasta -stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:47.311185636" + }, + "one fasta + taxonomy + seq2taxid -stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:16:12.528786791" + }, + "one fasta.gz -stub": { + "content": [ + { + "0": [ + [ + 
{ + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:37.496630306" + }, + "one fasta + taxonomy": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,7dddd6667a0c7d7d3531d093fee985b7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,7dddd6667a0c7d7d3531d093fee985b7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:57.077548317" + }, + "one fasta + taxonomy + seq2taxid": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,7dddd6667a0c7d7d3531d093fee985b7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,7dddd6667a0c7d7d3531d093fee985b7", + "test.cache0:md5,a4357c9b9512635f95d4507620e36fce" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:16:07.284240823" + }, + "two fasta": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,addd769ebecb3c4a24fcb969f8dd3c8e", + "test.cache0:md5,63a7b4c053dc9991328e4b12c22159b5" + ] + ], + "1": [ + 
"versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,addd769ebecb3c4a24fcb969f8dd3c8e", + "test.cache0:md5,63a7b4c053dc9991328e4b12c22159b5" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:15:42.319461844" + }, + "one fasta + taxonomy -stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ], + "db": [ + [ + { + "id": "test" + }, + "test.meta:md5,d41d8cd98f00b204e9800998ecf8427e", + "test.cache0:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,7a7dbc5c065635f0ffd5829bce62d73c" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-06-03T16:16:02.134401443" + } +} \ No newline at end of file diff --git a/nextflow.config b/nextflow.config index 0efa6c8..b4373cf 100644 --- a/nextflow.config +++ b/nextflow.config @@ -75,6 +75,7 @@ params { build_sourmash_dna = false build_sourmash_protein = false build_sylph = false + build_metacache = false bracken_build_options = '' centrifuge_build_options = '' @@ -90,6 +91,7 @@ params { sourmash_build_protein_options = "--param-string 'scaled=200,k=10,noabund'" sourmash_batch_size = 100 sylph_build_options = '' + metacache_build_options = '' // General output options diff --git a/nextflow_schema.json b/nextflow_schema.json index b47597a..10dfb17 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -311,6 +311,17 @@ "fa_icon": "fas fa-users-cog", "description": "Specify parameters being given to sylph sketch.", "help_text": "See [sylph documentation](https://github.com/bluenote-1577/sylph/wiki/sylph-cookbook#database-sketching-options)." 
+ }, + "build_metacache": { + "type": "boolean", + "fa_icon": "fas fa-toggle-on", + "description": "Turn on building of MetaCache database. Requires nucleotide FASTA file input." + }, + "metacache_build_options": { + "type": "string", + "fa_icon": "fas fa-users-cog", + "description": "Specify parameters being given to metacache build.", + "help_text": "See [MetaCache documentation](https://github.com/muellan/metacache/blob/master/docs/mode_build.txt)." } }, "fa_icon": "fas fa-database" diff --git a/subworkflows/local/preprocessing/main.nf b/subworkflows/local/preprocessing/main.nf index 749d8df..d8882b5 100644 --- a/subworkflows/local/preprocessing/main.nf +++ b/subworkflows/local/preprocessing/main.nf @@ -30,7 +30,7 @@ workflow PREPROCESSING { ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // PREPARE: Prepare input for single file inputs modules - if ([(params.build_malt && malt_build_mode == 'nucleotide'), params.build_centrifuge, params.build_kraken2, params.build_bracken, params.build_krakenuniq, params.build_ganon, params.build_kmcp, params.build_sourmash_dna, params.build_sylph].any()) { + if ([(params.build_malt && malt_build_mode == 'nucleotide'), params.build_centrifuge, params.build_kraken2, params.build_bracken, params.build_krakenuniq, params.build_ganon, params.build_kmcp, params.build_sourmash_dna, params.build_sylph, params.build_metacache].any()) { // Pull just DNA sequences ch_dna_refs_for_singleref = ch_samplesheet diff --git a/subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf b/subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf index 5a4d7a0..bb21646 100644 --- a/subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf @@ -63,7 +63,7 @@ workflow PIPELINE_INITIALISATION { \033[0;35m nf-core/createtaxdb ${workflow.manifest.version}\033[0m -\033[2m----------------------------------------------------\033[0m- """ - 
after_text = """${workflow.manifest.doi ? "\n* The pipeline\n" : ""}${workflow.manifest.doi.tokenize(",").collect { doi -> " https://doi.org/${doi.trim().replace('https://doi.org/','')}"}.join("\n")}${workflow.manifest.doi ? "\n" : ""} + after_text = """${workflow.manifest.doi ? "\n* The pipeline\n" : ""}${workflow.manifest.doi.tokenize(",").collect { doi -> " https://doi.org/${doi.trim().replace('https://doi.org/', '')}" }.join("\n")}${workflow.manifest.doi ? "\n" : ""} * The nf-core framework https://doi.org/10.1038/s41587-020-0439-x @@ -267,10 +267,10 @@ def toolCitationText() { params.build_kraken2 ? "Kraken2 (Wood et al. 2019)," : "", params.build_krakenuniq ? "KrakenUniq (Breitwieser et al. 2018)," : "", params.build_malt ? "MALT (Vågene et al. 2018)," : "", + params.build_metacache ? "MetaCache (Müller et al. 2017)," : "", params.build_sourmash_dna || params.build_sourmash_protein ? "sourmash sketch dna (Irber et al. 2024)," : "", params.build_sylph ? "sylph (Shaw and Yu 2024)," : "", - "and MultiQC (Ewels et al. 2016)", - ".", + "and MultiQC (Ewels et al. 2016).", ].join(' ').trim() return citation_text @@ -287,6 +287,7 @@ def toolBibliographyText() { params.build_kmcp ? '
  • Shen, W., Xiang, H., Huang, T., Tang, H., Peng, M., Cai, D., Hu, P., & Ren, H. (2023). KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping. Bioinformatics (Oxford, England), 39(1). 10.1093/bioinformatics/btac845
  • ' : "", params.build_kraken2 ? '
  • Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. 10.1186/s13059-019-1891-0
  • ' : "", params.build_krakenuniq ? '
  • Breitwieser, F. P., Baker, D. N., & Salzberg, S. L. (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1), 198. 10.1186/s13059-018-1568-0
  • ' : "", + params.build_metacache ? '
  • Müller, A., Hundt, C., Hildebrandt, A., Hankeln, T., & Schmidt, B. (2017). MetaCache: context-aware classification of metagenomic reads using minhashing. Bioinformatics (Oxford, England), 33(23), 3740–3748. 10.1093/bioinformatics/btx520
  • ' : "", params.build_malt ? '
  • Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. 10.1038/s41559-017-0446-6
  • ' : "", params.build_sourmash_dna || params.build_sourmash_protein ? '
  • Irber, L., Pierce-Ward, N. T., Abuelanin, M., Alexander, H., Anant, A., Barve, K., Baumler, C., Botvinnik, O., Brooks, P., Dsouza, D., Gautier, L., Hera, M. R., Houts, H. E., Johnson, L. K., Klötzl, F., Koslicki, D., Lim, M., Lim, R., Nelson, B., ... Brown, C. T. (2024). sourmash v4: A multitool to quickly search, compare,and analyze genomic and metagenomic data sets. Journal of Open Source Software, 9(98), 6830. 10.21105/joss.06830
  • ' : "", params.build_sylph ? '
  • Shaw, J., & Yu, Y. W. (2024). Rapid species-level metagenome profiling and containment estimation with sylph. Nature Biotechnology, 1–12. 10.1038/s41587-024-02412-y
  • ' : "", diff --git a/tests/default.nf.test b/tests/default.nf.test index 79224d9..e9cbeda 100644 --- a/tests/default.nf.test +++ b/tests/default.nf.test @@ -52,8 +52,10 @@ nextflow_pipeline { "kmcp/database-kmcp-index.log - contains string: ${path("$outputDir/kmcp/database-kmcp-index.log").readLines().any{ it.contains('k-mers saved to database-kmcp-index')}}", "sourmash/database-sourmash-dna-31mer.sbt.zip - exists: ${path("$outputDir/sourmash/database-sourmash-dna-31mer.sbt.zip").exists()}", "sourmash/database-sourmash-protein-10mer.sbt.zip - exists: ${path("$outputDir/sourmash/database-sourmash-protein-10mer.sbt.zip").exists()}", - "downstream_samplesheets/databases-taxprofiler.csv - nr. lines: ${path("$outputDir/downstream_samplesheets/databases-taxprofiler.csv").readLines().size() == 12}", + "downstream_samplesheets/databases-taxprofiler.csv - nr. lines: ${path("$outputDir/downstream_samplesheets/databases-taxprofiler.csv").readLines().size() == 13}", "sylph/database-sylph.syldb - minimum file size: ${file("$outputDir/sylph/database-sylph.syldb").length() >= 280000}", + "metacache/database-metacache.meta - minimum file size: ${file("$outputDir/metacache/database-metacache.meta").length() >= 24000}", + "metacache/database-metacache.cache0 - minimum file size: ${file("$outputDir/metacache/database-metacache.cache0").length() >= 13000000}", ).match() }, diff --git a/tests/default.nf.test.snap b/tests/default.nf.test.snap index 60986cf..393d350 100644 --- a/tests/default.nf.test.snap +++ b/tests/default.nf.test.snap @@ -50,6 +50,9 @@ "MALT_BUILD": { "malt": "0.6.2" }, + "METACACHE_BUILD": { + "metacache": "2.5.0" + }, "SEQKIT_BATCHRENAME": { "seqkit": "2.9.0" }, @@ -106,12 +109,14 @@ "sourmash/database-sourmash-dna-31mer.sbt.zip - exists: true", "sourmash/database-sourmash-protein-10mer.sbt.zip - exists: true", "downstream_samplesheets/databases-taxprofiler.csv - nr. 
lines: true", - "sylph/database-sylph.syldb - minimum file size: true" + "sylph/database-sylph.syldb - minimum file size: true", + "metacache/database-metacache.meta - minimum file size: true", + "metacache/database-metacache.cache0 - minimum file size: true" ], "meta": { "nf-test": "0.9.2", "nextflow": "25.10.0" }, - "timestamp": "2025-11-13T10:17:11.260021335" + "timestamp": "2025-12-18T11:27:08.015469503" } } \ No newline at end of file diff --git a/workflows/createtaxdb.nf b/workflows/createtaxdb.nf index e862ac4..d5ad9cb 100644 --- a/workflows/createtaxdb.nf +++ b/workflows/createtaxdb.nf @@ -23,6 +23,7 @@ include { KRAKENUNIQ_BUILD } from '../modules/nf-core/ include { UNZIP } from '../modules/nf-core/unzip/main' include { MALT_BUILD } from '../modules/nf-core/malt/build/main' include { SYLPH_SKETCHGENOMES } from '../modules/nf-core/sylph/sketchgenomes/main' +include { METACACHE_BUILD } from '../modules/nf-core/metacache/build/main' include { FASTA_BUILD_ADD_KRAKEN2_BRACKEN } from '../subworkflows/nf-core/fasta_build_add_kraken2_bracken/main' include { GENERATE_DOWNSTREAM_SAMPLESHEETS } from '../subworkflows/local/generate_downstream_samplesheets/main.nf' @@ -262,6 +263,22 @@ workflow CREATETAXDB { ch_sylph_output = channel.empty() } + // MODULE : Run METACACHE/BUILD + if (params.build_metacache) { + METACACHE_BUILD( + PREPROCESSING.out.grouped_dna_fastas, + [file_taxonomy_namesdmp, file_taxonomy_nodesdmp], + [file_nucl2taxid], + ) + ch_versions = ch_versions.mix(METACACHE_BUILD.out.versions) + // The module currently emits the two files as separate elements of the same tuple, so we combine them here + // into a single list to satisfy the later final output directory + ch_metacache_output = METACACHE_BUILD.out.db.map { meta, dbmeta, dbcache -> [meta, [dbmeta, dbcache]] } + } + else { + ch_metacache_output = channel.empty() + } + // // Aggregate all databases for downstream processes // @@ -278,6 +295,7 @@ workflow CREATETAXDB { ch_sourmash_dna_output.map { meta, db -> [meta + 
[tool: 'sourmash', type: 'dna'], db] }, ch_sourmash_protein_output.map { meta, db -> [meta + [tool: 'sourmash', type: 'protein'], db] }, ch_sylph_output.map { meta, db -> [meta + [tool: 'sylph', type: 'dna'], db] }, + ch_metacache_output.map { meta, db -> [meta + [tool: 'metacache', type: 'dna'], db] }, ) // @@ -291,7 +309,8 @@ workflow CREATETAXDB { // // Collate and save software versions // - def topic_versions = Channel.topic("versions") + def topic_versions = Channel + .topic("versions") .distinct() .branch { entry -> versions_file: entry instanceof Path @@ -300,9 +319,9 @@ workflow CREATETAXDB { def topic_versions_string = topic_versions.versions_tuple .map { process, tool, version -> - [ process[process.lastIndexOf(':')+1..-1], " ${tool}: ${version}" ] + [process[process.lastIndexOf(':') + 1..-1], " ${tool}: ${version}"] } - .groupTuple(by:0) + .groupTuple(by: 0) .map { process, tool_versions -> tool_versions.unique().sort() "${process}:\n${tool_versions.join('\n')}" @@ -322,25 +341,31 @@ workflow CREATETAXDB { // // MODULE: MultiQC // - ch_multiqc_config = channel.fromPath( - "$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = params.multiqc_config ? - channel.fromPath(params.multiqc_config, checkIfExists: true) : - channel.empty() - ch_multiqc_logo = params.multiqc_logo ? - channel.fromPath(params.multiqc_logo, checkIfExists: true) : - channel.empty() - - summary_params = paramsSummaryMap( - workflow, parameters_schema: "nextflow_schema.json") + ch_multiqc_config = channel.fromPath( + "${projectDir}/assets/multiqc_config.yml", + checkIfExists: true + ) + ch_multiqc_custom_config = params.multiqc_config + ? channel.fromPath(params.multiqc_config, checkIfExists: true) + : channel.empty() + ch_multiqc_logo = params.multiqc_logo + ? 
channel.fromPath(params.multiqc_logo, checkIfExists: true) + : channel.empty() + + summary_params = paramsSummaryMap( + workflow, + parameters_schema: "nextflow_schema.json" + ) ch_workflow_summary = channel.value(paramsSummaryMultiqc(summary_params)) ch_multiqc_files = ch_multiqc_files.mix( - ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) - ch_multiqc_custom_methods_description = params.multiqc_methods_description ? - file(params.multiqc_methods_description, checkIfExists: true) : - file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) - ch_methods_description = channel.value( - methodsDescriptionText(ch_multiqc_custom_methods_description)) + ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml') + ) + ch_multiqc_custom_methods_description = params.multiqc_methods_description + ? file(params.multiqc_methods_description, checkIfExists: true) + : file("${projectDir}/assets/methods_description_template.yml", checkIfExists: true) + ch_methods_description = channel.value( + methodsDescriptionText(ch_multiqc_custom_methods_description) + ) ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) ch_multiqc_files = ch_multiqc_files.mix( @@ -373,4 +398,5 @@ workflow CREATETAXDB { sourmash_dna_database = ch_sourmash_dna_output sourmash_aa_database = ch_sourmash_protein_output sylph_database = ch_sylph_output + metacache_database = ch_metacache_output }