Skip to content

cyclomics/cyclomicsseq

Repository files navigation

CycloSeq

Pipelines to process Cyclomics data.

This pipeline uses prior information from the backbone to increase the effectiveness of consensus calling from the circular DNA protocol by Cyclomics.

Dependencies

Click for installation instructions:

Data requirements

  • data output by ONT Guppy (SUP preferred for optimal results)
  • Reference genome, Ideally pre indexed by BWA to reduce runtime.

System requirements

The pipeline expects at least 16 threads to be available and 16GB of RAM. We recommend 64 GB of RAM to decrease the runtime significantly.

Reference genome

The pipeline has been developed with amplicons that map against the provided reference in mind.

We suggest to use Grch38.p14, since this works well with the VEP that is integrated in the pipeline its available via the code snippet below.

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz
gunzip GCA_000001405.29_GRCh38.p14_genomic.fna.gz

To reduce runtime pre index the reference genome with BWA, or obtain a preindexed copy.

Usage

Inside of Epi2Me Labs

This pipeline is compatible with the EPI2ME Labs platform by ONT. Please see ONT's installation guide.

Installation inside EPI2ME Labs:

  1. go to workflows by clicking on "installed workflows", or click the workflows icon in the top bar.
  2. click "Import workflow".
  3. Paste "https://github.com/cyclomics/cyclomicsseq" into the text bar and click Import workflow.

As a nextflow pipeline

In this section we assume that you have docker and nextflow installed on your system, if so running the pipeline is straightforward. You can run the pipeline directly from this repo, or pull it yourself and point nextflow towards it.

nextflow run cyclomics/cycmomicsseq -r <pipeline version> -profile docker --input_read_dir '/sequencing/20220209_1609_X3_FAS06478_0ed4361c/fastq_pass/' --output_dir '/data/myresults' --reference '/data/reference/chm13v2.fasta' --backbone BB12

Singularity

If docker is not an option, singularity (or Apptainer, as it is called since Q2 2022) is a good alternative that does not require root access and therefor used in shared compute environments.

The command becomes:

nextflow run cyclomics/cycloseq -profile singularity ...'

Please note that this assumes you've ran the pipeline before, if not add the -user flag as described in Usage[#Usage].

Conda

The pipeline is fully compatible with Conda. This means the full command becomes:

nextflow run cyclomics/cycloseq -profile conda ...'

By default it uses the environment file that is shipped with the pipeline. this file is located in the repo, the pipeline needs to know where this file is to run with the correct versions of the required software.

Flag descriptions

flag info
--input_read_dir Directory where the output fastqs of Guppy are located, e.g.: "/data/guppy/exp001/fastq_pass".
--read_pattern Regex pattern to look for fastq's in the read directory, defaults to: "**.{fq,fastq,fq.gz,fastq.gz}".
--sequencing_summary_path The summary file generated by guppy, Optional, default: "sequencing_summary*.txt".
--backbone Select a preset backbone.
--backbone_file File to use as backbone when --backbone is non of the available presets. eg a fasta file with a sequence with the name ">BB_custom" the name must start with BB for extraction reasons.
--reference Path to the reference genome to use, will ingest all index files in the same directory.
--output_dir Directory path where the results, including intermediate files, are stored.
--snp_filters.min_dir_ratio, --indel_filters.min_dir_ratio Minimum ratio of variant-supporting reads in each direction (default: 0.001 (SNP); 0.002 (Indel)).
--snp_filters.min_dir_count, --indel_filters.min_dir_count Minimum number of variant-supporting reads in each direction (default: 5).
--snp_filters.min_dpq, --indel_filters.min_dpq Minimum positional depth after Q filtering (default: 5_000).
--snp_filters.min_dpq_n, --indel_filters.min_dpq_n Number of flanking nucleotides to the each position that will determine the window size for local maxima calculation (default = 25).
--snp_filters.min_dpq_ratio, --indel_filters.min_dpq_ratio Ratio of local depth maxima that will determine the minimum depth at each position (default = 0.3).
--snp_filters.min_vaf, --indel_filters.min_vaf Minimum variant allele frequency (default: 0.003 (SNP); 0.004 (Indel)).
--snp_filters.min_rel_ratio, --indel_filters.min_rel_ratio Minimum relative ratio between forward and reverse variant-supporting reads (default: 0.3 (SNP); 0.4 (Indel)).
--snp_filters.min_abq, --indel_filters.min_abq Minimum average base quality (default: 70).

Using alternative Backbones

Due to lab conditions a different backbone might be used. the --backbone parameter can be set to any fasta file. The following defaults are available by default in the pipeline, you can enable them by copying the value in the the value column and pasting it behind the cli command.

backbone value default
BB22 --backbone BB22
BB25 --backbone BB25
BB41 --backbone BB41
BB42 --backbone BB42 X
BBCS --backbone BBCS
BBCR --backbone BBCR

Roadmap / Todo:

Performance:

  1. Multi-threaded variant calling

changelog

Please see CHANGELOG.md

Install-dependencies

Nextflow

Download the latest version by running the example below:

wget -qO- https://get.nextflow.io | bash

or see The official Nextflow documentation

Conda

Download the latest conda version from The official conda documentation

Run the below command and follow process:

bash Miniconda3-latest-Linux-x86_64.sh

Apptainer/Singularity

Download the latest version by running the example below:

wget https://github.com/apptainer/apptainer/releases/download/v1.1.0-rc.2/apptainer_1.1.0-rc.2_amd64.deb
sudo apt-get install -y ./apptainer_1.1.0-rc.2_amd64.deb

for the latest up to date information see their official documentation

Running on A SLURM cluster such as UMCU HPC

login to the hpc using SSH. there start a sjob with:

srun --job-name "InteractiveJob" --cpus-per-task 16 --mem=32G --gres=tmpspace:450G --time 24:00:00 --pty bash

go to the right project folder

cd /hpc/compgen/projects/cyclomics/cycloseq/pipelines/cycloseq/

start the pipeline as normal

Developer notes

Cycas addition to the repo

Cycas was added as a subtree using code from: https://gist.github.com/SKempin/b7857a6ff6bddb05717cc17a44091202. This was done instead of submodule to make pulling of the repo easier for endusers and to stay compatible with nextflow run functionallity.

more specifically:

git subtree add --prefix Cycas https://github.com/cyclomics/Cycas 0.4.3 --squash

To update run

git subtree pull --prefix Cycas https://github.com/cyclomics/Cycas <tag> --squash