New pipeline: nf-core/shallowseq #127

@brandenjlynch

Pipeline title/name

shallowseq

Keywords

genomics, cancer, shallow whole-genome sequencing, copy-number

What is it about?

A pipeline for processing shallow whole-genome sequencing data, including alignment, duplicate marking, and QC, followed by downstream generation of relative copy-number profiles and estimation of absolute copy number. The pipeline is optimized for shallow sequencing and formalin-fixed paraffin-embedded (FFPE) samples.

Please provide a schematic diagram of the proposed pipeline

[Image: schematic diagram of the proposed pipeline]

What would a minimal first release of this pipeline include?

A minimal first release would include read trimming with Trimmomatic, alignment with BWA-MEM, duplicate marking with Picard MarkDuplicates, QC with FastQC and MultiQC, relative copy-number estimation with QDNAseq (and/or WisecondorX, ichorCNA), and absolute copy-number estimation with Rascal/ACE (or an equivalent method).
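
To make the relative-to-absolute step concrete, here is a minimal Python sketch of one commonly used mixture model, in which a sample is treated as a mix of tumour cells and diploid normal cells. The function names and the example values are illustrative assumptions; they are not taken from the existing pipeline, and the tools named above may differ in detail.

```python
def relative_to_absolute(relative, ploidy, cellularity):
    """Convert a relative copy number to an absolute tumour copy number.

    Assumes a mixture of tumour cells (fraction `cellularity`, average copy
    number `ploidy`) and diploid normal cells (copy number 2).
    """
    if not 0 < cellularity <= 1:
        raise ValueError("cellularity must be in (0, 1]")
    return ploidy + (relative - 1) * (ploidy + 2 / cellularity - 2)


def absolute_to_relative(absolute, ploidy, cellularity):
    """Inverse mapping: expected relative copy number for an absolute state."""
    normal = 2 * (1 - cellularity)  # contribution of diploid normal cells
    return (cellularity * absolute + normal) / (cellularity * ploidy + normal)


# Hypothetical segment values for a sample assumed to have ploidy 2 and
# 50% tumour cellularity.
for relative in (0.75, 1.0, 1.25):
    absolute = relative_to_absolute(relative, ploidy=2.0, cellularity=0.5)
    print(f"relative={relative:.2f} -> absolute={absolute:.2f}")
```

In practice ploidy and cellularity are unknown and have to be fitted, typically by searching for the pair under which segment values land closest to whole-number copy states; the sketch only covers the conversion once those two values are chosen.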

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive name that is all lowercase and without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

There are no existing pipelines that are optimized for processing shallow whole-genome sequencing data from patient tumors, especially data obtained from FFPE samples, which are generally of poorer quality than fresh or frozen samples. FFPE samples are very common clinically and shallow sequencing is cost-effective, so it is desirable to be able to process these samples robustly. While existing pipelines like sarek or oncoanalyser can call variants, including copy-number variants, they rely on tools that require deeper sequencing for robust results. For example, tools like ASCAT require allele-specific information, which is of poor quality for shallow sequencing owing to the lack of depth supporting estimated B-allele frequencies. The proposed pipeline will employ tools that are optimized for the limitations imposed by shallow sequencing, such as QDNAseq and WisecondorX.
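
The depth argument can be illustrated with a short calculation. The sketch below assumes that per-site coverage follows a Poisson distribution and that allele sampling at a heterozygous SNP is binomial; the depths used (0.1x and 30x) and the function names are illustrative assumptions, not figures from this proposal.

```python
import math


def prob_at_least(k, mean_depth):
    """P(coverage >= k reads) at a site under a Poisson coverage model."""
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1 - p_below


def baf_standard_error(depth, baf=0.5):
    """Binomial standard error of a B-allele frequency estimated from `depth` reads."""
    return math.sqrt(baf * (1 - baf) / depth)


for mean_depth in (0.1, 30):
    covered = prob_at_least(1, mean_depth)
    both_alleles_possible = prob_at_least(2, mean_depth)
    print(
        f"{mean_depth}x: P(>=1 read)={covered:.3f}, "
        f"P(>=2 reads)={both_alleles_possible:.3f}"
    )

# Standard error of the BAF estimate at a heterozygous site (true BAF 0.5)
for depth in (1, 2, 30):
    print(f"depth={depth}: BAF standard error={baf_standard_error(depth):.3f}")
```

Under these assumptions, only about one site in ten is covered by any read at 0.1x, and a site needs at least two reads before both alleles can even be observed, which is why read-depth-based tools are the better fit for this data type.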

Who would be interested?

This pipeline would be of interest to researchers and clinicians in oncology who would like to use cost-effective shallow whole-genome sequencing to profile the copy-number landscape of their samples.

What has been done so far

The pipeline is already complete (meeting and exceeding the minimal first release described above), although it is not yet fully compliant with nf-core guidelines. Some refactoring will be needed to make it nf-core compliant.

URL to existing work (if applicable)

https://github.com/Huntsmanlab/swgs-processing-pipeline

Are there any similar existing nf-core pipelines?

sarek, oncoanalyser

Status

proposed
