New pipeline: nf-core/shallowseq #127

### Pipeline title/name

shallowseq

### Keywords

genomics, cancer, shallow whole-genome sequencing, copy-number

### What is it about?

A pipeline for processing shallow whole-genome sequencing data. It covers alignment, duplicate marking, and QC, followed by generation of relative copy-number profiles and estimation of absolute copy-number. The pipeline is optimized for shallow sequencing and formalin-fixed paraffin-embedded (FFPE) samples.

### Please provide a schematic diagram of the proposed pipeline

<img width="2235" height="1838" alt="Image" src="https://github.com/user-attachments/assets/c34ced13-83c7-41d9-91b4-ac36e8e6289b" />

### What would a minimal first release of this pipeline include?

A minimal first release would include read trimming with Trimmomatic, alignment with BWA-MEM, duplicate marking with Picard MarkDuplicates, QC with FastQC and MultiQC, relative copy-number estimation with QDNAseq (and/or WisecondorX, ichorCNA), and absolute copy-number estimation with Rascal/ACE (or an equivalent method).

### I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

- [x] be built with Nextflow.
- [x] pass nf-core lint tests and use standardized parameters.
- [x] be community-owned and developed within the nf-core organization.
- [x] open source under the MIT license with proper credits and acknowledgments.
- [x] have a descriptive, all lowercase, and without punctuation name.
- [x] use the nf-core pipeline template and predominantly use official nf-core modules.
- [x] focus on a specific data/analysis type with appropriate scope.
- [x] have properly maintained documentation.
- [x] be bundled using versioned Docker/Singularity containers.

### Why do we need a new pipeline?

There are no existing pipelines that are optimized for processing shallow whole-genome sequencing data from patient tumors, especially data obtained from FFPE samples, which are generally of poor quality compared to fresh or frozen samples. FFPE samples are very common clinically and shallow sequencing is cost-effective, so it is desirable to be able to process these samples robustly. While existing pipelines like sarek or oncoanalyser can call variants, including copy-number variants, they rely on tools that require deeper sequencing for robust results. For example, tools like ASCAT require allele-specific information, which is of poor quality in shallow sequencing owing to the lack of depth supporting the estimated B-allele frequencies. The proposed pipeline will employ tools that are optimized for the limitations imposed by shallow sequencing, such as QDNAseq and WisecondorX.
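To make the depth argument concrete, here is a rough back-of-the-envelope sketch. It is not part of the pipeline: it assumes Poisson-distributed coverage and binomial sampling of alleles at a heterozygous site, and the function names and the example depths (0.1x, 30x) are illustrative choices rather than values taken from any of the tools mentioned.

```python
import math


def prob_no_reads(mean_depth: float) -> float:
    """Probability that a given site is covered by zero reads.

    Assumption: per-site coverage is Poisson-distributed with the given mean.
    """
    return math.exp(-mean_depth)


def baf_standard_error(reads_at_site: int, true_baf: float = 0.5) -> float:
    """Standard error of an observed B-allele frequency at one site.

    Assumption: each read samples the B allele independently with
    probability true_baf (simple binomial model, no sequencing error).
    """
    if reads_at_site <= 0:
        raise ValueError("BAF is undefined at a site with no reads")
    return math.sqrt(true_baf * (1.0 - true_baf) / reads_at_site)


if __name__ == "__main__":
    # Illustrative depths only: a shallow run versus a deeper run.
    for mean_depth in (0.1, 30.0):
        print(f"mean depth {mean_depth}x: "
              f"P(no reads at a site) = {prob_no_reads(mean_depth):.3f}")
    for reads in (1, 30):
        print(f"{reads} read(s) at a het site: "
              f"BAF standard error = {baf_standard_error(reads):.3f}")
```

Under these assumptions, roughly 90% of sites receive no reads at all at a mean depth of 0.1x, and a site covered by a single read can only report a B-allele frequency of 0 or 1. Read-depth-based binning approaches sidestep this by aggregating counts over large genomic windows instead of relying on per-site allele fractions.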

### Who would be interested?

This pipeline would be of interest to researchers and clinicians in oncology who would like to use cost-effective shallow whole-genome sequencing to profile the copy-number landscape of their samples.

### What has been done so far

The pipeline is already complete, meeting and exceeding the minimal first release described above, but it is not yet fully compliant with nf-core guidelines. Some refactoring will be needed to make it nf-core compliant.

### URL to existing work (if applicable)

https://github.com/Huntsmanlab/swgs-processing-pipeline

### Are there any similar existing nf-core pipelines?

sarek, oncoanalyser
