Pipeline title/name
shallowseq
Keywords
genomics, cancer, shallow whole-genome sequencing, copy-number
What is it about?
A pipeline for processing shallow whole-genome sequencing (sWGS) data. It covers alignment, duplicate marking, and QC, followed by generation of relative copy-number profiles and estimation of absolute copy-number. The pipeline is optimized for shallow sequencing and formalin-fixed paraffin-embedded (FFPE) samples.
Please provide a schematic diagram of the proposed pipeline
What would a minimal first release of this pipeline include?
A minimal first release would include read trimming with Trimmomatic, alignment with BWA-MEM, duplicate marking with Picard MarkDuplicates, QC with FastQC and MultiQC, relative copy-number estimation with QDNAseq (and/or WisecondorX, ichorCNA), and absolute copy-number estimation with Rascal or ACE (or an equivalent method).
I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:
Why do we need a new pipeline?
There are no existing pipelines that are optimized for processing shallow whole-genome sequencing data from patient tumors, especially data obtained from FFPE samples, which are generally of poor quality compared to fresh or frozen samples. FFPE samples are very common clinically and shallow sequencing is cost-effective, so it is desirable to process these samples in a robust way. While existing pipelines like sarek or oncoanalyser can call variants, including copy-number variants, they rely on tools that require deeper sequencing for robust results. For example, tools like ASCAT require allele-specific information, which is of poor quality for shallow sequencing owing to the lack of depth supporting estimated B-allele frequencies. The proposed pipeline will employ tools that are optimized for the limitations imposed by shallow sequencing, such as QDNAseq and WisecondorX.
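The depth argument can be illustrated with a rough calculation. The sketch below assumes per-site read depth follows a Poisson distribution with mean equal to the genome-wide coverage, which is a simplification. The coverage values (0.1x, 1x, 30x) and the 10-read threshold are illustrative assumptions, not figures from this proposal or requirements of any specific tool.

```python
import math

def prob_at_least(k: int, mean_depth: float) -> float:
    """P(X >= k) for X ~ Poisson(mean_depth): the chance that a given
    site is covered by at least k reads at the stated mean coverage."""
    cdf = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - cdf)

if __name__ == "__main__":
    # Hypothetical coverages: two shallow settings and one deep WGS setting.
    for depth in (0.1, 1.0, 30.0):
        p_any = prob_at_least(1, depth)    # site covered at all
        p_ten = prob_at_least(10, depth)   # enough reads for a rough allele fraction
        print(f"{depth:>5}x  P(>=1 read)={p_any:.3f}  P(>=10 reads)={p_ten:.3g}")
```

Under these assumptions, most heterozygous sites in a shallow run receive no reads at all, and almost none receive enough to estimate an allele fraction. Read-depth binning approaches such as QDNAseq sidestep this by aggregating counts over large genomic windows.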
Who would be interested?
This pipeline would be of interest to oncology researchers and clinicians who want to use cost-effective shallow whole-genome sequencing to profile the copy-number landscape of their samples.
What has been done so far
The pipeline is already complete and exceeds the minimal first release described above, but it is not yet fully compliant with nf-core guidelines. Some refactoring will be needed to make it nf-core compliant.
URL to existing work (if applicable)
https://github.com/Huntsmanlab/swgs-processing-pipeline
Are there any similar existing nf-core pipelines?
sarek, oncoanalyser