Flynn Lab Pediatric Osteosarcoma Tumor WGS Variant Detection Pipeline

Overview

This pipeline processes whole-genome sequencing (WGS) data from osteosarcoma xenograft tumor samples to detect structural variants (SVs), copy number variants (CNVs), chromothripsis, and single nucleotide variants (SNVs). The workflow is implemented in Nextflow (v24.04.2) with supporting scripts in R (v4.5.1) and Bash (v5.3).

For more information about the pipeline, please visit the user guide.

The pipeline is designed for tumor samples that were first xenografted into mouse models, sequenced with Illumina short-read WGS (paired-end, 60× coverage), and delivered in CRAM format. It automatically removes mouse contamination and integrates results from multiple SV callers to reduce false positives. CNV detection uses a Panel of Normals, followed by a further step that estimates the absolute copy number. Chromothripsis detection draws on both the SV and CNV results. Finally, SNV analysis is performed to investigate genes of interest.

Pipeline Workflow

WGS Data Processing

QC: xengsort classify sorts reads into graft (human), host (mouse), both, ambiguous, and neither. Only graft, both, and combined datasets proceed.
Formatting: The final BAM file is sorted and duplicates are marked.

Structural Variant Detection

Detection: Three SV callers are run: Manta, Delly, and SvABA.
Merge: SV calls are merged with SURVIVOR to reduce false positive calls.
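
The merge step can be sketched roughly as follows. This is a minimal illustration rather than the pipeline's actual module: the sample name, VCF file names, the helper merge_sv_calls, and all threshold values are assumptions. SURVIVOR merge takes a text file listing one VCF per line, followed by positional arguments for breakpoint distance, minimum number of supporting callers, type and strand matching, a distance-estimation flag, minimum SV size, and the output file.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sample name and scratch directory for this sketch.
sample="OS_PDX_01"
work_dir="$(mktemp -d)"
vcf_list="$work_dir/${sample}.vcf_list.txt"

# SURVIVOR merge reads a text file with one (uncompressed) VCF path per line.
# The per-caller file names below are assumptions.
printf '%s\n' \
  "${sample}.manta.vcf" \
  "${sample}.delly.vcf" \
  "${sample}.svaba.vcf" > "$vcf_list"

# Assumed thresholds, not the pipeline's configured values.
max_dist=1000    # maximum distance (bp) between breakpoints to merge
min_callers=2    # keep SVs supported by at least this many callers
same_type=1      # 1 = calls must agree on SV type
same_strand=1    # 1 = calls must agree on strand
est_dist=0       # distance-estimation flag, left off
min_size=50      # minimum SV length (bp) to consider

# Hypothetical wrapper: merge_sv_calls <vcf_list> <output_vcf>
merge_sv_calls() {
  SURVIVOR merge "$1" "$max_dist" "$min_callers" "$same_type" \
    "$same_strand" "$est_dist" "$min_size" "$2"
}

echo "VCF list written to $vcf_list"
# Once the three per-caller VCFs exist, run:
#   merge_sv_calls "$vcf_list" "${sample}.merged.vcf"
```

Requiring support from at least two of the three callers is one common way to trade sensitivity for fewer false positives; raising min_callers to 3 would keep only calls that every caller reports.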

Copy Number Variant Detection

Detection: GATK4 is used to analyze somatic copy number.
Calculation: DoAbsolute is used to calculate the absolute copy number.

Chromothripsis Analysis

Formatting: SV and CNV results are converted into the formats required for chromothripsis detection.
Detection: ShatterSeek is used to detect chromothripsis.

Single Nucleotide Variant Analysis

Detection: GATK Mutect2 detects SNVs in genes of interest.
Annotation: ANNOVAR is used to annotate SNV calls.

Requirements

Software

Core: Nextflow (v24.04.2), Bash (v5.3), R (v4.5.1)
Tools: SAMtools (v1.21), BBMap (v39.26), Xengsort (v2.0.8), BWA (v0.7.19), SeqKit (v2.10.0), Trimmomatic (v0.39), BCFtools (v1.22), Picard (v3.4.0), Manta (v1.6.0), Delly (v1.3.3), SvABA (v1.2.0), SURVIVOR (v1.0.7), Tabix (v1.11), bedtools (v2.30.0), AnnotSV (v3.4.6), GATK (v4.6.2.0), GDC client (v2.3.0), ShatterSeek (v1.1), GATK Mutect2 (v4.6.0.0), ANNOVAR (v2025Mar02)

Reference Data

Human genome: hg38 (UCSC-annotated FASTA)
Mouse genome: mm39 (UCSC-annotated FASTA)
GENCODE annotation: hg38 GFF (recommended)

Input Files

Sample sheet (CSV): A CSV file with the columns sample_id and cram_path.
CRAM files: Located in the refs/ directory unless otherwise specified.
Reference genomes: Stored in the refs/ directory.
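
As an illustration of the sample sheet layout, the sketch below writes a two-row example and checks its header and rows. The sample IDs, CRAM file names, and the helper validate_sheet are hypothetical; only the column names sample_id and cram_path come from the description above. It works in a scratch directory so a real samplesheet.csv is never overwritten.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory so an existing samplesheet.csv is left untouched.
demo_dir="$(mktemp -d)"
sheet="$demo_dir/samplesheet.csv"

# Hypothetical sample IDs and CRAM paths, for illustration only.
cat > "$sheet" <<'EOF'
sample_id,cram_path
OS_PDX_01,refs/OS_PDX_01.cram
OS_PDX_02,refs/OS_PDX_02.cram
EOF

# Succeed only if the header is exactly "sample_id,cram_path" and
# every data row has two non-empty comma-separated fields.
validate_sheet() {
  awk -F',' '
    BEGIN { bad = 0 }
    NR == 1 { if ($0 != "sample_id,cram_path") bad = 1; next }
    NF != 2 || $1 == "" || $2 == "" { bad = 1 }
    END { exit bad }
  ' "$1"
}

if validate_sheet "$sheet"; then
  echo "sample sheet OK: $(( $(wc -l < "$sheet") - 1 )) samples"
fi
```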

Running the Pipeline

To run the pipeline, place the sample CRAM files and reference files in the refs/ directory. The sample sheet must be named samplesheet.csv and located in the project's top-level directory, alongside main.nf.
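
A quick pre-flight check of this layout might look like the sketch below. The helper check_inputs is hypothetical and not part of the pipeline; it also assumes that relative cram_path values are resolved against the project's top-level directory. The demo builds a throwaway project with an empty placeholder file instead of a real CRAM.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: check_inputs <project_dir>
# Reports every missing input on stderr; returns non-zero if any is missing.
check_inputs() {
  project_dir="$1"
  sheet="$project_dir/samplesheet.csv"
  status=0

  if [ ! -f "$sheet" ]; then
    echo "missing: $sheet" >&2
    return 1
  fi
  if [ ! -d "$project_dir/refs" ]; then
    echo "missing: $project_dir/refs/" >&2
    status=1
  fi

  line_no=0
  while IFS=',' read -r sample_id cram_path || [ -n "$sample_id" ]; do
    line_no=$((line_no + 1))
    # Skip the header row and blank lines.
    if [ "$line_no" -eq 1 ] || [ -z "$sample_id" ]; then continue; fi
    # Assumption: relative paths are relative to the project directory.
    case "$cram_path" in
      /*) resolved="$cram_path" ;;
      *)  resolved="$project_dir/$cram_path" ;;
    esac
    if [ ! -f "$resolved" ]; then
      echo "missing CRAM for $sample_id: $cram_path" >&2
      status=1
    fi
  done < "$sheet"

  return "$status"
}

# Demo on a throwaway project layout.
demo="$(mktemp -d)"
mkdir -p "$demo/refs"
: > "$demo/refs/OS_PDX_01.cram"   # empty placeholder, not a real CRAM
printf 'sample_id,cram_path\nOS_PDX_01,refs/OS_PDX_01.cram\n' > "$demo/samplesheet.csv"

if check_inputs "$demo"; then
  echo "inputs look complete"
fi
```

For a real project, calling check_inputs . from the top-level directory before nextflow run would catch a misnamed sample sheet or a missing CRAM early.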

Create or activate the Nextflow Conda environment

If you don’t already have a Nextflow environment:

conda create -n nextflow_base -c conda-forge -c bioconda nextflow
conda activate nextflow_base

Or, if you already have it:

conda activate nextflow_base

Run the pipeline

nextflow run main.nf -profile conda,singularity,cluster
