Snakemake workflow for exome processing.
The workflow is split into 5 separate Snakemake pipelines so that the two key steps with long runtimes and rather complex execution (MuTect2 and VCFtoMAF) can be parallelized effectively.
Please follow the steps below to run the workflow. NOTE: make sure you change the relevant file paths in the Snakefiles (e.g. `/path/to/S04380110_Covered.headless.bed`) before running.
1. `git clone` this repo.
2. `cd` into the ExomeProcess dir.
3. `cd` into each step in order (steps 1 - 5) and submit the respective step to H4H, as in the following example: `sbatch snake_submit_step1.sh`
(NOTE: in step 2, run snake_submit_step2.sh before snake_submit_step2merge.sh)
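
The ordering requirement above (each step after the previous one, and `snake_submit_step2.sh` before `snake_submit_step2merge.sh`) could be enforced with Slurm job dependencies instead of waiting by hand. The following is a minimal sketch, not part of this repo. The helper name `submit_chain` and the `step1/`, `step2/` directory names are hypothetical, and it assumes each submit script's job keeps running until its Snakemake run has finished, so that `afterok` only releases the next step once the previous one is done.

```shell
#!/usr/bin/env bash
# Sketch only: submit scripts one after another, each waiting for the
# previous job to finish successfully (Slurm "afterok" dependency).
# SBATCH can be overridden, e.g. with a stub for a dry run.
SBATCH="${SBATCH:-sbatch}"

submit_chain() {
    local prev="" script dir base jobid
    for script in "$@"; do
        dir=$(dirname "$script")
        base=$(basename "$script")
        # Submit from inside the step directory, as in the manual instructions.
        if [ -n "$prev" ]; then
            jobid=$(cd "$dir" && "$SBATCH" --parsable --dependency="afterok:${prev}" "$base") || return 1
        else
            jobid=$(cd "$dir" && "$SBATCH" --parsable "$base") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the id.
        prev="${jobid%%;*}"
        echo "submitted ${script} as job ${prev}"
    done
}

# Example call (directory names are assumptions):
# submit_chain step1/snake_submit_step1.sh \
#              step2/snake_submit_step2.sh \
#              step2/snake_submit_step2merge.sh
```

If the submit scripts return before their Snakemake jobs complete, this chaining does not apply and the steps still need to be submitted manually once each one has finished.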
Below are details of the processes run in each step:
Step 1: (executes bwa alignment, picard MarkDuplicates, and GATK preprocessing steps)
Step 2: (executes MuTect2 in a parallelized manner on a split BED file - approx 20 min runtime per sample)
Step 2 (merge): (merges the parallelized MuTect2 outputs per sample)
Step 3: (executes MuTect1, MuTect2 filtering, Varscan (CN, Somatic), Strelka, Sequenza, VCFIntersect)
Step 4: (executes hg19tohg38LiftOver)
Step 5: (executes VCFtoMAF. Make sure you run all .sh files for both hg19 and hg38)
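
To illustrate the idea behind step 2, here is one possible way to split a headerless BED file into contiguous chunks so that MuTect2 can be run on each chunk separately. This is a sketch under stated assumptions, not the splitting method used in this repo: the function name `split_bed`, the chunk count, and the output naming scheme are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split a headerless BED file into at most N contiguous chunks.
# Usage: split_bed <input.bed> <n_chunks> <output_prefix>
# Writes <output_prefix>.01.bed, <output_prefix>.02.bed, ...
split_bed() {
    local bed="$1" n="$2" prefix="$3" total
    total=$(grep -c '' "$bed")   # number of lines (intervals) in the BED file
    awk -v n="$n" -v total="$total" -v prefix="$prefix" '
        BEGIN { per = int((total + n - 1) / n) }   # ceil(total / n) lines per chunk
        {
            chunk = int((NR - 1) / per) + 1
            out = sprintf("%s.%02d.bed", prefix, chunk)
            # Chunks are contiguous, so a finished file is never reopened.
            if (out != prev) { if (prev != "") close(prev); prev = out }
            print > out
        }
    ' "$bed"
}

# Example call (paths are placeholders):
# split_bed /path/to/S04380110_Covered.headless.bed 20 chunks/covered
```

Splitting by contiguous lines keeps the intervals in their original order, so concatenating the chunks reproduces the input file.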
OncoKb-Annotator was run on all MAFs, but not through Snakemake, as it had to be run on Samwise with internet access. The jobs had to be parallelized on Samwise with *screen*, which Snakemake cannot track for validity. The script can be found under `oncokb/run_oncokb.sh`.