DKFZ SNVCalling Workflow

An SNV calling workflow developed in the Applied Bioinformatics and Theoretical Bioinformatics groups at the DKFZ. An earlier (pre-GitHub) version of this workflow was used in the Pancancer project.

Your opinion matters! The development of this workflow is supported by the German Network for Bioinformatic Infrastructure (de.NBI). By completing this very short (30-60 seconds) survey you support our efforts to improve this tool.

Installation

To run the workflow you first need to install a number of components and dependencies.

You need a working Roddy installation. The version depends on the workflow version you want to use. You can find it in the buildinfo.txt under 'RoddyAPIVersion'. Please follow the instructions for the installation of Roddy itself and the PluginBase and the DefaultPlugin components. The main reference here is the Roddy documentation.
Install the workflow version you need into your plugin directory, either from the release tarballs or with git clone.

Furthermore you need a number of tools and of course reference data, like a genome assembly and annotation databases.

Tool installation

The workflow contains a description of a Conda environment. A number of Conda packages from BioConda are required.

First add the BioConda channels:

conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels bioconda-legacy

Then create the environment:

conda env create -n SNVCallingWorkflow -f $PATH_TO_PLUGIN_DIRECTORY/resources/analysisTools/snvPipeline/environments/conda.yml

The name of the Conda environment is arbitrary but needs to be consistent with the condaEnvironmentName variable. The default for that variable is set in resources/configurationFiles/analysisSNVCalling.xml.

Note that the Conda environment is not exactly the same as the software stack used for the Pancancer project.

PyPy

PyPy is an alternative Python interpreter. Some of the Python scripts in the workflow can use PyPy to achieve higher performance by employing a fork of hts-python. Currently, this is not implemented for the Conda environment. In most cases you should therefore set the PYPY_OR_PYTHON_BINARY variable to python, so that the Python binary from the Conda environment is used. You could set up a resources/analysisTools/snvPipeline/environments/conda_snvAnnotation.sh similar to the tbi-lsf-cluster_snvAnnotation.sh file in the same directory.
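As a minimal sketch of the setting described above, the variable can be passed like any other configuration value through Roddy's --cvalues option, which takes comma-separated key:value pairs (see the Example Call section). The helper name append_cvalue is hypothetical and not part of Roddy or this workflow; it only builds the string.

```shell
# Hypothetical helper (not part of Roddy): append one key:value pair
# to a comma-separated --cvalues string.
append_cvalue() {
    if [ -z "$1" ]; then
        printf '%s:%s\n' "$2" "$3"
    else
        printf '%s,%s:%s\n' "$1" "$2" "$3"
    fi
}

# Select the Conda environment's Python instead of PyPy.
cvalues=$(append_cvalue "extractSamplesFromOutputFiles:false" PYPY_OR_PYTHON_BINARY python)
echo "--cvalues=\"$cvalues\""
```

The resulting option can be added to a roddy.sh call; any other configuration values you need go into the same string.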

Reference data installation

TBD

Running the workflow

Configuration Values

Switch	Default	Description
bamfile_list	empty	Semicolon-separated list of BAM files, starting with the control's BAM. Each BAM file needs an index file with the same name as the BAM plus a ".bai" suffix
sample_list	empty	Semicolon-separated list of sample names in the same order as `bamfile_list`
possibleTumorSampleNamePrefixes	"( tumor )"	Bash-array of tumor sample name prefixes
possibleControlSampleNamePrefixes	"( control )"	Bash-array of control sample name prefixes
CHROMOSOME_INDICES	empty	Bash-array of chromosome names to which the analysis should be restricted
CHROMOSOME_LENGTH_FILE	empty	Headerless TSV file with two columns: chromosome name and chromosome size
CHR_SUFFIX	""	Suffix added to the chromosome names
CHR_PREFIX	""	Prefix added to the chromosome names
extractSamplesFromOutputFiles	true	Refer to the documentation of the COWorkflowBasePlugin for further information
PYPY_OR_PYTHON_BINARY	pypy	The binary to use for some of the Python scripts. For `filter_PEoverlap.py`, using a PyPy binary here also triggers the use of hts-python instead of pysam.
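The requirements on bamfile_list and sample_list from the table can be checked before starting a run. The following is a sketch only: the function check_bam_lists is a hypothetical helper, not something the workflow provides, and it assumes that paths contain no semicolons or wildcard characters.

```shell
# Hypothetical pre-flight check (not part of the workflow).
# $1: bamfile_list, $2: sample_list, both semicolon-separated.
check_bam_lists() {
    bam_list="$1"
    sample_list="$2"

    # Count the semicolon-separated entries in each list.
    n_bams=$(printf '%s\n' "$bam_list" | awk -F';' '{ print NF }')
    n_samples=$(printf '%s\n' "$sample_list" | awk -F';' '{ print NF }')
    if [ "$n_bams" -ne "$n_samples" ]; then
        echo "bamfile_list has $n_bams entries, sample_list has $n_samples" >&2
        return 1
    fi

    # Every BAM needs an index with the same name plus ".bai".
    status=0
    old_ifs="$IFS"
    IFS=';'
    for bam in $bam_list; do
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2
            status=1
        fi
        if [ ! -f "$bam.bai" ]; then
            echo "missing index: $bam.bai" >&2
            status=1
        fi
    done
    IFS="$old_ifs"
    return "$status"
}

# Example (placeholder paths):
# check_bam_lists "/path/to/your/control.bam;/path/to/your/tumor.bam" "normal;tumor"
```

The function returns a non-zero status if the two lists differ in length or if any BAM or index file is missing. It does not check that the control BAM comes first, since that cannot be derived from the file names alone.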

Example Call

roddy.sh run projectConfigurationName@analysisName patientId \
--useconfig=/path/to/your/applicationProperties.ini --configurationDirectories=/path/to/your/projectConfigs \
--useiodir=/input/directory,/output/directory/snv \
--usePluginVersion=SNVCallingWorkflow:1.3.2 \
--cvalues="bamfile_list:/path/to/your/control.bam;/path/to/your/tumor.bam,sample_list:normal;tumor,possibleTumorSampleNamePrefixes:tumor,possibleControlSampleNamePrefixes:normal,REFERENCE_GENOME:/reference/data/hs37d5_PhiX.fa,CHROMOSOME_LENGTH_FILE:/reference/data/hs37d5_PhiX.chromSizes,extractSamplesFromOutputFiles:false"
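The CHROMOSOME_LENGTH_FILE passed in the call above is a headerless TSV with chromosome names and sizes. One possible way to produce such a file, assuming a FASTA index (.fai) created with samtools faidx exists for the reference genome, is to take the first two columns of the index, which hold the sequence name and its length. The function name fai_to_chrom_sizes is hypothetical, and whether the workflow expects exactly the sequences listed in the index is not stated here.

```shell
# Hypothetical helper: print "name<TAB>length" lines from a FASTA index.
# Assumes $1 is a .fai file as written by `samtools faidx`.
fai_to_chrom_sizes() {
    cut -f1,2 "$1"
}

# Example (placeholder paths):
# fai_to_chrom_sizes /reference/data/hs37d5_PhiX.fa.fai > /reference/data/hs37d5_PhiX.chromSizes
```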

No Control

TBD

Cross-Species Contaminations

In coding regions, the expected proportion of synonymous mutations among all mutations should be low. By contrast, a high proportion of synonymous mutations suggests cross-species contamination. Any value above 0.5 (i.e. more than 50% of mutations are synonymous) indicates contamination. A value below 0.35 is considered to be OK. Values in the range of 0.35-0.5 are unclear.
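The thresholds above can be written down as a small check. This is a sketch, not the workflow's own implementation: the function name and the two input counts are hypothetical, and how the counts are obtained from the workflow's output is not covered here. Values of exactly 0.35 or 0.5 are treated as unclear, which is an assumption about the boundaries.

```shell
# Hypothetical helper: classify the fraction of synonymous mutations.
# $1: number of synonymous mutations in coding regions
# $2: total number of mutations in coding regions
classify_synonymous_fraction() {
    awk -v s="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "undefined"; exit }
        f = s / t
        if (f > 0.5)       print "contamination"
        else if (f < 0.35) print "ok"
        else               print "unclear"
    }'
}

# Example: 40 synonymous out of 100 coding mutations
# classify_synonymous_fraction 40 100
```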

Contributors

Have a look at the Contributors file.
