Commit 7145dc4

Update Snakemake 8 and Gather/Scatter Indel Calling (#13)
* added pysam
* current changes to changelog
* implemented scatter and gather for first round htc
* removed optional quantification - should be required
* caught error when unknown bases occur in wildtype
* remove unwanted print
* added routine to select mhc class type
* splitting in mutect2 analysis for speedup
* rename rule
* combine single-end and paired-end reads to prepare input for mhc-II genotyping
* added instructions for Snakemake 8
* updated minimum version of Snakemake to 8.x.x
* gather scatter for indel calling
* added instructions for Snakemake 8 and apptainer replaces singularity
* added routine to ease the use of custom variants
* refactor hlatyping to combine read retrieval for MHC-I and MHC-II
* outsource rules for custom variants to improve readability
* added reference sets for hla alleles (to compare against)
* added separate rules for MHC-II prediction tools download
* accept wildcard <group> as parameter to improve usability
* Remove check for valid alleles - this is now done later so that user-provided ones are included too
* change to single file input
* add routine for MHC-I and MHC-II into same script
* add safety routine if no counts can be found (when no seqdata present)
* added custom rules
* added parameters for alignment to config
* changed order when adding INFO tags
* added sorting routine
* safety routines added
* outsource merging of predicted MHC-II alleles
* added few parameters
* added to feature list
* changed path to provided hlahd path
* hlahd call as non-file parameter
* added changes to path also to testconfig
1 parent 2928171 commit 7145dc4

30 files changed

Lines changed: 4927 additions & 456 deletions

.tests/integration/config_basic/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,6 @@ data:
 hlatyping:
 MHC-I:
 MHC-II:
-readgroups:

 ### pre-processing (only applied on fastq reads)
 preproc:
@@ -80,14 +79,15 @@ indel:
 sliprate: 0.1 # frequency of slippage when it is supsected

 quantification:
-activate: true
 mode: BOTH # RNA, RNA or BOTH

 hlatyping:
 class: I # I, II or BOTH
 # specific path for class II hlatyping (only required when class: II, or BOTH)
 MHC-I_mode: DNA, RNA # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
 MHC-II_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
+
+hlahd_path: ./hlahd.1.7.0/
 freqdata: ./hlahd_files/freq_data/
 split: ./hlahd_files/HLA_gene.split.txt
 dict: ./hlahd_files/dictionary/

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.2.0] - 2024-02-25
+
+### Features
+
+- ScanNeo2 supports Snakmake>=8
+- --use-conda replaced by --software-deployment-method conda
+- --use-singularity replaced by --software-deployment-method apptainer
+- Gather/scatter of the indel calling speeds up ScanNeo2 on multiple cores
+- added script to split bamfiles by chromosome (scripts/split_bam_by_chr.py)
+- haplotypecaller first/final round is done per chromosome and later merged
+- mutect2 is done per chromosome and later merged
+- Genotyping MHC-II works now on both single-end and paired-end
+- User-defined HLA alleles are matched against the hla refset
+- Added multiple routine to catch errors when only custom variants are provided
+- Added additional parameters in config file
+
+### Fix
+
+- When using BAMfiles the HLA typing wrongly expected single-end reads and performed preprocessing
+- Each environment is no thoroughly versioned to ensure interoperability
+- Missing immunogenicity calculation on certain values of MHC-I fixed
+- Fixed prediction of binding affinity in MHC-II (as the columns are different from MHC-I)
+
+
 ## [0.1.6] - 2024-02-13

 ### Fix
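The 0.2.0 entries above describe a scatter/gather pattern: work is split per chromosome, HaplotypeCaller and Mutect2 run on each chromosome independently, and the partial results are merged. The sketch below illustrates only the ordering logic of that pattern in plain Python, under the assumption that per-chromosome shards can finish in any order and must be concatenated in reference (`.fai`) order. It does not call pysam or GATK, and the helper names (`read_fai_contigs`, `scatter`, `gather`) and the record shape are hypothetical, not taken from `scripts/split_bam_by_chr.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (chromosome, position, payload), e.g. one indel call.
Record = Tuple[str, int, str]


def read_fai_contigs(fai_text: str) -> List[str]:
    """Return contig names in reference order from the text of a samtools .fai index.

    Each .fai line is tab-separated and its first column is the contig name.
    """
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def scatter(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by chromosome: one shard (work unit) per chromosome."""
    shards: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        shards[record[0]].append(record)
    return dict(shards)


def gather(shards: Dict[str, List[Record]], contig_order: List[str]) -> List[Record]:
    """Concatenate shards in reference order, sorting each shard by position.

    Shards for contigs that are absent from `contig_order` are dropped.
    """
    merged: List[Record] = []
    for contig in contig_order:
        merged.extend(sorted(shards.get(contig, []), key=lambda record: record[1]))
    return merged


if __name__ == "__main__":
    # Toy .fai text; only the first column (contig name) is used here.
    fai = "chr1\t1000\t6\t60\t61\nchr2\t800\t1030\t60\t61\n"
    calls = [("chr2", 500, "del"), ("chr1", 900, "ins"), ("chr1", 100, "del")]
    print(gather(scatter(calls), read_fai_contigs(fai)))
    # [('chr1', 100, 'del'), ('chr1', 900, 'ins'), ('chr2', 500, 'del')]
```

Sorting within each shard and concatenating shards in index order keeps the merged output coordinate-sorted regardless of which per-chromosome job finished first.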

README.md

Lines changed: 21 additions & 8 deletions
@@ -1,6 +1,6 @@
 <div align="left">
 <h1>ScanNeo2</h1>
-<img src="https://img.shields.io/badge/snakemake-≥6.4.1-brightgreen.svg">
+<img src="https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg">
 <img src="https://github.com/ylab-hi/ScanNeo2/actions/workflows/linting.yml/badge.svg" alt="Workflow status badge">
 </div>

@@ -29,9 +29,10 @@ To get started with ScanNeo2, follow the steps below:
 mamba activate scanneo2
 ```

-Note: This installs Snakemake v7.32.x. In its current form, ScanNeo2 is not comptabile with Snakemake >= 8.x.x.
-If ScanNeo2 is configured to use the exitron module, singularity needs to be installed. For that, the
-`environment_singularity.yml' can be used. However, most HPC servers provide their own module installation.
+Note: ScanNeo2 requires Snakemake >= 8.x.x is not compatible with Snakemake <= 8.x.x. If ScanNeo2
+is configured to use the exitron module, apptainer (formerly singularity) needs to be installed.
+For that, the `environment_apptainer.yml` can be used. However, most HPC servers provide their own
+module installation (which should be preferred)

 2. Deploy ScanNeo2:

@@ -66,13 +67,13 @@ To run the workflow, use the following command:

 ```bash
 cd /path/to/your/working/directory/
-snakemake --cores all --use-conda
+snakemake --cores all --software-deployment-method conda
 ```

-As mentioned above, when exitron detection is activated the singularity option `--use-singularity` has to be used as well.
+As mentioned above, when exitron detection is activated the singularity option `--software-deployment-method apptainer` has to be used as well.

 ```bash
-snakemake --cores all --use-conda --use-singularity
+snakemake --cores all --software-deployment-method conda apptainer
 ```

 In addition, custom configfiles can be configured using `--configfile <path/to/configfile>`. In principle, this merely
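The README hunks above replace Snakemake 7's `--use-conda` and `--use-singularity` flags with Snakemake 8's `--software-deployment-method`, which accepts more than one value (the README passes `conda apptainer`). As a rough illustration of how the two old flags fold into one option, the hypothetical helper `migrate_snakemake_args` below rewrites an old command line using only the two mappings stated in this commit's changelog. It is not part of ScanNeo2 or Snakemake, and every other argument is passed through unchanged.

```python
from typing import List

# Mappings taken from the 0.2.0 changelog entries; anything else is left as-is.
LEGACY_TO_METHOD = {
    "--use-conda": "conda",
    "--use-singularity": "apptainer",
}


def migrate_snakemake_args(args: List[str]) -> List[str]:
    """Rewrite Snakemake 7 deployment flags into one Snakemake 8 option (hypothetical helper)."""
    methods: List[str] = []
    kept: List[str] = []
    for arg in args:
        method = LEGACY_TO_METHOD.get(arg)
        if method is None:
            kept.append(arg)  # unrelated argument: keep in place
        elif method not in methods:
            methods.append(method)  # collect each deployment method once, in order
    if methods:
        kept += ["--software-deployment-method", *methods]
    return kept


if __name__ == "__main__":
    old = ["snakemake", "--cores", "all", "--use-conda", "--use-singularity"]
    print(" ".join(migrate_snakemake_args(old)))
    # snakemake --cores all --software-deployment-method conda apptainer
```

The sketch does not handle a command line that already contains `--software-deployment-method`; in that case the option would be appended a second time.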
@@ -101,7 +102,19 @@ ScanNeo2 provides an accessible, efficient method for predicting neoantigens. It

 ## Citation

-If ScanNeo2 has proven useful in your work please cite it using the linked publication.
+@article{Schafer2023Nov,
+author = {Sch{\ifmmode\ddot{a}\else\"{a}\fi}fer, Richard A. and Guo, Qingxiang and Yang, Rendong},
+title = {{ScanNeo2: a comprehensive workflow for neoantigen detection and immunogenicity prediction from diverse genomic and transcriptomic alterations}},
+journal = {Bioinformatics},
+volume = {39},
+number = {11},
+pages = {btad659},
+year = {2023},
+month = nov,
+issn = {1367-4811},
+publisher = {Oxford Academic},
+doi = {10.1093/bioinformatics/btad659}
+}

 ## License

config/config.yaml

Lines changed: 8 additions & 7 deletions
@@ -16,7 +16,6 @@ data:
 hlatyping:
 MHC-I:
 MHC-II:
-readgroups:

 ### pre-processing (only applied on fastq reads)
 preproc:
@@ -29,11 +28,11 @@ preproc:

 ### alingment
 align:
-minovlps: 10
-chimsegmin: 20
-chimoverhang: 10
-chimmax: 50
-chimmaxdrop: 30
+chimSegmentMin: 20
+chimScoreMin: 10
+chimJunctionOverhangMin: 10
+chimScoreDropMax: 30
+chimScoreSeparation: 10

 ### variant calling
 # alternative splicing
@@ -77,7 +76,6 @@ indel:
 sliprate: 0.1 # frequency of slippage when it is supsected

 quantification:
-activate: true
 mode: BOTH # RNA, RNA or BOTH

 hlatyping:
@@ -86,6 +84,9 @@ hlatyping:
 # specific path for class II hlatyping (only required when class: II, or BOTH)
 MHC-I_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
 MHC-II_mode: BOTH # DNA, RNA, or BOTH (if empty alleles have to be specified in custom)
+
+# specific path for class II hlatyping (only required when class: II, or BOTH)
+hlahd_path: ./hlahd.1.7.0/
 freqdata: ./hlahd_files/freq_data/
 split: ./hlahd_files/HLA_gene.split.txt
 dict: ./hlahd_files/dictionary/

environment.yml

Lines changed: 1 addition & 1 deletion
@@ -4,5 +4,5 @@ channels:
 - conda-forge
 - anaconda
 dependencies:
-- snakemake=7.32.3
+- snakemake=8.4.11
 - snakemake-wrapper-utils
Lines changed: 3 additions & 1 deletion
@@ -4,5 +4,7 @@ channels:
 - conda-forge
 - anaconda
 dependencies:
-- snakemake=7.32.3
+- snakemake=8.4.11
 - snakemake-wrapper-utils
+- apptainer
+

workflow/Snakefile

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,7 @@
 from snakemake.utils import min_version

 ##### set minimum snakemake version #####
-min_version("6.4.1")
+min_version("8.0.0")

 #### setup #######
 configfile: "config/config.yaml"
@@ -23,6 +23,7 @@ include: "rules/genefusion.smk"
 include: "rules/altsplicing.smk"
 include: "rules/exitron.smk"
 include: "rules/indel.smk"
+include: "rules/custom.smk"
 include: "rules/germline.smk"
 include: "rules/annotation.smk"
 include: "rules/prioritization.smk"
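Raising the bound to `min_version("8.0.0")` makes the workflow stop early when an older Snakemake is installed, such as the 7.32.3 previously pinned in `environment.yml`. The stdlib-only sketch below mimics that check by comparing dotted release numbers as integer tuples. `meets_min_version` is a hypothetical simplification that only handles plain numeric releases; Snakemake's own helper may parse versions differently (for example, pre-release suffixes).

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '8.4.11' into (8, 4, 11); only plain numeric releases are supported."""
    return tuple(int(part) for part in version.split("."))


def meets_min_version(installed: str, required: str) -> bool:
    """Return True if `installed` is at least `required` (hypothetical stand-in for min_version)."""
    a, b = parse_release(installed), parse_release(required)
    # Pad the shorter tuple with zeros so that "8.0" compares equal to "8.0.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for installed in ("7.32.3", "8.4.11"):
        print(installed, meets_min_version(installed, "8.0.0"))
    # 7.32.3 False
    # 8.4.11 True
```

Comparing integer tuples rather than raw strings matters: as strings, `"10.0.0" < "8.0.0"` is true, which would wrongly reject a newer release.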

workflow/envs/basic.yml

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ dependencies:
 - pyfaidx
 - biopython=1.78
 - gffutils
+- pysam

workflow/rules/align.smk

Lines changed: 17 additions & 11 deletions
@@ -1,6 +1,6 @@
 ### align reads to genome using STAR (when reads are in FASTQ)
 if config['data']['rnaseq_filetype'] == '.fastq' or config['data']['rnaseq_filetype'] == '.fq':
-rule star_fq_paired_end:
+rule star_align_fastq:
 input:
 unpack(get_star_input),
 faidx = "resources/refs/genome.fasta.fai",
@@ -19,22 +19,23 @@ if config['data']['rnaseq_filetype'] == '.fastq' or config['data']['rnaseq_filet
 --outSAMattributes RG HI \
 --outSAMattrRGline ID:{wildcards.group} \
 --outFilterMultimapNmax 50 \
---peOverlapNbasesMin 20 \
+--peOverlapNbasesMin 15 \
 --alignSplicedMateMapLminOverLmate 0.5 \
 --alignSJstitchMismatchNmax 5 -1 5 5 \
 --chimOutType WithinBAM HardClip \
---chimSegmentMin 20 \
---chimJunctionOverhangMin 10 \
---chimScoreDropMax 30 \
+--chimSegmentMin {config["align"]["chimSegmentMin"]} \
+--chimJunctionOverhangMin {config["align"]["chimJunctionOverhangMin"]} \
+--chimScoreDropMax {config["align"]["chimScoreDropMax"]} \
+--chimScoreMin {config["align"]["chimScoreMin"]} \
 --chimScoreJunctionNonGTAG 0 \
---chimScoreSeparation 1 \
+--chimScoreSeparation {config["align"]["chimScoreSeparation"]} \
 --chimSegmentReadGapMax 3 \
 --chimMultimapNmax 50 \
 --outSAMstrandField intronMotif"""
 threads: config['threads']
 wrapper:
 "v2.2.1/bio/star/align"
-
+
 ### align reads to genome using STAR (when reads are in BAM - no preprocessing performed)
 if config['data']['rnaseq_filetype'] == '.bam':
 checkpoint split_bamfile_RG:
@@ -88,12 +89,17 @@ if config['data']['rnaseq_filetype'] == '.bam':
 extra=lambda wildcards: f"""--outSAMtype BAM Unsorted --genomeSAindexNbases 10 \
 --readFilesCommand zcat \
 --outSAMattributes RG HI --outSAMattrRGline ID:{wildcards.rg} \
---outFilterMultimapNmax 50 --peOverlapNbasesMin 20 \
+--outFilterMultimapNmax 50 \
+--peOverlapNbasesMin 15 \
 --alignSplicedMateMapLminOverLmate 0.5 \
 --alignSJstitchMismatchNmax 5 -1 5 5 \
---chimOutType WithinBAM HardClip --chimSegmentMin 20 \
---chimJunctionOverhangMin 10 --chimScoreDropMax 30 \
---chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 \
+--chimOutType WithinBAM HardClip \
+--chimSegmentMin {config["align"]["chimSegmentMin"]} \
+--chimJunctionOverhangMin {config["align"]["chimJunctionOverhangMin"]} \
+--chimScoreDropMax {config["align"]["chimScoreDropMax"]} \
+--chimScoreMin {config["align"]["chimScoreMin"]} \
+--chimScoreJunctionNonGTAG 0 \
+--chimScoreSeparation {config["align"]["chimScoreSeparation"]} \
 --chimSegmentReadGapMax 3 --chimMultimapNmax 50 \
 --outSAMstrandField intronMotif"""
 threads: config['threads']
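Both STAR rules now read their chimeric-alignment thresholds from the `align` section of `config/config.yaml` instead of hard-coding them, and the config keys reuse STAR's option names. The sketch below reproduces that substitution outside Snakemake, with a plain dict standing in for the parsed config. `build_chimeric_args` is a hypothetical helper; the values are the defaults added to `config/config.yaml` in this commit, and the option order follows the rules above.

```python
from typing import Dict

# Values mirror the `align` block added to config/config.yaml in this commit.
config: Dict[str, Dict[str, int]] = {
    "align": {
        "chimSegmentMin": 20,
        "chimScoreMin": 10,
        "chimJunctionOverhangMin": 10,
        "chimScoreDropMax": 30,
        "chimScoreSeparation": 10,
    }
}

# Config keys double as STAR option names, in the order the rules emit them.
CHIM_KEYS = (
    "chimSegmentMin",
    "chimJunctionOverhangMin",
    "chimScoreDropMax",
    "chimScoreMin",
    "chimScoreSeparation",
)


def build_chimeric_args(cfg: Dict[str, Dict[str, int]]) -> str:
    """Render the config-driven chimeric options as STAR flags (hypothetical helper)."""
    align = cfg["align"]
    missing = [key for key in CHIM_KEYS if key not in align]
    if missing:
        # The rules' f-strings would also fail here, with a KeyError at rule evaluation.
        raise KeyError(f"missing align parameters: {', '.join(missing)}")
    return " ".join(f"--{key} {align[key]}" for key in CHIM_KEYS)


if __name__ == "__main__":
    print(build_chimeric_args(config))
    # --chimSegmentMin 20 --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --chimScoreMin 10 --chimScoreSeparation 10
```

With these defaults, the effective `--chimScoreSeparation` changes from the previously hard-coded 1 to 10. Any config file used with these rules needs all five keys under `align`.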
