Use a Bayesian classifier combined with a homology search to predict a virus replication cycle
conda create -n replidec
conda activate replidec
conda install -c conda-forge -c bioconda replidec
or
conda install -c denglab -c conda-forge -c bioconda replidecdocker pull quay.io/biocontainers/replidec:0.3.5--pyhdfd78af_0
docker run quay.io/biocontainers/replidec:0.3.5--pyhdfd78af_0 Replidec -h
## Example
docker run -v /your/host/data:/data/ quay.io/biocontainers/replidec:0.3.5--pyhdfd78af_0 Replidec -i data/your_inputfile -p
choose_mode_based_on_your_input_type -w dataIf you install using pip, please make sure that mmseqs, hmmsearch, and blastp are set to $PATH, these software can be equal to or higher than the version list below
-
MMseqs2 Version: 13.45111
-
HMMER 3.3.2 (Nov 2020)
-
Protein-Protein BLAST 2.5.0+
pip3 install ReplidecReplidec, Replication cycle prediction tool for prokaryotic viruses
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-p , --program { multi_fasta | genome_table | protein_table }
multi_fasta mode:
input is a fasta file and treat each sequence as one virus
genome_table mode:
input is a tab separated file with two columns
___1st column: sample name
___2nd column: path to the genome sequence file of the virus
protein_table mode:
input is a tab separated file with two columns
___1st column: sample name
___2nd column: path to the protein file of the virus
-i , --input_file The input file, which can be a sequence file or an index table
-w , --work_dir Directory to store intermediate and final results (default = ./Replidec_results)
-n , --file_name Name of final summary file (default = prediction_summary.tsv)
-t , --threads Number of parallel threads (default = 10)
-e , --hmmer_Eval E-value threshold to filter hmmer result (default = 1e-5)
-E , --hmmer_parameters
Parameters used for hmmer (default = --noali --cpu 3)
-m , --mmseq_Eval E-value threshold to filter mmseqs2 result (default = 1e-5)
-M , --mmseq_parameters
Parameter used for mmseqs
(default = -s 7 --max-seqs 1 --alignment-mode 3 --alignment-output-mode 0 --min-aln-len 40 --cov-mode 0 --greedy-best-hits 1 --threads 3)
-b , --blastp_Eval E-value threshold to filter blast result (default =1e-5)
-B , --blastp_parameter
Parameters used for blastp (default = -num_threads 3)
-d, --db_redownload Remove and re-download database
The database used in Replidec will be downloaded automatically.
Location: will be downloaded at the location where Replidec is installed
If you want to redownload the database, the -d parameter can be used. The older database will be moved to "discarded_db" in the workdir(-w); This dir can be removed manually by the user.
The input file is different based on different programs
Replidec offers 3 different programs:
- 'multi_fasta'
- 'genome_table'
- 'protein_table',
- input is a fasta file and treat each sequence as one virus.
-
Example: <your_path>/viral_contigs.fasta
>contig_1 TATCGATCGATCGATCGATCGATCGTACGTACGTACGTACG... >contig_2 CATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG... ...
-
-
input is a tab separated file with two columns.
- 1st column: sample name
- 2nd column: path to the genome sequence file of the virus
- Example: <your_path>/example_genomes.tsv
contig_1 your/file/path/contig_1.fasta contig_2 your/file/path/contig_2.fasta contig_3 your/file/path/contig_3.fasta ...
-
input is a tab separated file with two columns
- 1st column: sample name
- 2nd column: path to the protein file of the virus
- Example: <your_path>/example_proteins.tsv
contig_1_prot your/file/path/contig_1.fasta contig_2_prot your/file/path/contig_2.fasta contig_3_prot your/file/path/contig_3.fasta ...
The output directory can be assigned with -w , --work_dir , where the intermediate files and the final prediction results will be stored. The name of the final summary file can be assigned with the -n , --file_name argument.
At the end of the analysis, the output directory would contain the following:
- BC_Inno: This directory contains the result file for dectect Innovirues
- BC_mmseqs: This directory contains the result file for mapping result to our custom database
- BC_pfam: This directory contains the result file for dectect the Integrase and Excisionase
- BC_prodigal: This directory contains the result file for CDS prediction from genome or contig sequence. (If {-p protein_table} is used, this directory will not be created.)
- prediction_summary.tsv: This file is the summary file of the prediction result. It contains multiple columns.
-
sample_name: identifier. Can be a sequence ID or the first column of the plain text input file.
-
integrase_number: the number of genes mapped to integrase meet the creteria(set by -c).
-
excisionase_number: the number of genes mapped to excisionase meet the creteria(set by -c).
-
pfam_label: if it contains integrase or excisionase, the label will be "Temperate". Otherwise "Virulent".
-
bc_temperate: conditional probability of temperate|genes.
-
bc_virulent: conditional probability of virulent|genes.
-
bc_label: if bc_temperate greater than bc_virulent, label will be "Temperate". Otherwise "Virulent".
-
final_label: if pfam_label and bc_label both is Temperate, then label will be "Temperate"; if an Innovirues marker gene exists, then label will be "Chronic"; otherwise "Virulent".
-
match_gene_number: the number of genes mapped to our custom database.
-
path: path of input faa file
-
cd test
## Conda
## test passed - genome_table
replidec -p genome_table -i example/genome_test.small.index -w opt_folder_genome_table
## test passed - multi_fasta
replidec -p multi_fasta -i example/test.contig.small.fa -w opt_folder_multi_fasta
## test passed - protein_table
replidec -p protein_table -i example/example.small.list -w opt_folder_protein_table
## Docker
docker run -v /Your_path_clone_replidec/Replidec/test:/data/ quay.io/biocontainers/replidec:0.3.5--pyhdfd78af_0 Replidec -p multi_fasta -i /data/example/test.contig.small.new.fa -w /data/opt_folder_docker_multi_fasta
If the dataset cannot be automatically downloaded from Zenodo due to regional access restrictions, you may manually add it instead. The same database has also been uploaded to OSF as an alternative source.
-
Locate your Replidec installation path
After installing Replidec via Conda or Docker, locate the installed directory. Typically, it can be found at:your_conda_path/envs/env_name/lib/python*/site-packages/Replidec -
Navigate to the Replidec folder
Use the terminal to move into the directory:
cd your_conda_path/envs/env_name/lib/python*/site-packages/Replidec -
Download the database manually from OSF (Project name: Replidec)
Access the alternative download link here: 👉 https://osf.io/thpkb/files/osfstorage -
Extract the database
After downloading, extract the contents of the archive into the Replidec directory, and a folder named "db" will be created:tar -zxvf db_v0.3.2.tar.gz
✅ Note: Make sure the extracted folder can be found in this pathyour_conda_path/envs/env_name/lib/python*/site-packages/Replidec/db.
For now, everything is fixed. Enjoy playing with Replidec!