PlasmidCommunity: A Software Suite for Klebsiella pneumoniae Plasmid Classification Framework, Assignment, and Transmission Risk Prediction
PlasmidCommunity is an open-source software suite designed to empower researchers in understanding Klebsiella pneumoniae plasmid biology through three analysis modules: classification, community assignment, and transmission risk prediction. Leveraging genomic similarity networks and advanced algorithms, this toolkit addresses critical challenges in tracking plasmid evolution, host interactions, and antimicrobial resistance dissemination.
If you have any inquiries, questions, bug reports, or other feedback, please contact us via the following means:
- Xinmiao Wu: wuxinmiao2024@163.com
- Zhenpeng Li: lizhenpeng@icdc.cn
We appreciate your feedback and are committed to improving PlasmidCommunity to better serve the research community.
Before using PlasmidCommunity, ensure that you have the following prerequisites installed and configured:
- Linux Environment: The software is designed to run on a Linux-based operating system. Familiarity with the Linux command line is required.
This will install the latest version directly from GitHub.
- Clone the repository
git clone https://github.com/wuxinmiao5/PlasmidCommunity.git
- Add the paths related to PlasmidCommunity to ~/.bashrc
export PATH=${PATH}:/path/to/PlasmidCommunity/assignCommunity
export PATH=${PATH}:/path/to/PlasmidCommunity/PlasmidCommunity
export PATH=${PATH}:/path/to/PlasmidCommunity/PlasmidTransModel
- Make the environment variables effective
source ~/.bashrc
- R Programming Language: Install R on your system. Basic knowledge of R is necessary for certain parts of the analysis.
- Required R Packages: Install the necessary R packages using the following commands:
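install.packages(c("readr", "readxl", "writexl", "tidymodels", "tidyverse", "seqinr", "ape", "dplyr", "igraph", "ggraph", "tidygraph", "ggplot2", "vegan", "ranger"))
Biostrings is distributed through Bioconductor rather than CRAN, so install it with BiocManager:
install.packages("BiocManager")
BiocManager::install("Biostrings")
- FastANI: Ensure fastANI is installed and accessible in your system's PATH. It is used for sequence similarity analysis.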
- Prodigal: Install Prodigal for gene prediction. Download and install from Prodigal GitHub.
- BLAST: Install BLAST for sequence alignment. Download and install from NCBI BLAST.
Alternatively, create a conda environment:
conda create -n plasmidcommunity -c bioconda -c conda-forge plasmidcommunity
- Database: The comprehensive database of 7,232 complete Klebsiella pneumoniae plasmid sequences has been deposited on the ScienceDB platform (https://www.science-db.cn/). All data are publicly accessible at https://cstr.cn/31253.47.sciencedb.23175.011A8CD2 under an open-access license (GNU GPL).
- Pre-trained Models. Ensure the following pre-trained models are available: 'binaryModel.Rdata' for binary classification. 'threeClassModel.Rdata' for three-class classification.
- Reference Protein Database: Provide a reference protein database (model3.fasta) for BLAST analysis.
Ensure all tools are properly installed and accessible in your system’s PATH. Place all input files in the correct directories as specified in the documentation.
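A quick way to confirm the PATH requirement is to ask the shell whether each executable can be found. The sketch below is not part of PlasmidCommunity; the function name `check_tools` is hypothetical, and the executable names in the example comment are the usual ones for these tools, assumed here rather than taken from the project scripts.

```shell
# Report which of the named commands cannot be found on PATH.
# Prints one missing command per line and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions, adjust to your installation):
# check_tools fastANI prodigal blastp makeblastdb Rscript \
#     plasmidCommunity.sh assignCommunity.sh PlasmidTransModel.sh
```

An empty result with a zero exit status means every listed command was found.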
PlasmidCommunity is a tool for classifying and analyzing plasmid communities based on genomic similarity networks. It classifies plasmids by applying sequence-similarity thresholds, groups them into communities accordingly, and constructs networks to predict the ability of each community to acquire new genes. Three modes are provided: silhouetteCurve, getCommunity, and pan. This tool is particularly useful for researchers studying plasmid diversity, horizontal gene transfer, and microbial evolution.
The software is based on the methodology described in our paper, with minor modifications to enhance user-friendliness. For detailed information, please refer to the documentation.
To display the specific parameters and usage of the software, navigate to the directory containing the plasmid sequences and enter the following command:
$ plasmidCommunity.sh -h
This command prints detailed information on how to use the software, including the Silhouette Coefficient module and its parameters.
The Silhouette Coefficient module is used to analyze the clustering quality of plasmid communities. To run this module, use the following command:
$ plasmidCommunity.sh -a silhouetteCurve -s input_plasmid_seq -o output_tag
Parameters:
- -a|getMode: The mode to run, here silhouetteCurve.
- -s|plasmid_seq: The path to a directory containing plasmid genomes.
- -o|output_tag: The output tag.
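For orientation, the textbook definition of the silhouette coefficient for a single item i is given below. How the module computes, aggregates, or scans it across distance cutoffs is not stated here, so treat this as the standard formula and not as a description of the implementation.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Here a(i) is the mean distance from i to the other members of its own cluster, and b(i) is the smallest mean distance from i to the members of any other cluster. Values near 1 indicate that the item sits well inside its cluster; values near 0 or below indicate overlapping or misassigned clusters.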
To obtain plasmid communities, use the following command:
$ plasmidCommunity.sh -a getCommunity -c treedist -d 0.13 -m 5 -o output_tag
Parameters:
- -a|getMode: The mode to run, here getCommunity.
- -c|fastani: The fastANI result to use as input, as saved by the silhouetteCurve mode.
- -d|discutoff: The distance cutoff used to generate communities.
- -m|membercutoff: The minimum community size.
- -o|output_tag: The output tag.
For pan-genome analysis, use the following command:
$ plasmidCommunity.sh -a pan -s input_plasmid_seq -f "./membership_info.txt" -m 5 -o output_tag
Parameters:
- -a|getMode: The mode to run, here pan.
- -s|plasmid_seq: The path to a directory containing plasmid genomes.
- -f|membership_info: The membership file of the network nodes.
- -m|membercutoff: The minimum community size.
- -o|output_tag: The output tag.
assignCommunity assigns a Klebsiella pneumoniae plasmid to a plasmid community based on Average Nucleotide Identity (ANI), computed with the fastANI algorithm. The tool compares a query plasmid against a collection of known plasmids to determine the most similar plasmid and the community it belongs to.
The tool performs the following steps:
- Input Handling: Accepts paths to the query plasmid and a directory containing a collection of plasmids.
- ANI Calculation: Uses fastANI to compute the ANI between the query plasmid and each plasmid in the collection.
- Membership Assignment: Identifies the plasmid with the highest ANI to the query and retrieves its community information.
- Output: Generates a text file containing the query plasmid, the assigned community membership, and the size of the community.
$ assignCommunity.sh -a /data/lizhenpeng/wuxinmiao/plasmids -q GCA_015356015__CP064244.1.fasta -o output
Parameters:
- -a|allplasmidPath: The path of the directory containing the plasmid genomes.
- -q|queryPlasmidPath: The query plasmid file.
- -o|output_tag: The prefix of the output file.
Replace /data/lizhenpeng/wuxinmiao/plasmids with the path to the directory containing your plasmid database and GCA_015356015__CP064244.1.fasta with the path to your query plasmid.
Output: The script will generate a file named membershipAssigned.txt in the current directory, containing the query plasmid, the assigned membership, and the size of the membership group.
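The membership-assignment step described above (take the reference with the highest ANI, then look up its community) can be sketched with awk. This is an illustration under stated assumptions and not the code inside assignCommunity.sh: the function name `assign_by_best_hit` is hypothetical, the membership table is assumed to be a two-column tab-separated file (plasmid file name, community ID), and the first input is assumed to be standard fastANI output (query, reference, ANI, mapped fragments, total fragments).

```shell
# assign_by_best_hit FASTANI_TSV MEMBERSHIP_TSV
# FASTANI_TSV:    fastANI output (query, reference, ANI, mapped, total fragments).
# MEMBERSHIP_TSV: ASSUMED format: plasmid file name <TAB> community ID.
# Prints: query <TAB> best reference <TAB> ANI <TAB> community <TAB> community size.
assign_by_best_hit() {
    awk -F '\t' '
        # First file (membership): remember each community and count its members.
        NR == FNR { community[$1] = $2; size[$2]++; next }
        # Second file (fastANI): keep the reference with the highest ANI (column 3).
        {
            n = split($2, parts, "/")      # drop directories to match the table
            ref = parts[n]
            if ($3 + 0 > best_ani) { best_ani = $3 + 0; best_ref = ref; query = $1 }
        }
        END {
            if (best_ref == "") {
                print "no fastANI hit reported for the query" > "/dev/stderr"
                exit 1
            }
            c = (best_ref in community) ? community[best_ref] : "unassigned"
            print query "\t" best_ref "\t" best_ani "\t" c "\t" ((c in size) ? size[c] : 0)
        }
    ' "$2" "$1"
}
```

Because fastANI omits pairs whose identity is too low to estimate reliably, an empty result is handled explicitly instead of being reported as a zero-ANI assignment.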
PlasmidTransModel predicts the transmission risk of Klebsiella pneumoniae plasmids using machine learning models.
It supports two types of classification:
- Binary Classification: Distinguishing plasmids into two classes.
- Three-Class Classification: Distinguishing plasmids into three classes.
The models leverage k-mer frequency analysis and gene feature extraction to predict plasmid classes based on genomic sequences. The models are built using the tidymodels framework in R, and the script integrates external tools like Prodigal and BLAST for gene feature extraction.
To classify a plasmid into one of two classes, set the model type to Binary (-m Binary). The script will:
- Extract 5-mer frequencies from the input genome.
- Use the pre-trained binary classification model to predict the class.
Example Command:
$ PlasmidTransModel.sh -a inputGenome -o output -m Binary
Parameters:
- -a|inputGenome: The input file in FASTA format.
- -o|output_tag: The prefix of the output file.
- -m|modelType: The model type to use, either Binary or ThreeClass.
Output:
The prediction results will be saved in a file named modelmerge_prediction_2class.txt.
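To make the 5-mer feature concrete, the sketch below counts overlapping k-mers in a FASTA file with awk. It is not the feature extractor used by PlasmidTransModel: the function name `count_kmers` is hypothetical, and the choices to skip windows containing non-ACGT characters, to count each record separately, and to leave the two strands unmerged are assumptions made for this illustration.

```shell
# count_kmers FASTA [K]
# Prints "kmer <TAB> count <TAB> relative frequency" for each k-mer seen in FASTA.
# ASSUMPTIONS: K defaults to 5; windows with non-ACGT characters are skipped;
# strands are not merged (no canonical k-mers); windows never span two records.
count_kmers() {
    awk -v k="${2:-5}" '
        # Count every window of length k in the current record, then reset it.
        function flush_record(   i, kmer) {
            for (i = 1; i + k - 1 <= length(seq); i++) {
                kmer = substr(seq, i, k)
                if (kmer ~ /^[ACGT]+$/) { count[kmer]++; total++ }
            }
            seq = ""
        }
        /^>/ { flush_record(); next }                      # header: close previous record
        { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }     # join wrapped sequence lines
        END {
            flush_record()
            for (kmer in count)
                printf "%s\t%d\t%.6f\n", kmer, count[kmer], count[kmer] / total
        }
    ' "$1" | sort
}
```

For example, `count_kmers inputGenome 5` would list each observed 5-mer once, sorted alphabetically, with its count and its share of all counted windows.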
To classify a plasmid into one of three classes, set the model type to ThreeClass (-m ThreeClass). The script will:
- Use Prodigal to predict genes in the input genome.
- Use BLAST to match predicted genes against a reference database.
- Extract gene features and 5-mer frequencies.
- Use the pre-trained three-class classification model to predict the class.
Example Command:
$ PlasmidTransModel.sh -a inputGenome -o output -m ThreeClass
Output:
The prediction results will be saved in a file named modelmerge_prediction_3class.txt.
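The gene-prediction and alignment steps of the three-class workflow can be sketched as three external commands. This is a minimal sketch and not the contents of PlasmidTransModel.sh: the function name `extract_gene_hits`, the output file names, the use of Prodigal's `-p meta` mode, the choice of blastp, and the e-value cutoff are all assumptions made for illustration. Setting `DRY_RUN=1` prints the commands without running them.

```shell
# extract_gene_hits GENOME REF_PROTEINS OUT_PREFIX
# Runs Prodigal, builds a protein BLAST database, and aligns predicted proteins.
# All option values below are ASSUMPTIONS, not the settings used by the tool.
extract_gene_hits() {
    genome=$1; ref=$2; prefix=$3
    # Helper: echo the command in dry-run mode, otherwise execute it.
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

    # 1. Predict protein-coding genes and write their translations (-a).
    run prodigal -i "$genome" -a "$prefix.proteins.faa" -o "$prefix.genes.gbk" -p meta -q || return 1
    # 2. Build a protein BLAST database from the reference FASTA (e.g. model3.fasta).
    run makeblastdb -in "$ref" -dbtype prot -out "$prefix.refdb" || return 1
    # 3. Align predicted proteins to the reference; tabular output (format 6).
    run blastp -query "$prefix.proteins.faa" -db "$prefix.refdb" \
        -outfmt 6 -evalue 1e-5 -out "$prefix.blast.tsv" || return 1
}
```

The tabular BLAST output could then be summarized per reference protein to build gene features, which would be combined with 5-mer frequencies before prediction; the exact feature definitions are determined by the pre-trained model and are not reproduced here.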