Current release: v0.0.5
RecMpox is a command-line tool that flags potential recombination events in monkeypox viruses. It does not confirm recombination, but highlights genomes that may be recombinant and warrant further investigation. RecMpox works by detecting regions within a genome that appear to originate from two different parental viruses. Such patterns are not conclusive evidence of recombination, as similar signals can also arise from shared ancestral variation, convergent mutations, mixed populations (e.g., co-infections or laboratory contamination), or sequencing and assembly errors.
- References are required: RecMpox compares your genomes against two reference sequences (for example, Clade Ia vs. Ib, or Ib vs. IIb), because recombination can only be detected between two genetically distinct lineages.
- Alignment and diagnostic SNPs: The two reference genomes are aligned using Squirrel, so that the same genomic positions correspond across all sequences. RecMpox then identifies positions where the two references differ at the same coordinates. These positions are defined as diagnostic SNPs, because they distinguish between the reference lineages. Positions where the references are identical are ignored, as they do not provide information for detecting recombination.
- Consensus genome classification: Your consensus genomes are aligned to the same references. At each diagnostic SNP, the base is classified as matching reference 1, reference 2, or other (e.g., gaps or ambiguous bases).
- Flagging potential recombinants: If both references contribute at least 10% of the diagnostic positions in a genome, RecMpox flags it as a potential recombinant, since no single lineage clearly dominates.
- Recombination tracts and breakpoints: By examining the pattern of reference matches along the genome, RecMpox infers recombination tracts and identifies their breakpoints (start and end positions).
- Phylogeny of recombinant ancestors (optional): With -phylogeny, RecMpox builds trees from the extracted Ia and Ib tract sequences and infers the nearest outbreak ancestor for each tract, which provides further evidence for or against recombination.
- Outputs:
- TSV file: For each genome, reports the number and proportion of diagnostic SNPs matching each reference, the resulting recombinant flag, and summary statistics used for tract inference.
- Interactive HTML report: Provides sortable tables, summary plots, per-sample visualisations, and genome-wide displays of inferred recombination tracts and breakpoints.
- With -phylogeny: Phylogeny folder with alignment, midpoint-rooted tree, and PDF/SVG figures; the tree is also included in the HTML report.
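The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not RecMpox's actual implementation: it assumes the two references and the query are already aligned to the same coordinates (RecMpox uses Squirrel for that step), and the function names, the toy sequences, and the simple run-based tract logic are all assumptions made for this example.

```python
from itertools import groupby

def diagnostic_positions(ref1, ref2):
    """0-based positions where two aligned references carry different A/C/G/T bases."""
    valid = set("ACGT")
    return [i for i, (a, b) in enumerate(zip(ref1, ref2))
            if a != b and a in valid and b in valid]

def classify(query, ref1, ref2, positions):
    """Label each diagnostic position as 'ref1', 'ref2', or 'other'."""
    labels = []
    for i in positions:
        if query[i] == ref1[i]:
            labels.append("ref1")
        elif query[i] == ref2[i]:
            labels.append("ref2")
        else:
            labels.append("other")  # gap, N, or a base matching neither reference
    return labels

def is_potential_recombinant(labels, threshold_pct=10.0):
    """True if the minor reference reaches threshold_pct of all diagnostic positions."""
    if not labels:
        return False
    minor = min(labels.count("ref1"), labels.count("ref2"))
    return 100.0 * minor / len(labels) >= threshold_pct

def tracts(positions, labels):
    """Collapse runs of identical labels into (label, start, end), skipping 'other'."""
    informative = [(p, l) for p, l in zip(positions, labels) if l != "other"]
    result = []
    for label, run in groupby(informative, key=lambda item: item[1]):
        run = list(run)
        result.append((label, run[0][0], run[-1][0]))
    return result

# Toy 10-base alignment (made up for illustration)
ref1 = "AAAAAAAAAA"
ref2 = "ACACACACAC"
query = "AAAAACANAC"

pos = diagnostic_positions(ref1, ref2)
labels = classify(query, ref1, ref2, pos)
print(pos)
print(labels)
print(is_potential_recombinant(labels))
print(tracts(pos, labels))
```

In this sketch a breakpoint falls somewhere between the end of one tract and the start of the next; how RecMpox handles noisy single-site switches is not covered here.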
🔴 Caution — Intra-clade comparisons: When comparing within a clade, set the threshold higher than the default. Intra-clade comparisons yield far fewer diagnostic SNPs (e.g. as low as ~120 between Ia and Ib), meaning each individual SNP carries more weight and small percentages can arise from convergent evolution rather than true recombination. We recommend a minimum threshold of 20% (-m 20) for intra-clade recombinant screening.
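To make the caution concrete, the short calculation below shows how many minor-reference SNPs are needed to reach a given threshold. The figure of 120 diagnostic SNPs comes from the text above; the larger count of 2000 and the helper name `min_minor_snps` are hypothetical and only serve as a contrast.

```python
import math

def min_minor_snps(n_diagnostic, threshold_pct):
    """Smallest number of minor-reference SNPs that reaches the threshold."""
    return math.ceil(n_diagnostic * threshold_pct / 100)

# 120 is the Ia vs. Ib figure quoted above; 2000 is a hypothetical larger panel.
for n_diagnostic in (120, 2000):
    for threshold_pct in (10, 20):
        needed = min_minor_snps(n_diagnostic, threshold_pct)
        print(f"{n_diagnostic} diagnostic SNPs, {threshold_pct}% threshold: {needed} SNPs")
```

With only 120 diagnostic SNPs, the default 10% threshold is reached by 12 sites, whereas -m 20 requires 24.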
First, install conda if you haven't already:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Then, ensure you have the required channels:
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels defaults
conda config --set channel_priority strict
Standard installation of RecMpox via Conda:
conda create -n recmpox -c conda-forge -c bioconda recmpox -y
conda activate recmpox
OR, if the above fails:
conda create -y -n recmpox -c conda-forge -c bioconda recmpox python=3.11 --solver libmamba --strict-channel-priority
conda activate recmpox
- Create conda environment with required tools and install RecMpox
git clone https://github.com/DaanJansen94/RecMpox.git
cd RecMpox
conda env create -f environment-recmpox.yml
conda activate recmpox
pip install .
- Re-installation (when updates are available):
conda activate recmpox  # Make sure you're in the right environment
cd RecMpox
git pull  # Get the latest updates from GitHub
conda env update -n recmpox -f environment-recmpox.yml --prune
pip uninstall -y recmpox
pip install .
# Use built-in references: consensus of 5 earliest genomes obtained from sustained outbreaks
recmpox -i fasta/ -o output -ref Ia,Ib -t 4
recmpox -i fasta/ -o output -ref Ib,IIb -t 4
recmpox -i fasta/ -o output -ref Ia,Ib -phylogeny
# Input can be: FASTA file, directory of .fa/.fasta/.fna, or NCBI accession(s)
recmpox -i consensus.fa -o output -ref Ia,Ib
recmpox -i OZ375330.1 -o output -ref Ib,IIb # UK recombinant case example
recmpox -i accessions.txt -o output -ref Ia,Ib # one accession per line or comma-separated
Note: Either -ref (e.g. Ia,Ib, IIa,IIb, or Ib,IIb) or both -ref1 and -ref2 are required. With -ref, RecMpox downloads the earliest 5 genomes per selected clade (via Pathoplexus) and builds one consensus reference per clade, which are then used as -ref1/-ref2 for the run.
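For the accession-list input, the sketch below shows one way a file with "one accession per line or comma-separated" entries could be read. It is an illustration of the format only, not RecMpox's own parser; the function name `parse_accessions` is hypothetical, and how RecMpox treats blank lines or stray whitespace is an assumption here.

```python
import re

def parse_accessions(text):
    """Split on newlines and commas, trim whitespace, and drop empty entries."""
    return [token.strip() for token in re.split(r"[,\n]", text) if token.strip()]

# Accessions taken from the examples in this README
example = "OZ375330.1\nNC_003310.1, PP601219.1\n\n"
print(parse_accessions(example))
```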
- -i, --input: Input: FASTA file, directory of .fa/.fasta/.fna, .txt file of accessions (one per line or comma-separated), or NCBI accession(s). Accessions are downloaded and used as queries.
- -ref: Reference pair, e.g. Ia,Ib, IIa,IIb, or Ib,IIb. Uses built-in defaults.
- -ref1, -ref2: Custom references (path or NCBI accession). Use with -ref1_g/-ref2_g for labels (e.g. -ref1_g Ia -ref2_g Ib).
- -o, --output: Output directory
- -ref1_g, -ref2_g: Genotype labels for TSV/HTML (default from -ref or accession)
- -m, -MDRF: Minor diagnostic recombinant fraction (%) threshold for calling "potential recombinant" (default: 10). Increase to be more conservative (e.g. 15, 20).
- -include-indels: Include diagnostic indels (default: SNPs only)
- -min-indel-size: Min indel length (bp) when using -include-indels (default: 100)
- -extract-tracts: Split recombinant genomes by ancestry
- -phylogeny: Phylogeny of recombinant ancestors
- -t, --threads: Number of threads
- -q, --quiet: Log to file only
# Custom references
recmpox -i fasta/ -o output -ref1 NC_003310.1 -ref2 PP601219.1 -ref1_g Ia -ref2_g Ib -t 4
# Mixed clades (e.g. Ia vs IIb)
recmpox -i fasta/ -o output -ref1 ACC1 -ref2 ACC2 -ref1_g Ia -ref2_g IIb
# Phylogeny of recombinant tracts
recmpox -i fasta/ -o output -ref Ia,Ib -phylogeny
- recmpox_results.tsv: Per-genome counts (n_ref1, n_ref2, n_other), percentages (pct_ref1, pct_ref2, pct_other), and recombinant call (no recombinant / potential recombinant).
- recmpox_results.html: Interactive report (summary, sortable table, stacked bar chart, diagnostic SNP positions, diagnostic sites per sample, recombination tracts and breakpoints per sample). Split into multiple files + index when >100 genomes.
- potential_recombinants_diagnostic_sites.tsv: Diagnostic site classification per potential recombinant (when any exist).
- diagnostic_snps.txt: List of diagnostic SNP positions (ref1 vs ref2 alleles).
- .recmpox.log: Log file (in output directory).
- With -phylogeny: phylogeny/ folder with phylogeny_alignment.fasta, phylogeny_tree.treefile (midpoint-rooted), and the tree figure in phylogeny_tree.pdf and phylogeny_tree.svg.
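The results table can be post-processed with standard tools. The sketch below lists the genomes flagged as potential recombinants using Python's csv module. The count and percentage column names follow the description above, but the `sample` and `call` column names and the example rows are hypothetical; check the header of your own recmpox_results.tsv before adapting it.

```python
import csv
import io

# Made-up rows for illustration; 'sample' and 'call' are assumed column names.
example_tsv = (
    "sample\tn_ref1\tn_ref2\tn_other\tpct_ref1\tpct_ref2\tpct_other\tcall\n"
    "sampleA\t95\t3\t2\t95.0\t3.0\t2.0\tno recombinant\n"
    "sampleB\t60\t35\t5\t60.0\t35.0\t5.0\tpotential recombinant\n"
)

def potential_recombinants(handle, id_col="sample", call_col="call"):
    """Return the IDs of rows whose call is 'potential recombinant'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_col] for row in reader
            if row[call_col].strip() == "potential recombinant"]

print(potential_recombinants(io.StringIO(example_tsv)))
```

For a real run, the same function could be given an open file handle, for example `open("output/recmpox_results.tsv", newline="")`, with `id_col` and `call_col` set to the actual header names.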
- No recombinant: One ref dominates (minor ref < 10% of diagnostic sites at the default threshold).
- Potential recombinant: Both refs contribute ≥10% of diagnostic sites at the default threshold. The HTML report shows recombination tracts (beginning/end of each tract) and breakpoints between tracts. A single tract means the genome is entirely one clade (no recombination).
- With -phylogeny: You see the inferred origin of each recombinant tract, i.e. which outbreak each ancestral tract is closest to (e.g. sh2017IIb, sh2023Ib).
- High pct_other: Many Ns, gaps, or non-ref bases at diagnostic sites (poor coverage or alignment).
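The interpretation rules above can be written as a small helper. This is a sketch under stated assumptions, not RecMpox's code: the function name `interpret` is hypothetical, and the cut-off for "high" pct_other (`other_warn_pct=20.0`) is an arbitrary value chosen for the example, since no specific cut-off is defined above.

```python
def interpret(pct_ref1, pct_ref2, pct_other, threshold_pct=10.0, other_warn_pct=20.0):
    """Return (call, notes) for one genome from its diagnostic-site percentages."""
    minor = min(pct_ref1, pct_ref2)
    call = "potential recombinant" if minor >= threshold_pct else "no recombinant"
    notes = []
    if pct_other >= other_warn_pct:  # assumed cut-off, not a RecMpox default
        notes.append("high pct_other: check coverage and alignment")
    return call, notes

print(interpret(95.0, 3.0, 2.0))
print(interpret(60.0, 35.0, 5.0))
print(interpret(50.0, 10.0, 40.0))
print(interpret(85.0, 15.0, 0.0, threshold_pct=20.0))  # stricter, as with -m 20
```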
Example HTML output for one sample:
If you use RecMpox in your research, please cite:
Jansen, D., & Vercauteren, K. RecMpox: A Command-Line Tool for Flagging Potential Recombination Events in Monkeypox Viruses (v0.0.5). Zenodo. https://doi.org/10.5281/zenodo.18495962
RecMpox integrates several external bioinformatics tools. Please also cite these tools as appropriate when using RecMpox:
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any problems or have questions, please open an issue on GitHub.
