A fast, parallel Rust implementation of Prodigal, a tool for finding protein-coding genes in microbial genomes.
Orphos is a high-performance reimplementation of Prodigal, the widely used prokaryotic gene prediction tool. Written in Rust, it implements the same gene-finding algorithm with improved performance and modern language features.
- Faster Performance: Multi-threaded processing using Rayon for parallel genome analysis
- Memory Efficient: Optimized memory usage for handling large genomes and metagenomes
- Browser Support: WebAssembly build that runs directly in web browsers
- 100% Compatible: Output formats fully compatible with original Prodigal (GFF3, GenBank, etc.)
- Modern Codebase: Written in safe Rust with excellent error handling
- Multiple Interfaces: CLI, Rust library, Python bindings, and WebAssembly
- Easy Installation: Available via Cargo, pip, Homebrew, and Conda
Orphos is available in multiple forms:
- orphos-cli: Command-line interface for gene prediction
- orphos-core: Rust library for integrating into your own projects
- orphos-python: Python bindings (via PyO3)
- orphos-wasm: WebAssembly module for browser/Node.js usage
- High Performance: Multi-threaded processing using Rayon
- Memory Efficient: Optimized memory usage for large genomes
- Compatible: Output format compatible with original Prodigal
- Cross-Platform: Works on Linux, macOS, and Windows
brew install FullHuman/tap/orphos

cargo install orphos-cli

git clone https://github.com/FullHuman/orphos.git
cd orphos
cargo install --path orphos-cli

pip install orphos

Add to your Cargo.toml:
[dependencies]
orphos-core = "0.1.0"

# Basic gene prediction
orphos -i input.fasta -o output.gbk
# Output in GFF format
orphos -i input.fasta -f gff -o output.gff
# Use custom training file
orphos -i input.fasta -t training.trn -o output.gbk
# Metagenomic mode
orphos -i metagenome.fasta -p meta -f gff -o output.gff

import orphos
# Analyze a FASTA file
result = orphos.analyze_file("genome.fasta")
print(f"Found {result.gene_count} genes")
print(result.output) # GenBank formatted output
# Analyze a sequence string
fasta_string = """>seq1
ATGCGATCGATCGATCGATCG...
"""
result = orphos.analyze_sequence(fasta_string)
# Customize options
options = orphos.OrphosOptions(
    mode="meta",           # Use metagenomic mode
    format="gff",          # Output in GFF format
    closed_ends=True,      # Don't allow genes off edges
    translation_table=11,  # Use translation table 11
)
result = orphos.analyze_file("genome.fasta", options)

use orphos_core::{OrphosAnalyzer, config::OrphosConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create analyzer with default configuration
    let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());

    // Analyze a genome sequence
    let results = analyzer.analyze_sequence(
        "ATGCGATCGATCG...",
        Some("MyGenome".to_string()),
    )?;

    println!("Found {} genes", results.genes.len());
    Ok(())
}

For more advanced usage with type-safe training:
use orphos_core::engine::{UntrainedOrphos, Orphos, Untrained};
use orphos_core::config::OrphosConfig;
use orphos_core::sequence::encoded::EncodedSequence;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an untrained analyzer
    let mut untrained = UntrainedOrphos::with_config(OrphosConfig::default())?;

    // Encode the sequence
    let encoded = EncodedSequence::without_masking(b"ATGCGATCGATCG...");

    // Train on the genome (type changes to TrainedOrphos)
    let trained = untrained.train_single_genome(&encoded)?;

    Ok(())
}

// TODO: Add documentation links
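The Rust snippet above appears to rely on the typestate idea: training returns a value of a different type, so operations that need a trained model are only reachable after training. As a rough illustration of that idea only, here is a small Python analogue. The class names and the GC-content "training" step are hypothetical stand-ins and are not part of the orphos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainedModel:
    """Toy trained state: only this type exposes gc_percent()."""

    gc_content: float

    def gc_percent(self) -> float:
        return round(self.gc_content * 100, 1)


class UntrainedModel:
    """Toy untrained state: the only available operation is train()."""

    def train(self, sequence: str) -> TrainedModel:
        seq = sequence.upper()
        if not seq:
            raise ValueError("cannot train on an empty sequence")
        # Hypothetical stand-in for training: the fraction of G and C bases.
        gc = sum(seq.count(base) for base in "GC") / len(seq)
        return TrainedModel(gc_content=gc)


untrained = UntrainedModel()
trained = untrained.train("ATGCGATCGATCG")
print(type(trained).__name__, trained.gc_percent())
```

In Rust the same separation is checked at compile time; in Python it only shows up as a missing attribute at runtime, so the analogy is loose.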
Orphos supports multiple output formats:
- GenBank (GBK): Rich feature annotation format (default)
- GFF3: General Feature Format version 3
- GCA: Gene coordinate annotation
- SCO: Simple coordinate output
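GFF3 is a plain tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes), so predictions written with `-f gff` can be post-processed with a few lines of code. Below is a minimal sketch using only the Python standard library. The sample record is made up for illustration, and the source name and attribute keys that Orphos actually writes are assumptions here.

```python
from typing import Dict, Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    type: str
    start: int  # 1-based, inclusive (per the GFF3 specification)
    end: int    # 1-based, inclusive
    strand: str
    attributes: Dict[str, str]


def parse_gff3(text: str) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # Column 9 is a ';'-separated list of key=value pairs.
        attributes = dict(
            pair.split("=", 1)
            for pair in attrs.strip().strip(";").split(";")
            if "=" in pair
        )
        yield Feature(seqid, ftype, int(start), int(end), strand, attributes)


# Hypothetical record, for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "seq1\tOrphos\tCDS\t337\t2799\t.\t+\t0\tID=1_1;partial=00\n"
)

for feature in parse_gff3(SAMPLE):
    length = feature.end - feature.start + 1
    print(feature.seqid, feature.type, feature.strand, length, feature.attributes["ID"])
```

The sketch does not handle an embedded `##FASTA` section; files that include one would need the sequence part stripped first.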
- Single Genome Mode: Train on a complete genome for optimal gene prediction (default)
- Metagenomic Mode: Predict genes in fragmented or mixed sequences
- Parallel Processing: Multi-threaded execution using Rayon
- Memory Efficient: Optimized memory usage for large genomes
- High Performance: Significantly faster than the original C implementation
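On the command line, the output format and prediction mode are selected with the `-f` and `-p` flags used in the CLI examples earlier. When driving Orphos from a script, it can help to assemble the argument list in one place. The helper below is a sketch, not part of Orphos: the function name is hypothetical, and only `gff` and `meta` appear as flag values in this README, so `gbk`, `gca`, and `sco` as `-f` values are assumptions based on the format list.

```python
import shlex
from typing import List, Optional

# "gff" is shown in the examples; the other spellings are assumed from the
# list of supported formats.
ASSUMED_FORMATS = {"gbk", "gff", "gca", "sco"}


def build_orphos_command(
    input_path: str,
    output_path: str,
    output_format: Optional[str] = None,
    mode: Optional[str] = None,
    training_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the orphos CLI without running it."""
    if output_format is not None and output_format not in ASSUMED_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    argv = ["orphos", "-i", input_path]
    if output_format is not None:
        argv += ["-f", output_format]
    if mode is not None:
        argv += ["-p", mode]
    if training_file is not None:
        argv += ["-t", training_file]
    argv += ["-o", output_path]
    return argv


argv = build_orphos_command(
    "metagenome.fasta", "output.gff", output_format="gff", mode="meta"
)
print(shlex.join(argv))
```

Once `orphos` is on the `PATH`, the resulting list can be passed to `subprocess.run(argv, check=True)`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.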
# Run all tests
cargo test
# Run with coverage
cargo install cargo-tarpaulin
cargo cov-fast
# Run benchmarks
cargo bench

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
This project is inspired by the original Prodigal by Doug Hyatt.