BioSeeker

BioSeeker is a Python library and CLI tool for the analysis of codon and bicodon conservation rates across linked species.

It computes these rates for a given genus and is inspired by the paper "Conservation of location of several specific inhibitory codon pairs in the Saccharomyces sensu stricto yeasts reveals translational selection" (Ghoneim et al., 2019).

Features

Robust Parsing: Uses Biopython to handle standard FASTA/AFA formats.
Efficient Analysis: Vectorized calculations using NumPy for performance.
Conservation Metrics: Calculates conservation rates for both single codons and codon pairs (bicodons) across three reading frames (ORF 0, 1, 2).
RSCU Analysis: Calculates Relative Synonymous Codon Usage (RSCU) for each species across the entire dataset.
Aggregation: Automatically aggregates results from multiple alignment files.
Dockerized: Ships with a Dockerfile for easy deployment and usage.
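
To illustrate what "three reading frames" means for codons and bicodons, here is a minimal sketch. It is not BioSeeker's own code: the function names codons_in_frame and bicodons_in_frame are hypothetical, the input is assumed to be an ungapped nucleotide string, and bicodons are assumed to be overlapping pairs of adjacent codons.

```python
def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Split seq into complete codons starting at offset `frame` (0, 1 or 2).

    Hypothetical helper for illustration; trailing bases that do not
    form a full codon are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    # Last index (exclusive) that still leaves whole codons from `frame`.
    usable = len(seq) - (len(seq) - frame) % 3
    return [seq[i:i + 3] for i in range(frame, usable, 3)]


def bicodons_in_frame(seq: str, frame: int) -> list[str]:
    """Return adjacent codon pairs (assumed overlapping) in the given frame."""
    codons = codons_in_frame(seq, frame)
    return [first + second for first, second in zip(codons, codons[1:])]


if __name__ == "__main__":
    example = "ATGGCCAAATAG"  # toy sequence, not real data
    for frame in (0, 1, 2):
        print(frame, codons_in_frame(example, frame))
    print(bicodons_in_frame(example, 0))
```

With the toy sequence above, frame 0 yields the codons ATG, GCC, AAA, TAG, and shifting the frame by one or two bases yields a different set of triplets from the same sequence.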

Installation

From Source

Requires Python 3.9+.

git clone https://github.com/SouthernBio/BioSeeker.git
cd BioSeeker
pip install .

Development

python3 -m venv venv
source venv/bin/activate
pip install -e .
pip install pytest

Usage

Command Line

BioSeeker provides a simple CLI. Place your aligned FASTA files (extension .afa, .fasta, or .fa) in a directory (e.g., FASTA_files).

bioseeker --input FASTA_files --output results

Arguments:

--input, -i: Directory containing input alignment files (default: FASTA_files).
--output, -o: Directory to save CSV results (default: results).
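
If you want to check which files in a directory would qualify as input before running the CLI, the sketch below lists files with the extensions named above. It only approximates the selection rule: the helper find_alignment_files is hypothetical, and matching extensions case-insensitively is an assumption, not documented BioSeeker behavior.

```python
from pathlib import Path

# Extensions listed in this README for aligned FASTA input.
ALIGNMENT_EXTENSIONS = {".afa", ".fasta", ".fa"}


def find_alignment_files(input_dir: str) -> list[Path]:
    """Return alignment files directly inside input_dir, sorted by name.

    Hypothetical helper; subdirectories are not searched.
    """
    directory = Path(input_dir)
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in ALIGNMENT_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_alignment_files("FASTA_files"):
        print(path)
```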

RSCU Analysis

To calculate the global RSCU values:

python src/bioseeker/analysis/rscu_analysis.py FASTA_files results/global_rscu.csv
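
For readers unfamiliar with RSCU, the following sketch shows the standard definition: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It is an illustration under stated assumptions, not the code in rscu_analysis.py: the function names are hypothetical, the standard genetic code is assumed, stop codons are excluded, and triplets containing gaps or ambiguous bases are skipped.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(seq: str) -> Counter:
    """Count in-frame codons, skipping triplets with gaps or ambiguous bases."""
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(t for t in triplets if t in CODON_TABLE)


def rscu(codon_counts: Counter) -> dict[str, float]:
    """RSCU = observed count / (family total / number of synonymous codons)."""
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # assumption: stop codons are left out
            families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for family in families.values():
        total = sum(codon_counts.get(c, 0) for c in family)
        if total == 0:
            continue  # amino acid never observed; RSCU is undefined
        expected = total / len(family)
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


if __name__ == "__main__":
    toy = "GCTGCTGCTGCCATG"  # toy sequence, not real data
    for codon, value in sorted(rscu(count_codons(toy)).items()):
        print(codon, round(value, 3))
```

A value above 1 means the codon is used more often than expected under equal usage; codons for amino acids with a single codon (Met, Trp) always score 1 when observed.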

Docker

You can run BioSeeker without installing Python dependencies using Docker.

Build the image:

docker build -t bioseeker .

Run the container, assuming your input data is in FASTA_files under the current directory:

docker run -v "$(pwd)/FASTA_files:/data/input" -v "$(pwd)/results:/data/output" bioseeker --input /data/input --output /data/output

Output

The tool generates CSV files in the output directory:

codon_conservation_ORF0.csv, codon_conservation_ORF1.csv, codon_conservation_ORF2.csv
bicodon_conservation_ORF0.csv, bicodon_conservation_ORF1.csv, bicodon_conservation_ORF2.csv

Each file contains:

codon / codon_pair: The sequence.
ReferenceCount: Occurrences in the reference sequence.
ConservationCount: Number of times the codon or codon pair was conserved (>90% match) across alignments.
ConservationRate: ConservationCount / ReferenceCount.
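
As a quick sanity check on a result file, the sketch below recomputes ConservationRate from the two count columns and reports rows that disagree. It assumes the CSV header uses exactly the column names listed above; the function name find_rate_mismatches, the tolerance, and the treatment of a zero ReferenceCount as rate 0 are illustrative choices, and the sample rows are made-up values, not BioSeeker output.

```python
import csv
import io


def find_rate_mismatches(csv_text: str, key_column: str = "codon",
                         tolerance: float = 1e-6) -> list[str]:
    """Return keys of rows where ConservationRate differs from
    ConservationCount / ReferenceCount by more than `tolerance`."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        reference = int(row["ReferenceCount"])
        conserved = int(row["ConservationCount"])
        # Assumption: a codon absent from the reference gets rate 0.
        expected = conserved / reference if reference else 0.0
        if abs(float(row["ConservationRate"]) - expected) > tolerance:
            mismatches.append(row[key_column])
    return mismatches


# Made-up rows for illustration only; the last one is deliberately wrong.
SAMPLE = """codon,ReferenceCount,ConservationCount,ConservationRate
ATG,120,114,0.95
GCT,80,60,0.75
TGG,40,30,0.80
"""

if __name__ == "__main__":
    print(find_rate_mismatches(SAMPLE))
```

For bicodon files, pass key_column="codon_pair". If the stored rates are rounded, a larger tolerance may be needed.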

License

GNU General Public License v3.0. See LICENSE.

💙 Support this project

Your contribution would help SouthernBio improve the quality of this project and add new features. If you find this project useful or interesting, please consider offering your support on GitHub Sponsors, Ko-fi, or PayPal.
