Skip to content

helloftroy/eukaryote_neural_net

Repository files navigation

🧬 Eukaryote Neural Net

Modeling eukaryotic genomes using machine learning to identify functional elements like enhancers, introns, and promoter strength.


🧬 Overview

This project leverages deep learning to parse and predict various genomic features within eukaryotic DNA sequences. By training on established genomic datasets, these models provide high-accuracy mapping of regulatory and structural elements.


🚀 Key Features

🧬 Genomic Encoding

Robust scripts for converting raw FASTA or nucleotide sequences into machine-learning-ready formats using one-hot encoding or k-mer embeddings.

🧠 Deep Learning Architectures

Implementation of specialized neural networks (such as CNNs or Transformers) tailored for long-range sequence analysis.

🔍 Predictive Modeling

Dedicated modules for:

  • Enhancer Regions: Identifying distal regulatory elements
  • Intron/Exon Boundaries: Predicting splice sites and gene structure
  • Promoter Strength: Quantifying potential transcriptional activity

🛠 Installation

To set up the environment and install necessary dependencies, run the following commands in your terminal:

# Clone the repository
git clone https://github.com/helloftroy/eukaryote_neural_net.git

# Navigate to the project directory
cd eukaryote_neural_net

# Install requirements
pip install -r requirements.txt

💻 Usage

1. Data Preparation

Preprocess your raw genomic data into the required tensor format:

python src/preprocess.py --input data/raw_genome.fasta

2. Training

Train the model using a configuration file:

python src/train.py --config configs/model_v1.yaml

3. Inference

Run predictions on a specific sequence:

python src/predict.py --sequence "ATGCGAT..."

🧪 Technical Details

  • Target Organisms: Eukaryotic genomes (adaptable for model organisms like S. cerevisiae or Arabidopsis)
  • Framework: PyTorch
  • Evaluation Metrics: Precision-Recall curves, F1-scores, and Mean Squared Error (MSE) for promoter quantification

⚠️ Notes

This project is a work in progress. Future updates may include support for non-model organisms and expanded regulatory element detection.


About

ML code / models use for modeling eukaryote genome. Determine enhancers, introns, promoter strength, etc

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors