Genomic ML models for predicting and interpreting ribosome footprint profiles under different amino-acid starvation conditions

vam-sin/riboclette

🧬🧑🏾‍💻 Riboclette: Conditional Deep Learning Model Reveals Translation Elongation Determinants during Amino Acid Deprivation

Welcome to Riboclette, a transformer-based deep learning model for predicting ribosome densities under various nutrient-deprivation conditions. Follow this tutorial to get started! 🚀


Pip Package

Riboclette can be installed as a pip package. With it, you can make predictions on new gene sequences and obtain model-derived attributions to understand those predictions!

🧀 Package Documentation: Riboclette on PyPI

pip install riboclette

Web Server 🌐🧬

We provide a web-based server where you can explore codon-level attributions for different genes in the dataset. This server allows you to visualize and analyze the model's predictions and interpretability results interactively.

🔗 Server Link: Ribotly

On the server, you can:

  • Select genes of interest from the dataset.
  • View codon-level attributions for each gene.
  • Analyze how nutrient-deprivation conditions affect ribosome densities at single-codon resolution.

Code Tutorial 📖✨

1️⃣ Download Data and Checkpoints 📂🔗

Download the processed data and the pre-trained model checkpoints from the following link:

Download Data and Checkpoints

After downloading:

  • Place the data in the riboclette/data/ folder. 📁
  • Place the checkpoints in the riboclette/checkpoints/ folder. ✅
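
Before moving on, it can help to confirm that both folders exist and actually contain files. The snippet below is a minimal sketch, not part of the repository: the helper name `check_layout` is hypothetical, and it assumes the two folder paths above are relative to the directory you pass in.

```python
from pathlib import Path

# Folder names taken from the two bullets above; treating them as paths
# relative to `repo_root` is an assumption.
REQUIRED_FOLDERS = ("riboclette/data", "riboclette/checkpoints")


def check_layout(repo_root, required=REQUIRED_FOLDERS):
    """Return the required folders that are missing or empty under repo_root."""
    root = Path(repo_root)
    problems = []
    for rel in required:
        folder = root / rel
        # A folder counts as a problem if it does not exist or holds no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    missing = check_layout(".")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Data and checkpoints are in place.")
```

Run it from the directory that contains the `riboclette/` folder; an empty list means both downloads landed where the later steps expect them.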

2️⃣ Prepare the Dataset 🐁📊

To run the data pre-processing pipeline, run the following command:

cd /riboclette/preprocessing
python processing.py

3️⃣ Train the Riboclette Model 🧠💻

Train the Riboclette model using the following command:

cd /riboclette/models/xlnet/dh
python train.py

4️⃣ Perform Pseudolabeling ➕

Train 5 Seed Models 🌱🌱🌱🌱🌱

To perform pseudolabeling, first train 5 seed models of Riboclette:

cd /riboclette/models/xlnet/dh
for seed in 1 2 3 4 42; do python train.py --seed "$seed"; done
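
If you prefer to launch the five seed runs from Python, the following is one possible sketch. It assumes only what the command above shows, namely that `train.py` accepts a `--seed` argument; the helper names `seed_commands` and `train_all` are hypothetical and not part of the repository.

```python
import subprocess
import sys

SEEDS = (1, 2, 3, 4, 42)  # seed values listed in the tutorial


def seed_commands(seeds=SEEDS, script="train.py"):
    """Build one command line per seed without running anything."""
    return [[sys.executable, script, "--seed", str(seed)] for seed in seeds]


def train_all(workdir, seeds=SEEDS):
    """Run the seed models one after another inside workdir.

    check=True stops at the first failing run so that a broken seed
    is noticed before the pseudolabeling step.
    """
    for cmd in seed_commands(seeds):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `train_all("riboclette/models/xlnet/dh")` would train the models sequentially. Running them in parallel on separate GPUs is a different setup that this sketch does not cover.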

Generate the Pseudolabeling Dataset 🧬📋

Once all seed models are trained, generate the pseudolabeling dataset:

cd /riboclette/preprocessing
jupyter nbconvert --to notebook --execute plabeling.ipynb

Train Pseudolabeling-Based Models 🧠🔄

Train the pseudolabeling-based model using the following command:

cd /riboclette/models/xlnet/plabel
jupyter nbconvert --to notebook --execute train.ipynb

5️⃣ Generate Interpretability Results 🔍🧬

Generate codon-level interpretations for all sequences in the test set:

cd /riboclette/models/xlnet/plabel
python LIGInterpret.py
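
The README does not describe how the attributions are computed. The script name hints at an integrated-gradients style method, but that is an assumption. As a toy illustration of what a per-position attribution means, the sketch below applies plain integrated gradients to a small hand-written function. It is not the repository's implementation, and every name in it (`integrated_gradients`, `toy_model`, `toy_grad`, `WEIGHTS`) is hypothetical.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output with respect to
    each input position; x and baseline are equal-length lists.
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Toy stand-in for a model: f(x) = sum_i w_i * x_i^2
WEIGHTS = [0.5, -1.0, 2.0]


def toy_model(x):
    return sum(w * xi * xi for w, xi in zip(WEIGHTS, x))


def toy_grad(x):
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]
```

A useful property to check is completeness: the attributions for an input sum to the difference between the model output at that input and at the baseline, so each position's share of the prediction can be read off directly.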

6️⃣ Generate Motifs 🔍🧬

Generate motifs derived from random windows chosen from the full dataset:

cd /riboclette/models/xlnet/plabel
python beamSearch.py

7️⃣ Downstream Analysis and Figure Recreation 📈🖼️

Recreate the figures from the Riboclette paper using the downstream analysis scripts provided in the repository. These scripts allow you to analyze the model outputs and generate the figures mentioned in the paper.

Steps for Downstream Analysis:

  1. Navigate to the downstream analysis folder:

    cd /riboclette/downstream_analysis
  2. Run the analysis scripts to generate the respective figures:

    for n in 2 3 4 5; do python "figure${n}.py"; done
  3. The generated figures will be saved in the riboclette/data/results/figures/ folder. 🖼️
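
The figure scripts can also be driven from Python, so that one failing figure does not stop the others. This is a minimal sketch that assumes only the script names `figure2.py` through `figure5.py` from the step above; the helpers `figure_scripts` and `run_figures` are hypothetical.

```python
import subprocess
import sys

FIGURE_NUMBERS = (2, 3, 4, 5)  # figures covered by the downstream scripts


def figure_scripts(numbers=FIGURE_NUMBERS):
    """Expand the figure numbers into script file names."""
    return [f"figure{n}.py" for n in numbers]


def run_figures(analysis_dir, numbers=FIGURE_NUMBERS):
    """Run each figure script in analysis_dir and return {script: exit code}.

    No check=True here: every script is attempted, and the caller
    inspects the exit codes afterwards.
    """
    results = {}
    for script in figure_scripts(numbers):
        proc = subprocess.run([sys.executable, script], cwd=analysis_dir)
        results[script] = proc.returncode
    return results
```

Any script with a non-zero exit code in the returned dictionary needs a closer look before its figure is expected in the results folder.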


🎉 You're all set! Follow these steps to fully utilize Riboclette for ribosome density prediction, interpretability, and downstream analysis. 🚀
