EPACT: Epitope-anchored Contrastive Transfer Learning for Paired CD8+ T Cell Receptor-antigen Recognition
This repository contains the source code for the paper "Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor-antigen recognition".
EPACT follows a divide-and-conquer paradigm. It pre-trains models on TCR or pMHC data and then applies transfer learning, via epitope-anchored contrastive learning, to predict TCR$\alpha\beta$-pMHC binding specificity and interaction conformation.
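To illustrate what a contrastive objective anchored on epitopes can look like, here is a minimal InfoNCE-style sketch in plain Python. It assumes each epitope embedding in a batch is paired with the embedding of one TCR known to bind it, and treats the other TCRs in the batch as negatives. The function names, the use of cosine similarity, and the temperature of 0.1 are illustrative assumptions. This is not EPACT's exact loss, which is defined in the repository code and the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def epitope_anchored_info_nce(epitope_emb, tcr_emb, temperature=0.1):
    """Hypothetical InfoNCE loss with each epitope as the anchor.

    epitope_emb[i] and tcr_emb[i] form a binding (positive) pair; every
    other TCR in the batch serves as a negative for epitope i.
    """
    n = len(epitope_emb)
    total = 0.0
    for i in range(n):
        # Similarity of epitope i to every TCR in the batch, scaled by temperature.
        logits = [cosine(epitope_emb[i], tcr_emb[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the positive (binding) TCR.
        total += log_denom - logits[i]
    return total / n
```

With matched pairs the loss approaches zero, and it grows when TCRs are shuffled across epitopes. A caveat of this naive form: if two rows of a batch share the same epitope, their TCRs are pushed apart as false negatives, so a practical implementation would need to mask or merge such pairs.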
- Clone the repository.
git clone https://github.com/zhangyumeng1sjtu/EPACT.git
- Create a virtual environment with conda.
conda create -n EPACT_env python=3.10.12
conda activate EPACT_env
- Install PyTorch (>=2.0.1) built for your CUDA version, then install the other Python dependencies.
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia  # for CUDA 11.7
pip install -r requirements.txt
Alternatively, install the Python package from PyPI.
pip install epact==0.1.1
The following data and model checkpoints are available on Zenodo:
- data/binding: binding data between paired TCR$\alpha\beta$ and pMHC derived from IEDB, VDJdb, McPAS, TBAdb, 10X, and Francis et al.
- data/pretrained: human peptides from IEDB, human CD8+ TCRs from 10X Genomics Datasets and STAPLER, peptide-MHC-I binding affinity data from NetMHCpan4.1, and peptide-MHC-I eluted ligand data from BigMHC.
- data/structure: crystal structures of TCR-pMHC protein complexes in STCRDab. Distance matrices were calculated as the closest distance between heavy atoms of two amino acid residues.
- checkpoints/paired-cdr3-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR3 sequences.
- checkpoints/paired-cdr123-pmhc-binding: model checkpoints for predicting TCR$\alpha\beta$-pMHC binding specificity from CDR1, CDR2, and CDR3 sequences.
- checkpoints/paired-cdr123-pmhc-interaction: model checkpoints for predicting CDR-epitope residue-level distance matrices and contact sites.
- checkpoints/pretrained: model checkpoints for the pre-trained TCR and peptide language models, and the peptide-MHC models (binding affinity & eluted ligand).
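The distance-matrix definition given for data/structure (the closest distance between heavy atoms of two residues) can be sketched as follows. It assumes each residue has already been parsed from an STCRDab structure into a list of (element, (x, y, z)) tuples; the parsing step is not shown, and all function names here are hypothetical. Hydrogen atoms are skipped by their element symbol.

```python
import math


def heavy_atoms(residue):
    """Coordinates of non-hydrogen atoms; residue is a list of (element, (x, y, z))."""
    return [xyz for element, xyz in residue if element.upper() != "H"]


def closest_heavy_atom_distance(res_a, res_b):
    """Minimum distance between any heavy atom of res_a and any heavy atom of res_b."""
    return min(math.dist(p, q) for p in heavy_atoms(res_a) for q in heavy_atoms(res_b))


def distance_matrix(cdr_residues, epitope_residues):
    """Residue-level matrix: rows are CDR residues, columns are epitope residues."""
    return [
        [closest_heavy_atom_distance(r, e) for e in epitope_residues]
        for r in cdr_residues
    ]
```

Distances come out in the units of the input coordinates (Å for PDB files). Taking the minimum over all heavy-atom pairs makes the matrix sensitive to side-chain contacts, not just backbone positions.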
- Pre-train peptide and TCR$\alpha\beta$ language models.
# pretrain epitope masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-epitope-lm.yml
# pretrain paired cdr3 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr3-lm.yml
# pretrain paired cdr123 masked language model.
python scripts/pretrain/pretrain_plm.py --config configs/config-pretrain-cdr123-lm.yml
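As a rough illustration of the masked language modeling objective behind these pre-training configs, the sketch below applies the common BERT-style recipe: select about 15% of positions, replace 80% of them with a mask token, 10% with a random residue, and leave 10% unchanged. These proportions, the `<mask>` token, and the `mask_sequence` helper are assumptions borrowed from BERT, not values read from EPACT's configs.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token


def mask_sequence(seq, mask_prob=0.15, rng=None):
    """BERT-style masking of an amino-acid sequence (illustrative sketch).

    Returns (tokens, targets): tokens is the corrupted input, and targets maps
    each selected position to its original residue, i.e. the prediction target.
    """
    rng = rng or random.Random()
    tokens = list(seq)
    # Always select at least one position so every sequence yields a target.
    n_select = max(1, round(mask_prob * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_select)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            tokens[pos] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            tokens[pos] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return tokens, targets
```

The model is then trained to recover `targets` from `tokens`; unselected positions contribute no loss.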
- Train peptide-MHC binding affinity or eluted ligand models.
# pretrain peptide-MHC binding affinity model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-binding.yml
# pretrain peptide-MHC eluted ligand model.
python scripts/pretrain/pretrain_pmhc_model.py --config configs/config-pmhc-elution.yml
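Binding affinity data in the NetMHC family of tools are conventionally given as IC50 values in nM and rescaled to [0, 1] with 1 - log(IC50)/log(50000) before regression. Whether EPACT's pMHC binding affinity model uses exactly this transform is an assumption; the sketch only shows the conventional mapping and its inverse, with hypothetical function names.

```python
import math


def ic50_to_target(ic50_nm, max_ic50_nm=50000.0):
    """Map an IC50 in nM to [0, 1] via 1 - log(IC50) / log(50000), clipped."""
    score = 1.0 - math.log(ic50_nm) / math.log(max_ic50_nm)
    return min(1.0, max(0.0, score))


def target_to_ic50(score, max_ic50_nm=50000.0):
    """Inverse transform for scores in [0, 1]: IC50 = 50000 ** (1 - score)."""
    return max_ic50_nm ** (1.0 - score)
```

Stronger binders (lower IC50) map to higher targets, and anything weaker than 50000 nM is clipped to 0.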
- Train TCR$\alpha\beta$-pMHC binding models.
# finetune Paired TCR-pMHC binding model (CDR3).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr3-pmhc-binding.yml
# finetune Paired TCR-pMHC binding model (CDR123).
python scripts/train/train_tcr_pmhc_binding.py --config configs/config-paired-cdr123-pmhc-binding.yml
- Predict TCR$\alpha\beta$-pMHC binding specificity.
# predict cross-validation results
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_binding.py \
        --config configs/config-paired-cdr123-pmhc-binding.yml \
        --input_data_path data/binding/Full-TCR/k-fold-data/val_fold_${i}.csv \
        --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-fold-${i}.pt \
        --log_dir results/preds-cdr123-pmhc-binding/Fold_${i}/
done
- Predict TCR$\alpha\beta$-pMHC binding ranks compared to background TCRs.
# predict binding ranks for SARS-CoV-2 responsive TCR clonotypes
python scripts/predict/predict_tcr_pmhc_binding_rank.py --config configs/config-paired-cdr123-pmhc-binding.yml \
    --log_dir results/ranking-covid-cdr123/ \
    --input_data_path data/binding/covid_clonotypes.csv \
    --model_location checkpoints/paired-cdr123-pmhc-binding/paired-cdr123-pmhc-binding-model-all.pt \
    --bg_tcr_path data/pretrained/10x-paired-healthy-human-tcr-repertoire.csv \
    --num_bg_tcrs 20000
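The ranking step scores each query TCR against a sample of background TCRs (here 20000 drawn from a healthy repertoire via --num_bg_tcrs). One plausible way to turn those scores into a rank is a percentile: the share of background TCRs that score at least as high as the query for the same pMHC. The exact rank definition in predict_tcr_pmhc_binding_rank.py is not described here, so treat this sketch and the `percentile_rank` name as assumptions.

```python
from bisect import bisect_left


def percentile_rank(query_score, background_scores):
    """Percentage of background TCRs scoring at least as high as the query.

    A lower rank means the query TCR is predicted to bind the pMHC more
    strongly than most of the background repertoire.
    """
    ordered = sorted(background_scores)
    # Number of background scores strictly below the query.
    n_lower = bisect_left(ordered, query_score)
    n_at_least = len(ordered) - n_lower
    return 100.0 * n_at_least / len(ordered)
```

When ranking many queries against the same background, sort the background scores once outside the function so each lookup costs only a binary search.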
- Train TCR$\alpha\beta$-pMHC interaction model.
# finetune Paired TCR-pMHC interaction model (CDR123).
python scripts/train/train_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml
- Predict TCR$\alpha\beta$-pMHC interaction conformations.
# predict distance matrices and contact sites between MEL8 TCR and HLA-A2-presented peptides.
for i in {1..5}
do
    python scripts/predict/predict_tcr_pmhc_interact.py --config configs/config-paired-cdr123-pmhc-interact.yml \
        --input_data_path data/MEL8_A0201_peptides.csv \
        --model_location checkpoints/paired-cdr123-pmhc-interaction/paired-cdr123-pmhc-interaction-model-fold-${i}.pt \
        --log_dir results/interaction-MEL8-bg-cdr123-closest/Fold_${i}/
done
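The loop above produces one prediction per fold model. If a single answer per TCR-peptide pair is wanted, one option is to average the five predicted CDR-epitope distance matrices and call residue pairs below a distance cutoff contacts. Both the averaging and the 5.0 Å default cutoff are hypothetical choices for illustration, not EPACT's documented procedure; loading matrices from the log directories is not shown because their file format is not described here.

```python
def average_matrices(matrices):
    """Element-wise mean of equally shaped distance matrices (e.g. one per fold model)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(cols)]
        for i in range(rows)
    ]


def contact_sites(dist, cutoff=5.0):
    """(CDR residue index, epitope residue index) pairs closer than cutoff.

    The 5.0 default (in Å) is a hypothetical threshold for illustration.
    """
    return [
        (i, j)
        for i, row in enumerate(dist)
        for j, d in enumerate(row)
        if d < cutoff
    ]
```

Averaging distances before thresholding smooths out disagreements between folds; thresholding each fold first and taking a majority vote is an equally reasonable alternative.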
@article{zhang2024epitope,
title={Epitope-anchored contrastive transfer learning for paired CD8+ T cell receptor--antigen recognition},
author={Yumeng Zhang and Zhikang Wang and Yunzhe Jiang and Dene R Littler and Mark Gerstein and Anthony W Purcell and Jamie Rossjohn and Hong-Yu Ou and Jiangning Song},
journal={Nature Machine Intelligence},
doi={10.1038/s42256-024-00913-8},
year={2024},
publisher={Nature Publishing Group UK London}
}
If you have any questions, please contact us at [email protected] or [email protected].