GraphEGFR: Multi-task and Transfer Learning Based on Molecular Graph Attention Mechanism and Fingerprints Improving Inhibitor Bioactivity Prediction for EGFR Family Proteins on Data Scarcity
GraphEGFR is a model designed to improve molecular representations for predicting inhibitor bioactivity (pIC50) against wild-type HER1, HER2, HER4, and mutant HER1 proteins. It combines multi-task learning and transfer learning with a graph attention mechanism for molecular graphs, and deep neural networks and convolutional neural networks for molecular fingerprints.
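For readers unfamiliar with the prediction target, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The sketch below is only an illustration of that definition: the function name `ic50_nm_to_pic50` is hypothetical and not part of this repository, and it assumes IC50 values given in nanomolar, which this README does not specify.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (hypothetical helper, not from the repo).

    pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    # A more potent inhibitor (lower IC50) has a higher pIC50.
    for ic50 in (1.0, 50.0, 1000.0):
        print(f"IC50 = {ic50} nM -> pIC50 = {ic50_nm_to_pic50(ic50):.2f}")
```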
GraphEGFR can be run on Google Colab using the `run-colab.ipynb` notebook.
The project is organized as follows:
GraphEGFR
├─configs
├─examples
├─graphegfr
├─misc
├─resources
│ └─LigEGFR
├─state_dict
├─run-colab.ipynb
├─run.ipynb
├─run.py
└─README.md
To run an experiment with a specific configuration, use the following command:

    python3 run.py --config configs/sample.json

The configuration file supports the following options:

- `target`: the selected proteins used in the study
- `hyperparam`: configuration for the model-building process (in JSON format)
- `result_folder` [optional]: the directory where the results will be stored
- `database` [optional]: the database from which to obtain data (the only option currently available is `LigEGFR`; can be omitted)
- `metrics` [optional]: a list of strings naming the metrics to report in the experiment. The available options are `RMSE`, `MAE`, `MSE`, `PCC`, `R2`, `SRCC`
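The options above can be checked before launching a run. The following is a minimal sketch, not part of the repository: `validate_config` is a hypothetical helper, the value formats of `target` and `hyperparam` are not described here and so are not checked, and the example values are placeholders. It also treats any option not listed above as unknown, which is an assumption.

```python
# Option names and allowed values are taken from the list above.
REQUIRED_KEYS = {"target", "hyperparam"}
OPTIONAL_KEYS = {"result_folder", "database", "metrics"}
ALLOWED_METRICS = {"RMSE", "MAE", "MSE", "PCC", "R2", "SRCC"}
ALLOWED_DATABASES = {"LigEGFR"}


def validate_config(config: dict) -> list:
    """Return a list of problems found in a config dict (empty if none).

    Hypothetical helper: it only checks what the README states and does not
    inspect the contents of `target` or `hyperparam`.
    """
    problems = []
    present = set(config)

    for key in sorted(REQUIRED_KEYS - present):
        problems.append(f"missing required option: {key}")
    for key in sorted(present - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown option: {key}")

    database = config.get("database")
    if database is not None and database not in ALLOWED_DATABASES:
        problems.append(f"unsupported database: {database!r}")

    metrics = config.get("metrics", [])
    if not isinstance(metrics, list):
        problems.append("metrics must be a list of strings")
    else:
        for metric in metrics:
            if not isinstance(metric, str) or metric not in ALLOWED_METRICS:
                problems.append(f"unsupported metric: {metric!r}")

    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "target": ["HER1"],
        "hyperparam": {},
        "database": "LigEGFR",
        "metrics": ["RMSE", "PCC"],
    }
    print(validate_config(example) or "config looks consistent with the README")
```

To check a file on disk, load it with `json.load` and pass the resulting dictionary to `validate_config`.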
The datasets and pretrained models can be retrieved from .
| packages | version |
|---|---|
| arrow | 1.2.2 |
| deepchem | 2.5.0 |
| imbalanced-learn | 0.10.1 |
| numpy | 1.21.5 |
| pandas | 1.3.5 |
| scikit-learn | 1.2.2 |
| scipy | 1.7.3 |
| torch | 2.0.0 |
| torch-geometric | 2.0.4 |
| torchmetrics | 0.11.4 |
| xgboost | 1.6.1 |
| dgl | 1.1.3 |
| dgllife | Any |
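To compare an existing environment against the table above, a small check such as the following may help. This is a sketch using only the standard library; `check_versions` is a hypothetical helper, not part of the repository, and it assumes the distribution names match the package names in the table. Local build suffixes such as `+cu117` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the table above; None means any version is accepted.
PINNED = {
    "arrow": "1.2.2",
    "deepchem": "2.5.0",
    "imbalanced-learn": "0.10.1",
    "numpy": "1.21.5",
    "pandas": "1.3.5",
    "scikit-learn": "1.2.2",
    "scipy": "1.7.3",
    "torch": "2.0.0",
    "torch-geometric": "2.0.4",
    "torchmetrics": "0.11.4",
    "xgboost": "1.6.1",
    "dgl": "1.1.3",
    "dgllife": None,
}


def check_versions(pinned: dict) -> dict:
    """Map each package name to 'ok', 'not installed', or a mismatch message."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        # Drop a local build suffix such as '+cu117' before comparing.
        base = installed.split("+")[0]
        if wanted is None or base == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, README lists {wanted}"
    return report


if __name__ == "__main__":
    for name, status in check_versions(PINNED).items():
        print(f"{name:18s} {status}")
```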
@article{https://doi.org/10.1002/jcc.27388,
author = {Boonyarit, Bundit and Yamprasert, Nattawin and Kaewnuratchadasorn, Pawit and Kinchakawat, Jiramet and Prommin, Chanatkran and Rungrotmongkol, Thanyada and Nutanong, Sarana},
title = {GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity},
journal = {Journal of Computational Chemistry},
volume = {45},
number = {23},
pages = {2001-2023},
keywords = {drug discovery, epidermal growth factor receptor, graph attention mechanism, multi-task learning, transfer learning},
doi = {https://doi.org/10.1002/jcc.27388},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.27388},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcc.27388}
}
