This is the code repository for the creation of MoA-Net, as described in "MARS: A neurosymbolic approach for interpretable drug discovery".
The network can be used for benchmarking approaches for interpretable drug discovery.
MoA-Net can be used by graph-based algorithms for mechanism-of-action (MoA) deconvolution. The figure below shows: A) the graph schema, or data model, of MoA-Net and B) the node and edge statistics of MoA-Net.
To create MoA-net from scratch, run each of the notebooks in the folder:
- Preprocessing - It includes the code for processing and generating the basic network.
- Enrichment - It includes the code to enrich compound - biological process (BP) pairs in the data.
- Validation set - It includes the code to identify validation MoAs from DrugMechDB.
- Target class annotation - It includes the code to annotate and tag target classes to proteins in the enriched network.
- Data splitting - It includes the code to split the enriched network into test-train-dev datasets.
- Metapaths - It includes the code to generate metapath rules on the network node types required for neurosymbolic AI models.
- 5 p. Metapaths 2 - It includes variant code to generate metapath rules based on the protein target classes.
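
As an illustration of the Data splitting step, the following is a minimal sketch of a random 60/20/20 split of compound -> BP triples, matching the proportions reported for the split files in this README. It is not the notebook's actual code: the function name `split_triples`, the seed, the relation name `induces`, and the toy identifiers are all assumptions, and the notebook may use a different strategy (for example, stratification).

```python
import random


def split_triples(triples, seed=0, train_pct=60, dev_pct=20):
    """Shuffle triples and cut them into train/dev/test lists.

    Hypothetical helper: a plain random split, assumed for illustration.
    """
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_train = len(shuffled) * train_pct // 100
    n_dev = len(shuffled) * dev_pct // 100
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains goes to test
    return train, dev, test


# Toy compound -> BP triples; placeholders, not real MoA-net entries.
pairs = [(f"compound_{i}", "induces", f"bp_{i % 3}") for i in range(10)]
train, dev, test = split_triples(pairs)
print(len(train), len(dev), len(test))
```

Fixing the seed keeps the split reproducible, which matters when the same triples must be excluded from the knowledge graph later.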
NOTE: Notebooks beginning with 00 contain preliminary searches and do not need to be run.
To download MoA-net, access the specific versions from the following folders:
- data/kg/splits/MoA-net/: The full, original MoA-net.
- data/kg/splits/MoA-net-permuted/: The full, permuted version of MoA-net, based on XSwap.
- data/kg/splits/MoA-net-protclass/: The full version of MoA-net in which ~55% of the proteins have a protein subclass.
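
When switching between versions in a script, a small path helper can keep the folder names in one place. This is only a sketch: the folder names are the ones listed above, while the short keys (`original`, `permuted`, `protclass`), the function name `version_dir`, and the `repo_root` parameter are hypothetical.

```python
from pathlib import Path

# Folder names as listed above; the short keys are hypothetical labels.
VERSIONS = {
    "original": "MoA-net",
    "permuted": "MoA-net-permuted",
    "protclass": "MoA-net-protclass",
}


def version_dir(version, repo_root="."):
    """Return the folder of a MoA-net version relative to the repository root."""
    return Path(repo_root) / "data" / "kg" / "splits" / VERSIONS[version]


print(version_dir("permuted").as_posix())
```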
Within each of these folders, the following files can be found:
kg_directory/
├── train.tsv
├── dev.tsv
├── test.tsv
├── kg_with_train_smpls.tsv
├── kg_no_cmp_bp.tsv
├── MARS/
├── ...
- train.tsv: The compound -> BP training triples (60%).
- dev.tsv: The compound -> BP validation triples (20%).
- test.tsv: The compound -> BP test triples (20%).
- kg_with_train_smpls.tsv: The MoA-net KG with the training triples from train.tsv included.
- kg_no_cmp_bp.tsv: The MoA-net KG with no compound -> BP triples. Therefore, it excludes all triples from train.tsv, dev.tsv, and test.tsv.
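
The relationships between these files can be checked with a short script. The sketch below assumes that every file is tab-separated, has no header row, and stores triples in the same column layout, none of which is stated in this README; the helper names and the toy triples are hypothetical. It runs against a temporary directory so that it is self-contained; pass a real version folder to `check_kg_dir` instead.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path):
    """Read a tab-separated file into a set of row tuples (assumes no header)."""
    with open(path, newline="") as handle:
        return {tuple(row) for row in csv.reader(handle, delimiter="\t") if row}


def check_kg_dir(kg_dir):
    """Check the file relationships described above for one version folder."""
    kg_dir = Path(kg_dir)
    train, dev, test = (read_rows(kg_dir / f"{name}.tsv") for name in ("train", "dev", "test"))
    kg_train = read_rows(kg_dir / "kg_with_train_smpls.tsv")
    kg_bare = read_rows(kg_dir / "kg_no_cmp_bp.tsv")
    return {
        "splits_disjoint": not (train & dev or train & test or dev & test),
        "bare_excludes_splits": kg_bare.isdisjoint(train | dev | test),
        "train_in_kg": train <= kg_train,
    }


def write_rows(path, rows):
    """Write row tuples as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


# Toy data with placeholder identifiers and relation names, not real MoA-net content.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    base = [("compound_1", "interacts_with", "protein_1"),
            ("protein_1", "participates_in", "bp_1")]
    train_rows = [("compound_1", "induces", "bp_1")]
    write_rows(tmp / "train.tsv", train_rows)
    write_rows(tmp / "dev.tsv", [("compound_2", "induces", "bp_2")])
    write_rows(tmp / "test.tsv", [("compound_3", "induces", "bp_3")])
    write_rows(tmp / "kg_no_cmp_bp.tsv", base)
    write_rows(tmp / "kg_with_train_smpls.tsv", base + train_rows)
    report = check_kg_dir(tmp)

print(report)
```

If the real files turn out to have a header row, the shared header would appear in every set and make the disjointness checks fail, so it would need to be skipped in `read_rows`.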
The MARS/ directory contains files that are processed and ready for input into MARS (mechanism-of-action retrieval system). The MARS code itself is contained within the other code repository.
In the MoA-net folder, an additional directory with 10k is present; it is created as a result of MARS' automatic trimming step.
If you found our work useful, please consider citing our preprint:
@article{delong2024mars,
title={MARS: A neurosymbolic approach for interpretable drug discovery},
author={Lauren Nicole DeLong and Yojana Gadiya and Paola Galdi and Jacques D. Fleuriot and Daniel Domingo-Fernández},
year={2024},
eprint={2410.05289},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.05289},
}