Mechanism-of-Action Network (MoA-Net)

License: CC BY-NC 4.0 | Maturity level-1 | PyPI pyversions

This repository contains the code used to create MoA-Net, as described in "MARS: A neurosymbolic approach for interpretable drug discovery".

The network can serve as a benchmark for interpretable drug discovery approaches.

Graph-based algorithms can use MoA-Net for MoA deconvolution. The figure below shows: A) the graph schema or data model of MoA-Net and B) the node and edge statistics of MoA-Net.

[Figure: (A) graph schema of MoA-Net; (B) node and edge statistics of MoA-Net]

Creation

To create MoA-Net from scratch, run the notebooks in the folder in the following order:

  1. Preprocessing - Code to process and generate the basic network.

  2. Enrichment - Code to enrich compound-biological process (BP) pairs in the data.

  3. Validation set - Code to identify validation MoAs from DrugMechDB.

  4. Target class annotation - Code to annotate proteins in the enriched network with their target classes.

  5. Data splitting - Code to split the enriched network into train, dev, and test datasets.

  6. Metapaths - Code to generate metapath rules over the network node types, as required by neurosymbolic AI models.

    5 p. Metapaths 2 - A variant of the code that generates metapath rules based on the protein target classes.

NOTE: Notebooks whose names begin with 00 contain preliminary searches and do not need to be run.
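
The run order described above can be scripted. The following is a minimal sketch, not part of the repository: the helper name `plan_notebook_runs` is hypothetical, and it assumes that notebook filenames start with their step number, so that sorting by name reproduces the listed order. It only builds the `jupyter nbconvert` commands, so they can be inspected before anything is executed.

```python
from pathlib import Path


def plan_notebook_runs(folder):
    """Return one `jupyter nbconvert` command per notebook, sorted by filename.

    Assumption: filenames begin with the step number, so a plain name sort
    matches the intended order. Notebooks whose names begin with "00" are
    skipped, since they hold preliminary searches that need not be run.
    """
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        for nb in notebooks
        if not nb.name.startswith("00")
    ]


# Usage (the folder name is a placeholder for the repository's notebook folder):
#
#     import subprocess
#     for command in plan_notebook_runs("path/to/notebooks"):
#         subprocess.run(command, check=True)
```

A plain name sort places `10_...` before `2_...`, so this approach only holds while the step numbers stay single-digit or are zero-padded.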

Download MoA-net

To download MoA-net, access the specific versions from the following folders:

Within each of these folders, the following files can be found:

kg_directory/
    ├── train.tsv
    ├── dev.tsv
    ├── test.tsv
    ├── kg_with_train_smpls.tsv
    ├── kg_no_cmp_bp.tsv
    └── MARS/
        └── ...

  • train.tsv: The compound -> BP training triples (60%).
  • dev.tsv: The compound -> BP validation triples (20%).
  • test.tsv: The compound -> BP test triples (20%).
  • kg_with_train_smpls.tsv: The MoA-net KG with the training triples from train.tsv included.
  • kg_no_cmp_bp.tsv: The MoA-Net KG without any compound -> BP triples; it excludes all triples from train.tsv, dev.tsv, and test.tsv.
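
After downloading a version, the file descriptions above can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions: it treats each TSV as headerless with one triple per row (the actual column layout is not documented here), and the function names are hypothetical. It reports the share of compound -> BP triples in each split and lists any split triple that also appears in `kg_no_cmp_bp.tsv`.

```python
import csv
from pathlib import Path

SPLITS = ("train", "dev", "test")


def read_triples(path):
    """Read a tab-separated file into a list of row tuples (assumes no header row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


def split_fractions(kg_dir):
    """Return the fraction of all split triples found in train, dev, and test."""
    kg_dir = Path(kg_dir)
    counts = {name: len(read_triples(kg_dir / f"{name}.tsv")) for name in SPLITS}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no triples found in the split files")
    return {name: count / total for name, count in counts.items()}


def leaked_triples(kg_dir):
    """Return split triples that also occur in kg_no_cmp_bp.tsv (expected: none)."""
    kg_dir = Path(kg_dir)
    kg = set(read_triples(kg_dir / "kg_no_cmp_bp.tsv"))
    held_out = set()
    for name in SPLITS:
        held_out.update(read_triples(kg_dir / f"{name}.tsv"))
    return kg & held_out
```

If the files do contain a header row, or the KG file uses a different column order than the split files, the row comparison in `leaked_triples` would need to be adapted accordingly.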

Data relevant for MARS

The MARS/ directory contains files that are processed and ready for input into MARS (mechanism-of-action retrieval system). The MARS code itself is in a separate code repository.

In the MoA-net folder, an additional 10k directory is present; it is created by MARS' automatic trimming step.


Citation

If you find our work useful, please consider citing our preprint:

@article{delong2024mars,
  title={MARS: A neurosymbolic approach for interpretable drug discovery}, 
  author={Lauren Nicole DeLong and Yojana Gadiya and Paola Galdi and Jacques D. Fleuriot and Daniel Domingo-Fernández},
  year={2024},
  eprint={2410.05289},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2410.05289}, 
}