ncbo/BioPortal-to-KGX

BioPortal-to-KGX

Assemble a BioPortal Knowledge Graph through the following steps:

  • Transform the BioPortal 4store data dump to KGX graphs, with ROBOT preprocessing
  • Validate the output graphs with KGX to determine alignment to the Biolink Model
  • Obtain additional ontology metadata through the BioPortal API
  • Retrieve mappings for nodes without clear BioPortal analogues through BioPortal

Usage

Prepare a dump of the BioPortal 4store data with the 4s-dump script.

The dump will be in N-Triples format, with individual sets of records in nested directories and one line of metadata at the top of each file.
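To inspect a single dump file before transforming it, the metadata line can be separated from the triples. The following is a minimal sketch, not part of BioPortal-to-KGX: the function name `split_dump_file` is hypothetical, and it assumes only what is described above (one metadata line first, then one triple per line).

```python
def split_dump_file(path):
    """Return (metadata_line, triple_lines) for one 4store dump file.

    Assumption: the first line is metadata and every following
    non-empty line is one N-Triples statement.
    """
    with open(path, encoding="utf-8") as handle:
        # The first line of each file holds metadata, not a triple.
        metadata = handle.readline().rstrip("\n")
        # Remaining lines are triples; blank lines are skipped.
        triples = [line.rstrip("\n") for line in handle if line.strip()]
    return metadata, triples
```

Calling `split_dump_file("../path/to/your/data/some_file")` on a real dump file returns the metadata string and a list of triple lines, which is enough to check a file's size or origin by hand.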

Run BioPortal-to-KGX with all validation and metadata retrieval options as:

python run.py --input ../path/to/your/data/ --kgx_validate --robot_validate --pandas_validate --write_curies --get_bioportal_metadata --ncbo_key YOUR_NCBO_API_KEY_HERE

Specify individual ontologies to include or exclude with the --include_only and --exclude options, respectively, each followed by a comma-delimited list of the original hashed file IDs from the 4store dump.

For example:

python run.py --input ../path/to/your/data/ --include_only dabd4d902360003975fb25ae56f8,7b95f2cc27c8fb0d5df11fbdb078
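When the list of IDs is long or generated by another script, it may be easier to assemble the command programmatically. This is a rough sketch: the helper `build_command` is hypothetical, and it uses only the --input, --include_only and --exclude options described above.

```python
def build_command(input_dir, include_only=None, exclude=None):
    """Build the run.py argument list from lists of hashed file IDs."""
    command = ["python", "run.py", "--input", input_dir]
    if include_only:
        # The option expects one comma-delimited string of IDs.
        command += ["--include_only", ",".join(include_only)]
    if exclude:
        command += ["--exclude", ",".join(exclude)]
    return command


if __name__ == "__main__":
    ids = ["dabd4d902360003975fb25ae56f8", "7b95f2cc27c8fb0d5df11fbdb078"]
    print(" ".join(build_command("../path/to/your/data/", include_only=ids)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root.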

Output will be written to the /bioportal_to_kgx directory within /transformed, with subdirectories named for the 4store graph and each subgraph.

Each subgraph will contain:

  • Node and edge files ({subgraph_name}_nodes.tsv and {subgraph_name}_edges.tsv, respectively)
  • A JSON version of the ontology ({subgraph_name}_relaxed.json)
  • Logs containing any validation messages about the transforms
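For a quick check of one subgraph's output, the node and edge files can be loaded with the standard library. This is a sketch under assumptions: `summarize_subgraph` is a hypothetical helper, and the `category` column is assumed from the usual KGX TSV layout rather than confirmed for every output file.

```python
import csv
from collections import Counter
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps stray quote characters in labels from
        # being interpreted as field delimiters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def summarize_subgraph(subgraph_dir, subgraph_name):
    """Count nodes, edges and node categories for one subgraph."""
    directory = Path(subgraph_dir)
    nodes = load_tsv(directory / f"{subgraph_name}_nodes.tsv")
    edges = load_tsv(directory / f"{subgraph_name}_edges.tsv")
    # Assumed column name: 'category' (missing values count as "").
    categories = Counter(row.get("category") or "" for row in nodes)
    return {"nodes": len(nodes), "edges": len(edges), "categories": dict(categories)}
```

A subgraph with many nodes but no edges, or with most nodes in a single generic category, is usually worth a look in the validation logs.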

Troubleshooting

  • The --robot_validate option may fail on larger ontologies like NCBITAXON with java.lang.OutOfMemoryError. Consider omitting this option or running ROBOT on files directly, as needed.
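If running ROBOT on a file directly, its documentation describes a ROBOT_JAVA_ARGS environment variable for raising the Java heap limit. The sketch below prepares such an environment and a command line. The helper names are hypothetical, `robot report` is an illustrative choice of command rather than the one run.py calls, and whether run.py's own ROBOT invocation honors this variable is an assumption that has not been verified.

```python
import os


def robot_env(max_heap="8G", base_env=None):
    """Return a copy of the environment with a larger Java heap for ROBOT."""
    env = dict(os.environ if base_env is None else base_env)
    # ROBOT's wrapper script passes ROBOT_JAVA_ARGS on to the JVM.
    env["ROBOT_JAVA_ARGS"] = f"-Xmx{max_heap}"
    return env


def robot_report_command(ontology_path, report_path):
    """Build an illustrative 'robot report' command for one ontology file."""
    return ["robot", "report", "--input", ontology_path, "--output", report_path]
```

These can be combined as `subprocess.run(robot_report_command(src, out), env=robot_env("16G"))`, with the heap size chosen to fit the machine's available memory.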
