open-kgo

Open Knowledge Graphs and Ontologies plugin for mloda: nine connector families covering the knowledge-graph landscape, from SPARQL endpoints to SBOMs to agent memory, all behind one declarative Feature interface. Every connector and demo runs offline against in-memory libraries or committed fixtures. No Docker, no network.

At a glance

| Section | What you'll find |
| --- | --- |
| Quickstart | Run a SPARQL query against a shipped sample file in under a minute |
| The nine connector families | The core of this repo: a 9-family KG connector taxonomy with two plugins each |
| Demos | Three marimo notebooks and two evaluation harnesses, all offline |
| Data and acknowledgments | Where the sample data comes from |
| Development setup | uv, tox, and the individual checks |
| Related repositories and documentation | mloda core, the plugin registry, and development guides |

Quickstart

Install the connectors and run a SPARQL query against the Turtle sample shipped in this repo:

uv sync --extra kg-all

Then, in Python:

from pathlib import Path

from mloda.user import DataAccessCollection, Feature, Options, mloda

import open_kgo.feature_groups.kg.rdf.rdflib_sparql as rdf_mod
from open_kgo.compute_frameworks.python_dict_kg_framework import KgPythonDictFramework

# Point at any RDF file. Here: the Turtle sample shipped in this repo.
ttl = Path(rdf_mod.__file__).parent / "tests" / "fixtures" / "sample.ttl"

feature = Feature(
    "rdflib_sparql__knows",
    options=Options(context={
        "query_text": "PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
                      "SELECT ?s ?o WHERE { ?s foaf:knows ?o }",
    }),
)

partitions = mloda.run_all(
    [feature],
    compute_frameworks={KgPythonDictFramework},
    data_access_collection=DataAccessCollection(
        credentials=[{"rdflib_sparql": {"locator": str(ttl), "result_limit": 100}}],
    ),
)

for partition in partitions:
    for row in partition:
        print(row[feature.name])

Swap rdflib_sparql for a plugin from any of the nine connector families below: the Feature and mloda.run_all call keep the same shape, and only the reader changes.
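To make the expected rows concrete, the sketch below mimics what the quickstart's query selects, using plain Python tuples instead of an RDF store. The triples and the select_knows helper are hypothetical and are not part of open-kgo or rdflib, and treating result_limit as a simple row cap is an assumption; the actual row shape depends on the reader.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical triples; the shipped sample.ttl may contain different data.
TRIPLES = [
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/carol"),
]


def select_knows(triples, limit=100):
    """Mimic SELECT ?s ?o WHERE { ?s foaf:knows ?o } with an assumed row cap."""
    rows = [{"s": s, "o": o} for s, p, o in triples if p == FOAF_KNOWS]
    return rows[:limit]


rows = select_knows(TRIPLES)
for row in rows:
    print(row["s"], "knows", row["o"])
```

Only the two foaf:knows triples match; the foaf:name triple is filtered out because its predicate differs.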

The nine connector families

open_kgo/feature_groups/kg/ ships a connector taxonomy derived from a 103-system survey. Each family is a shared reader and feature-group base plus two concrete plugins running against in-memory libraries or local file fixtures:

| Family | What it connects to | Concrete plugins |
| --- | --- | --- |
| network_pg | Property-graph databases with a vendor query language (Neo4j, Memgraph, Neptune, ...) | KuzuCypherReader, GrandCypherReader |
| rdf | RDF triple stores queried with SPARQL | RdfLibSparqlReader, OxigraphSparqlReader |
| embedded | In-process graph libraries with no network endpoint | NetworkxEmbeddedReader, IGraphEmbeddedReader |
| rest_public | Public REST (non-SPARQL) KG APIs (OpenAlex, ConceptNet, STRING, ...) | FileFixtureRestReader, FileFixturePagedRestReader |
| lineage | Metadata and data-lineage graphs (dbt, OpenLineage, DataHub, ...) | DbtManifestReader, OpenLineageReader |
| code_build | Code, build, and SBOM dependency graphs (CycloneDX, SPDX, ...) | CycloneDxSbomReader, SpdxSbomReader |
| saas_authz | SaaS and authorization tuple stores (OpenFGA, SpiceDB, Microsoft Graph, ...) | InProcessTupleStoreReader, PaginatedTupleStoreReader |
| agent_memory | LLM agent memory and GraphRAG graphs (Letta, Zep, Mem0, ...) | NetworkxMemoryReader, GraphWalkMemoryReader |
| citation_rest | Citation and scientific REST APIs (Reactome, OpenAlex citations, ...) | FileFixtureCitationReader, PaginatedCitationReader |

See open_kgo/feature_groups/kg/README.md for the full family map, the plugin anatomy, and what the prototype does and does not validate.

Install all KG extras with: uv sync --extra kg-all.

One feature per call. KG readers dispatch a single feature per load: every reader rejects a multi-feature FeatureSet rather than silently labelling all rows with one feature name. Request features individually (one Feature per mloda.run_all slot) rather than batching N of them into a single reader call.
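The single-feature rule can be pictured as a guard at the top of a reader's load path. The following is a minimal sketch under stated assumptions: SingleFeatureReader, MultiFeatureError, and the load signature are hypothetical names invented for illustration, not the open-kgo classes, and the second feature name is made up.

```python
from dataclasses import dataclass, field


class MultiFeatureError(ValueError):
    """Raised when a reader is handed more than one feature in a single load."""


@dataclass
class SingleFeatureReader:
    # Hypothetical stand-in for a KG reader; not the open-kgo implementation.
    rows: list = field(default_factory=list)

    def load(self, feature_names):
        names = list(feature_names)
        # Refuse the batch outright instead of labelling every row with one name.
        if len(names) != 1:
            raise MultiFeatureError(
                f"expected exactly one feature per load, got {len(names)}: {names}"
            )
        (name,) = names
        return [{name: row} for row in self.rows]


reader = SingleFeatureReader(rows=["alice->bob", "bob->carol"])
labelled = reader.load(["rdflib_sparql__knows"])

try:
    reader.load(["rdflib_sparql__knows", "rdflib_sparql__name"])
    rejected = False
except MultiFeatureError:
    rejected = True

print(labelled, rejected)
```

Failing loudly here keeps a mislabelled column from reaching downstream consumers, which is the failure mode the rule is meant to prevent.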

No-Docker testing policy. Every connector test runs against rdflib, networkx, kuzu (embedded), or file fixtures. No Docker, no external services, no network calls.

Demos

Three marimo notebooks plus two evaluation harnesses live under demo/:

  • demo/demo_kg_connectors.py: surface tour of all 9 families against the shipped fixtures.
  • demo/demo_kg_build_repo.py: builds an RDF graph from this repo (filesystem repo:contains + Python repo:imports), serializes to Turtle, and runs five SPARQL queries through RdfLibSparqlReader via mloda.run_all.
  • demo/demo_kg_ontology.py: walks the ontology layer end to end.
  • demo/eval_arch1_vs_arch2.py and demo/eval_qa_accuracy.py: evaluation harnesses comparing plain traversal vs. ontology-guided traversal.
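The repo-graph idea behind demo_kg_build_repo.py can be approximated with the standard library alone. This is a sketch, not the demo's code: it assumes imports are extracted with the ast module, emits plain tuples rather than an RDF graph, and skips the Turtle serialization and SPARQL steps. The repo_triples helper and the temporary pkg/a.py file are hypothetical.

```python
import ast
import tempfile
from pathlib import Path


def repo_triples(root):
    """Yield (subject, predicate, object) triples for a source tree."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        parent = path.parent.relative_to(root).as_posix()
        # Filesystem edge: each directory contains its direct children.
        yield (parent, "repo:contains", rel)
        if path.is_file() and path.suffix == ".py":
            tree = ast.parse(path.read_text(encoding="utf-8"))
            for node in ast.walk(tree):
                # Import edge: one triple per imported module name.
                if isinstance(node, ast.Import):
                    for alias in node.names:
                        yield (rel, "repo:imports", alias.name)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    yield (rel, "repo:imports", node.module)


# Build a tiny throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    pkg.mkdir()
    (pkg / "a.py").write_text("import os\nfrom pathlib import Path\n", encoding="utf-8")
    triples = list(repo_triples(tmp))

for triple in triples:
    print(triple)
```

Relative imports without a module name (from . import x) are ignored here for brevity.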

Install the demo extras and open any notebook:

uv sync --extra demo
marimo edit demo/demo_kg_connectors.py

Every demo runs offline against a small committed sample graph: no download, no network, no external services.

Data and acknowledgments

The ontology demo and the two evaluation harnesses run against a small hand-authored sample of public movie facts (demo/data/sample_kb.txt) written in the triple format of the MetaQA dataset (Zhang, Yuyu et al., "Variational Reasoning for Question Answering with Knowledge Graph", AAAI 2018, https://github.com/yuyuz/MetaQA). The sample is committed in this repo and is not derived from the MetaQA dataset files. The notebooks call demo.data.ensure_data() at startup, which builds the sample subgraph offline. To run against the full MetaQA benchmark (licensed under CC BY 3.0, not redistributed here), see demo/data/README.md.
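For orientation, here is a minimal sketch of reading triples in that layout, assuming the pipe-delimited subject|relation|object lines used by MetaQA's kb.txt. The three lines below are illustrative and are not copied from demo/data/sample_kb.txt; parse_triples and index_by_subject are hypothetical helpers, not part of demo.data.

```python
from collections import defaultdict

# Illustrative lines in the assumed layout; not taken from sample_kb.txt.
SAMPLE = """\
Inception|directed_by|Christopher Nolan
Inception|release_year|2010
Memento|directed_by|Christopher Nolan
"""


def parse_triples(text):
    """Parse 'subject|relation|object' lines into a list of 3-tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two pipes only, so objects may contain '|'.
        subject, relation, obj = line.split("|", 2)
        triples.append((subject, relation, obj))
    return triples


def index_by_subject(triples):
    """Group (relation, object) pairs under each subject for one-hop lookups."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return dict(index)


triples = parse_triples(SAMPLE)
index = index_by_subject(triples)
print(index["Inception"])
```

A subject-keyed index like this is enough for one-hop lookups; multi-hop traversal would follow objects back into the index.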

Development setup

Install uv (if not already installed):

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a virtual environment and install dependencies:

uv venv
source .venv/bin/activate
uv sync --all-extras

Run all checks with tox:

uv tool install tox --with tox-uv
tox

Run individual checks:

pytest
ruff format --check --line-length 120 .
ruff check .
mypy --strict --ignore-missing-imports .
bandit -c pyproject.toml -r -q .

Related repositories and documentation

  • mloda: The core library for open data access. Declaratively define what data you need, not how to get it. See mloda.ai for an overview and business context, and the documentation for detailed guides.
  • mloda-registry: The central hub for discovering and sharing mloda plugins.
  • Plugin development guides: How to build FeatureGroups, ComputeFrameworks, and Extenders.
  • Claude Code skills: Assisted plugin development for Claude Code users.
