feat(api): add /np/constellation endpoint with structured FORRT chains#96

Open: annefou wants to merge 1 commit into main from feature/nanopub-query-api

Conversation

annefou (Contributor) commented on May 17, 2026

Summary

New authenticated read endpoint GET /np/constellation?uri=… that walks a published FORRT nanopublication chain from a single entry URI and returns the full constellation as structured JSON.

Delivers Week 1 of docs/plans/nanopub-query-api.md, plus most of the per-step structured extraction the plan's response schema specifies (so the downstream forrt-replication-template/import-from-nanopub skill can drop its 350-line TriG parsing in favour of a single HTTP call).

Live KP smoke test against the Bombus apex CiTO (np/RA1q6c0fG2bMbiozF8Az2UpIfzAzqp8hoVEl6QIzfUpH8): 19/19 nanopubs reached in ~15-22s. This beats the Python import script's 14/19 because the endpoint also mines canonical nanopub URIs from each TriG body, surfacing the Outcome→Claim and Study→AIDA chain edges that KP's npa:refersToNanopub index doesn't materialise.

Response shape

{
  "entry": "https://w3id.org/sciencelive/np/RA...",
  "paperDoi": "https://doi.org/10.1126/science.aax8591",
  "apexCito": { "uri": "...", "relations": ["qualifies"], "citedTargets": [...] },
  "researchSynthesis": {
    "uri": "...", "label": "...",
    "synthesis": "...", "conditions": "...", "limitations": "...", "recommendations": "..."
  },
  "chains": [
    {
      "id": "RA...",
      "outcomeUri": "...",
      "outcomeVerdict": "PartiallySupported",
      "outcomeConfidence": "HighConfidence",
      "citoRelations": ["extends", "qualifies"],
      "steps": [
        { "step": "Quote",            "uri": "...", "text": "...", "targets": ["doi:..."] },
        { "step": "AIDA",             "uri": "...", "text": "..." },
        { "step": "Claim",            "uri": "...", "type": "model_performance", "label": "..." },
        { "step": "Study",            "uri": "...", "scope": "...", "method": "...", "deviations": "..." },
        { "step": "Outcome",          "uri": "...", "verdict": "...", "confidence": "...",
                                       "conclusion": "...", "evidence": "...", "limitations": "...",
                                       "repository": "https://doi.org/10.5281/zenodo.20113787" },
        { "step": "CiTO",             "uri": "...", "relations": ["confirms"], "targets": [...] },
        { "step": "ResearchSoftware", "uri": "...", "label": "...",
                                       "repository": "https://github.com/annefou/weatherxbiodiversity-projection",
                                       "zenodoDoi": "..." }
      ]
    }
  ],
  "nodes": [...], "edges": [...], "externalCitations": [...]
}
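
For reviewers who want to poke at the payload from a client, here is a minimal TypeScript sketch of types for a subset of the fields above, plus two small helpers. The type names, `buildConstellationUrl`, `summariseChains`, and the `https://api.example.org` base URL are all illustrative assumptions and are not part of this PR; only the `/np/constellation` path, the `uri` parameter, and the field names come from the response shape.

```typescript
// Illustrative types covering only part of the response shape shown above.
type StepName =
  | "Quote" | "AIDA" | "Claim" | "Study"
  | "Outcome" | "CiTO" | "ResearchSoftware";

interface ChainStep {
  step: StepName;
  uri: string;
  // Per-step fields (text, verdict, repository, ...) vary by step type.
  [field: string]: unknown;
}

interface Chain {
  id: string;
  outcomeUri: string;
  outcomeVerdict: string;
  outcomeConfidence: string;
  citoRelations: string[];
  steps: ChainStep[];
}

interface Constellation {
  entry: string;
  paperDoi: string;
  chains: Chain[];
}

// Hypothetical helper: builds the request URL, percent-encoding the entry URI.
function buildConstellationUrl(apiBase: string, entryUri: string): string {
  const base = apiBase.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/np/constellation?uri=${encodeURIComponent(entryUri)}`;
}

// Hypothetical helper: one summary line per chain.
function summariseChains(c: Constellation): string[] {
  return c.chains.map(
    (ch) =>
      `${ch.id}: ${ch.outcomeVerdict} (${ch.outcomeConfidence}), ${ch.steps.length} steps`,
  );
}
```

A caller would pass the built URL to `fetch` with whatever session credentials the better-auth middleware expects; that part is left out because the cookie and header details are not described here.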

Module layout

api/src/np/
├── queries.ts       SPARQL strings (mirrored from frontend/src/lib/queries/*.rq)
├── sparql.ts        executeSparql + fetchTrig with 5xx retry + abort handling
├── trig.ts          regex-based TriG extractors (no n3 dep, ~50 KB saved)
├── constellation.ts BFS traversal + chain assembly
└── index.ts         Hono sub-app mounted at /np
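
To make the traversal in constellation.ts easier to picture, the following is a simplified, synchronous sketch of a breadth-first walk with cycle protection and depth and node caps. It is not the code in this PR: the real traversal is asynchronous and queries KP, while `walk`, `Neighbours`, and `BfsResult` are hypothetical names and the neighbour lookup is an injected in-memory function.

```typescript
// Hypothetical neighbour lookup: returns URIs linked to `uri` in either direction.
type Neighbours = (uri: string) => string[];

interface BfsResult {
  visited: string[];   // URIs in the order they were reached
  truncated: boolean;  // true if the maxNodes cap dropped at least one node
}

function walk(
  entry: string,
  neighbours: Neighbours,
  maxDepth: number,
  maxNodes: number,
): BfsResult {
  const seen = new Set<string>([entry]); // cycle and self-reference protection
  const visited: string[] = [];
  let frontier: string[] = [entry];
  let truncated = false;

  for (let depth = 0; frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const uri of frontier) {
      visited.push(uri);
      if (depth >= maxDepth) continue; // record the node but do not expand it
      for (const n of neighbours(uri)) {
        if (seen.has(n)) continue;
        if (seen.size >= maxNodes) {
          truncated = true;
          continue;
        }
        seen.add(n);
        next.push(n);
      }
    }
    frontier = next;
  }
  return { visited, truncated };
}
```

Marking a node as seen when it is queued, rather than when it is expanded, keeps a node that is reachable from two parents from being queued twice.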

Auth (this PR)

Sits behind the existing better-auth session middleware in api/src/index.ts — same surface as /proxy and /signing. Anonymous calls return 401. API-key auth + paid-tier gating is Week 3 of the plan and stays out of this PR (per the paid-only-from-day-one decision, the auth surface will swap to API keys before public launch).

Test coverage — 212 tests across 4 files

| Module | Tests | What's locked in |
|---|---|---|
| trig.ts | 117 | every extractor + boundaries (20-char hash, 12-char literal) + security boundary (SPARQL-injection-via-uri rejected at canonicalNanopubUri) + escape handling + predicate fallback chains (full URI ↔ prefixed) + AIDA URI decoding + malformed-prefix tolerance |
| sparql.ts | 28 | retry on 5xx, no-retry on 4xx, AbortError propagation, malformed JSON, empty body, HTML rejection (incl. leading whitespace), abort signal propagation, network throws |
| constellation.ts | 39 | full BFS + TriG-mined edges + cycles + template-def expand-stop + 404 entry + self-ref + 503 mid-traversal recovery + concurrency invariance + full chain assembly with verdict/confidence/CiTO/repo propagation + apex-not-CiTO + paper-DOI heuristic (Zenodo deprioritised) + adversarial TriGs (garbage Turtle, unknown prefixes, base64 noise filter, triple-quoted with embedded dots/semicolons) |
| index.ts (HTTP) | 28 | 401/400/200/502, non-GET → 404, depth+maxNodes clamping (lower + upper bounds), multi-uri params, URL-encoded %2F path separators, large payload round-trip, unknown routes, content-type, non-Error throw fallback |
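
As an illustration of the sparql.ts retry behaviour listed above (retry on 5xx, no retry on 4xx, abort propagation), here is a small sketch with an injected fetch function. `fetchTextWithRetry`, `FetchLike`, `HttpResponse`, and `HttpStatusError` are hypothetical names, and retrying plain network failures is an assumption on my part; the actual `executeSparql` and `fetchTrig` signatures are not shown in this description.

```typescript
// Minimal response shape so the sketch does not depend on a global fetch.
interface HttpResponse {
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<HttpResponse>;

class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchTextWithRetry(
  url: string,
  doFetch: FetchLike,
  maxAttempts = 3,
  backoffMs = 200,
): Promise<string> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let res: HttpResponse;
    try {
      res = await doFetch(url);
    } catch (err) {
      // Aborts propagate immediately; other network failures are retried (assumption).
      if (err instanceof Error && err.name === "AbortError") throw err;
      lastError = err;
      await sleep(backoffMs * attempt);
      continue;
    }
    if (res.status >= 500) {
      lastError = new HttpStatusError(res.status); // server error: retryable
      await sleep(backoffMs * attempt);
      continue;
    }
    // Thrown outside the try block, so the retry loop can never swallow a 4xx.
    if (res.status >= 400) throw new HttpStatusError(res.status);
    return res.text();
  }
  throw lastError;
}
```

Keeping the 4xx throw outside the `try` is one way to avoid the class of bug where a retry loop catches its own intentional throws.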

Bugs found and fixed by writing the tests:

  1. extractExcerpts regex spanning lines (silent data corruption)
  2. Retry loop catching intentional throws (would retry on 4xx)
  3. readObjectSegment terminating on . inside <URI> literals
  4. Predicate-value extractor not handling <URI> a <type>; between subject and CiTO predicate (broke citingEntity for the real-world outcome-level CiTO shape)
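
Bug 3 is easiest to see with a small scanner. The sketch below is a hypothetical `findStatementEnd`, not the PR's `readObjectSegment`; it shows one way to find the terminating `.` of a statement while skipping dots inside `<URI>` references and quoted literals. It deliberately ignores dots inside prefixed names and numeric literals, so it is a simplification rather than a full Turtle tokenizer.

```typescript
// Hypothetical helper: index of the "." that ends the first statement in
// `src`, or -1 if no terminator is found.
function findStatementEnd(src: string): number {
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "<") {
      // IRI reference: jump to the closing ">" so dots inside are ignored.
      const close = src.indexOf(">", i + 1);
      if (close === -1) return -1;
      i = close + 1;
    } else if (src.startsWith('"""', i)) {
      // Triple-quoted literal: may contain dots, semicolons and newlines.
      const close = src.indexOf('"""', i + 3);
      if (close === -1) return -1;
      i = close + 3;
    } else if (ch === '"') {
      // Single-line literal: honour backslash escapes such as \".
      i++;
      while (i < src.length && src[i] !== '"') {
        i += src[i] === "\\" ? 2 : 1;
      }
      if (i >= src.length) return -1;
      i++; // step past the closing quote
    } else if (ch === ".") {
      return i;
    } else {
      i++;
    }
  }
  return -1;
}
```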

Not in this PR

Deliberate scope cuts — happy to file follow-up issues:

  • /np/search?paper=DOI (Week 2 of plan)
  • /np/{uri}/related (Week 2)
  • /api/templates (Week 2)
  • API-key auth + rate limiting + paid-tier gating (Week 3)
  • Cloudflare KV cache layer
  • OpenAPI spec at /api/openapi.json
  • Live-KP recorded-fixture integration test in CI (hitting live KP from CI is too flaky; needs a captured fixture replay, ~half a day)
  • wrangler dev runtime smoke test — verified the logic in Node, not exercised through the Workers runtime end-to-end

Test plan

  • npm run lint -w api — clean (0 errors)
  • cd api && npx tsc --noEmit — clean
  • npm test -w api — 212 tests pass
  • Manual: npx tsx api/scripts/test-constellation.ts from the monorepo root — 19/19 nanopubs on Bombus apex (~15-22s)
  • Manual: npx tsx api/scripts/show-constellation-json.ts | head -200 — inspect the structured payload
  • CI: .github/workflows/monorepo-pr.yml runs the api workspace's lint/typecheck/build/test — all green expected

🤖 Generated with Claude Code

Adds a new authenticated read endpoint that walks a published FORRT
nanopublication chain from a single entry URI and returns the full
constellation as structured JSON — enough to bootstrap a new replication
from an apex CiTO Citation in one HTTP call.

Behaviour on the Bombus apex CiTO
(np/RA1q6c0fG2bMbiozF8Az2UpIfzAzqp8hoVEl6QIzfUpH8): 19/19 nanopubs
reached in ~15-22s via bidirectional SPARQL traversal on the KP
networkGraph + TriG-body mining for the chain edges KP doesn't index
(Outcome→Claim, Study→AIDA).
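
The TriG-body mining mentioned above could look roughly like the sketch below. It assumes nanopub URIs end in a 45-character `RA…` artifact code, as in the example URI; the regex, the 43-character tail length, and the `mineNanopubUris` name are my assumptions, and the extractor in trig.ts may use different boundaries.

```typescript
// Assumed shape: any http(s) prefix ending in "/RA" plus 43 base64url chars.
// The lookahead stops a longer token from being cut down to a false match.
const NANOPUB_IN_TEXT =
  /https?:\/\/[^\s<>"]*?\/RA[A-Za-z0-9_-]{43}(?![A-Za-z0-9_-])/g;

// Hypothetical helper: returns the distinct nanopub URIs mentioned in a TriG
// body, excluding the nanopub's own URI. Sub-graph suffixes such as
// "#assertion" or "/Head" are dropped because the match ends at the code.
function mineNanopubUris(trig: string, ownUri: string): string[] {
  const found = new Set<string>();
  for (const uri of trig.match(NANOPUB_IN_TEXT) ?? []) {
    if (uri !== ownUri) found.add(uri);
  }
  return Array.from(found);
}
```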

Response shape (per docs/plans/nanopub-query-api.md):
  - entry, paperDoi (Zenodo DOIs deprioritised)
  - apexCito { uri, relations, citedTargets }
  - researchSynthesis { uri, label, synthesis, conditions, limitations, recommendations }
  - chains[]: one per Outcome — Quote → AIDA → Claim → Study → Outcome → CiTO → ResearchSoftware
  - Each step carries its substantive fields: Quote text, AIDA sentence,
    Claim type, Study scope+method, Outcome verdict+confidence+repository,
    CiTO relations, RS GitHub URL
  - nodes[], edges[], externalCitations for raw access
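
One possible reading of "Zenodo DOIs deprioritised" is sketched below: prefer the first DOI that is not under Zenodo's `10.5281` prefix and fall back to a Zenodo DOI only when nothing else is available. `pickPaperDoi` is a hypothetical name and the actual heuristic in constellation.ts may weigh candidates differently.

```typescript
// Zenodo mints its DOIs under the 10.5281 prefix.
const isZenodoDoi = (doi: string): boolean => doi.includes("10.5281/");

// Hypothetical helper: first non-Zenodo DOI if any, else the first candidate.
function pickPaperDoi(candidates: string[]): string | undefined {
  return candidates.find((d) => !isZenodoDoi(d)) ?? candidates[0];
}
```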

Module layout under api/src/np/:
  - queries.ts     SPARQL queries mirrored from frontend/src/lib/queries/
  - sparql.ts      executeSparql + fetchTrig with 5xx retry, abort handling
  - trig.ts        regex-based TriG extractors (no n3 dep, ~50KB saved)
  - constellation.ts BFS traversal + chain assembly
  - index.ts       Hono sub-app mounted at /np

Auth model for v1: signed-in better-auth session (same as /proxy and
/signing). API-key auth + paid-tier gating is Week 3 of the plan
(docs/plans/nanopub-query-api.md), not in this PR.

Test coverage (212 tests across 4 files):
  - trig.ts        117 tests — every extractor, boundary conditions,
                   security boundary (SPARQL injection via uri rejected),
                   escape handling, predicate fallback chains, adversarial
                   AIDA URI decoding
  - sparql.ts       28 tests — retry on 5xx, no-retry on 4xx, abort
                   propagation, malformed JSON, empty body, HTML response
                   rejection, leading-whitespace handling
  - constellation.ts 39 tests — cycle protection, template-def
                   expand-stop, mid-traversal 503 recovery, full chain
                   assembly with verdict/confidence/repository/CiTO
                   propagation, apex-not-CiTO, paper-DOI heuristic,
                   malformed/unknown-prefix TriGs
  - index.ts (HTTP) 28 tests — 401/400/200/502, non-GET → 404,
                   depth+maxNodes clamping, multi-uri params, URL-encoded
                   %2F path separators, large payload round-trip
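
For the security boundary noted above, the general technique is to validate the incoming `uri` against a strict allow-list pattern before it is ever interpolated into a SPARQL string. The sketch below is an illustration under assumptions: `safeNanopubUri` is a hypothetical stand-in for `canonicalNanopubUri`, and the accepted host, path, and 43-character tail are guesses based on the example URIs in this description.

```typescript
// Assumed accepted form: https://w3id.org/[optional-segment/]np/RA + 43 chars.
const NANOPUB_URI =
  /^https:\/\/w3id\.org\/(?:[a-z0-9-]+\/)?np\/RA[A-Za-z0-9_-]{43}$/;

// Hypothetical validator: returns the trimmed URI if it matches the
// allow-list pattern, otherwise null. Anything with spaces, angle brackets,
// braces or quotes fails the match, so it cannot escape an <...> IRI
// reference in a query string.
function safeNanopubUri(input: string): string | null {
  const trimmed = input.trim();
  return NANOPUB_URI.test(trimmed) ? trimmed : null;
}
```

A route handler would typically answer 400 when the validator returns null, which is consistent with the 400 case in the HTTP tests.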

Production bugs found and fixed by writing the tests:
  1. extractExcerpts regex spanning lines (silent data corruption)
  2. Retry loop catching intentional throws (would retry on 4xx)
  3. readObjectSegment terminating on `.` inside `<URI>` literals
  4. Predicate-value extractor not handling `<URI> a <type>;` between
     subject and CiTO predicate (broke citingEntity extraction for the
     real-world outcome-level CiTO shape)

Not in this PR (deferred to follow-ups):
  - /np/search?paper=DOI (Week 2 of plan)
  - /np/{uri}/related (Week 2)
  - /api/templates (Week 2)
  - API-key auth + rate limits + paid-tier gating (Week 3)
  - Cloudflare KV cache layer
  - OpenAPI spec at /api/openapi.json
  - Live-KP recorded-fixture integration tests for CI
  - wrangler dev runtime smoke test (Node-only logic verified)

Scripts:
  - api/scripts/test-constellation.ts — CLI smoke test against live KP
  - api/scripts/show-constellation-json.ts — pretty-print the JSON response

Frontend: adds frontend/src/lib/queries/references-from.rq as the
canonical home for the inverse-direction SPARQL query that
forrt-replication-template/scripts/queries/ already mirrors.

annefou requested a review from vijay-prema on May 17, 2026 at 21:10