refactor(sources): add provider-neutral source substrate #757

Merged
NicholaiVogel merged 1 commit into main from nicholai/unified-sources-substrate on May 24, 2026
@NicholaiVogel
Collaborator

Summary

  • Adds provider-neutral source substrate types and a daemon source provider registry, with Obsidian wired through the adapter boundary.
  • Adds source-owned provenance columns on memory_artifacts for source_id, source_root, external ids, parent paths, and provider metadata.
  • Moves Obsidian source chunk indexing to generic source_chunk rows while preserving legacy source_obsidian_chunk recall and purge compatibility.
  • Tightens source job cancellation so native source scans can stop before stale writes after disconnect.
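A rough sketch of how a provider registry and a cancellation check before each write could fit together. Every name here (`SourceProvider`, `SourceProviderRegistry`, `runScan`, the `isCancelled` callback) is hypothetical and chosen for illustration; the types this PR actually adds may look quite different.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not the types added by this PR.
interface SourceChunk {
  sourceId: string;
  path: string;
  text: string;
}

interface SourceProvider {
  kind: string;
  // Returns the chunks found for one configured source.
  scan(sourceId: string): SourceChunk[];
}

class SourceProviderRegistry {
  private providers: Record<string, SourceProvider> = {};

  register(provider: SourceProvider): void {
    if (Object.prototype.hasOwnProperty.call(this.providers, provider.kind)) {
      throw new Error("provider already registered: " + provider.kind);
    }
    this.providers[provider.kind] = provider;
  }

  get(kind: string): SourceProvider {
    if (!Object.prototype.hasOwnProperty.call(this.providers, kind)) {
      throw new Error("unknown source provider: " + kind);
    }
    return this.providers[kind];
  }
}

// Checks the cancellation flag before every write, so a source that was
// disconnected mid-scan stops before landing further rows.
function runScan(
  provider: SourceProvider,
  sourceId: string,
  isCancelled: () => boolean,
  write: (chunk: SourceChunk) => void,
): { written: number; cancelled: boolean } {
  let written = 0;
  for (const chunk of provider.scan(sourceId)) {
    if (isCancelled()) {
      return { written: written, cancelled: true };
    }
    write(chunk);
    written++;
  }
  return { written: written, cancelled: false };
}
```

Under this shape, callers only ever look providers up by `kind`, so adding a second provider would not require touching the scan loop.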

Stack Context

This is PR 1 of the replacement stack after closing #750. Follow-up PRs should migrate more Obsidian behavior behind this provider contract, then add Discord through the shared source path instead of a bespoke Discord pipeline.

Test Plan

  • bun test platform/core/src/sources-config.test.ts platform/core/src/migrations/migrations.test.ts
  • bun run --filter '@signet/core' build
  • bun test platform/daemon/src/obsidian-source-embeddings.test.ts platform/daemon/src/native-memory-sources.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/memory-search.test.ts
  • bun run --filter '@signet/daemon' build
  • git diff --check

PR Readiness (MANDATORY)

  • Spec alignment validated (INDEX.md + dependencies.yaml)
  • Agent scoping verified on all new/changed data queries
  • Input/config validation and bounds checks added
  • Error handling and fallback paths tested (no silent swallow)
  • Security checks applied to admin/mutation endpoints
  • Docs updated for API/spec/status changes
  • Regression tests added for each bug fix
  • Lint/typecheck/tests pass locally

Migration Notes (if applicable)

  • Migration is idempotent
  • Daemon Rust parity reviewed or explicitly N/A
  • Rollback / compatibility note included in PR description

Migration 075 only adds nullable provenance columns and indexes to memory_artifacts. Rollback compatibility is additive: older rows remain readable, and source stats/purge fall back to the existing Obsidian root/path matching when source_id is absent.
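To make the fallback concrete, here is a minimal sketch of ownership matching that prefers `source_id` and falls back to root/path matching for older rows. The row shape, the `ownedBy` and `selectForPurge` helpers, and the prefix-based legacy match are assumptions for illustration; the daemon's real purge and stats queries are not shown in this PR description.

```typescript
// Hypothetical row shape: field names mirror the columns named above,
// but the real schema and queries may differ.
interface ArtifactRow {
  id: number;
  sourceId: string | null; // null on rows written before migration 075
  sourcePath: string; // assumed: path of the indexed file
}

interface SourceRef {
  id: string;
  root: string; // assumed: root directory used by the legacy matcher
}

function withTrailingSlash(root: string): string {
  return root.charAt(root.length - 1) === "/" ? root : root + "/";
}

function ownedBy(row: ArtifactRow, source: SourceRef): boolean {
  // New rows carry explicit provenance, so trust it exclusively.
  if (row.sourceId !== null) {
    return row.sourceId === source.id;
  }
  // Legacy rows: match on the root as a whole path segment, so
  // "/vault/a" does not claim files under "/vault/ab".
  return row.sourcePath.indexOf(withTrailingSlash(source.root)) === 0;
}

function selectForPurge(rows: ArtifactRow[], source: SourceRef): number[] {
  return rows.filter((row) => ownedBy(row, source)).map((row) => row.id);
}
```

The explicit `source_id` check comes first so that a row owned by a different source is never purged just because its path happens to sit under the same root.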

@PR-Reviewer-Ant
Collaborator

Hi @NicholaiVogel - I'm taking a look at the feature work in refactor(sources): add provider-neutral source substrate (commit 1961f273) and will follow up shortly.

This comment is updated in place by pr-reviewer.

Collaborator

@PR-Reviewer-Ant left a comment

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 1961f273

I did not find a concrete correctness, security, or data-integrity issue in the changed surface. The implementation matches the stated stack step: it adds the generic source substrate, keeps Obsidian compatibility paths, and adds cancellation checks around source indexing.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The diff includes the relevant core migration, source config validation, daemon provider registry, Obsidian indexing path, recall fallback, purge/stat queries, and tests covering generic chunk recall plus legacy chunk cleanup. I did not run the listed Bun test plan or reproduce a live source disconnect/index cancellation path, so runtime cancellation behavior is assessed from code only.

@NicholaiVogel merged commit 35f7027 into main on May 24, 2026
17 checks passed
@NicholaiVogel deleted the nicholai/unified-sources-substrate branch on May 24, 2026 at 15:52
NicholaiVogel added a commit that referenced this pull request on May 24, 2026
## Summary

Reimplements the GitHub source work from #749 on top of the provider-neutral source substrate from #757. GitHub now plugs into the shared Sources provider/job/purge path instead of adding a parallel daemon bridge and poller, and it is exposed from the dashboard Sources setup flow.

## Changes

- Adds `github` source config support in `@signet/core` using `providerSettings`, with validation for repo patterns, resource types, token/discussion requirements, Markdown doc paths, labels, state, and per-repo bounds.
- Adds a GitHub fetcher for repo expansion, issues, pull requests, discussions, selected Markdown docs, and comments with the #749 hardening carried forward.
- Adds a `githubSourceProvider` that writes source-owned `memory_artifacts` with provenance and failure artifacts, purges through shared source-owned purge, and records unmatched wildcard repo patterns as source failures instead of silently indexing nothing.
- Adds `POST /api/sources/github`, `signet sources add github`, source listing support, and API/Sources docs.
- Surfaces GitHub in the dashboard Sources tab with a setup form for repositories, optional secret reference, resource toggles, state, labels, doc paths, comments, and item cap.
- Tightens shared provider source jobs so provider-reported failures mark the job failed instead of silently completing.
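As an illustration of the last two failure-handling points, the sketch below records a wildcard repo pattern that matches nothing as a failure and lets any reported failure mark the job failed. The helper names (`expandRepoPattern`, `planRepos`, `resolveJobStatus`), the result shapes, and the single-segment `*` semantics are all assumptions, not the provider's actual code.

```typescript
// Hypothetical sketch: shapes and names are assumptions for illustration.
interface SourceFailure {
  target: string;
  reason: string;
}

interface RepoPlan {
  repos: string[];
  failures: SourceFailure[];
}

// Treats "*" as "any characters except /", so "acme/*" matches
// "acme/api" but not "other/api".
function expandRepoPattern(pattern: string, knownRepos: string[]): string[] {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, "[^/]*");
  const matcher = new RegExp("^" + escaped + "$");
  return knownRepos.filter((repo) => matcher.test(repo));
}

function planRepos(patterns: string[], knownRepos: string[]): RepoPlan {
  const plan: RepoPlan = { repos: [], failures: [] };
  for (const pattern of patterns) {
    const matched = expandRepoPattern(pattern, knownRepos);
    if (matched.length === 0) {
      // Surface the miss instead of silently indexing nothing.
      plan.failures.push({ target: pattern, reason: "pattern matched no repositories" });
      continue;
    }
    for (const repo of matched) {
      if (plan.repos.indexOf(repo) === -1) plan.repos.push(repo);
    }
  }
  return plan;
}

// Any provider-reported failure marks the whole job as failed.
function resolveJobStatus(failures: SourceFailure[]): "completed" | "failed" {
  return failures.length > 0 ? "failed" : "completed";
}
```

For example, with known repos `acme/api` and `acme/web`, the patterns `acme/*` and `ghost/*` would yield both `acme` repos plus one failure for `ghost/*`, and the job would resolve to `failed`.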

## Notes

Supersedes #749, which was closed because it conflicted with #757/#759 and used a bespoke GitHub bridge/poller. This PR keeps the useful review hardening from #749 while adopting the shared provider substrate.



* feat(sources): add GitHub source provider

* fix(sources): harden GitHub source comments

* fix(sources): reject raw GitHub tokens

* fix(sources): paginate GitHub fetches

* fix(sources): fail discussion comment GraphQL errors

* fix(sources): preserve GitHub doc path separators

* fix(sources): constrain GitHub doc globs

* feat(sources): surface GitHub setup in dashboard

* fix(sources): bound GitHub discussion scans

* fix(sources): surface GitHub source setup honestly

* fix(sources): keep GitHub repo purge scoped

* fix(sources): accept GitHub pull responses without labels

* fix(sources): clear GitHub request timeouts

* fix(sources): clear recovered GitHub failures

* fix(sources): scan filtered GitHub discussions safely

* fix(sources): constrain GitHub repo purge prefix

* fix(sources): hydrate labeled GitHub pulls

* fix(sources): keep GitHub failure artifacts distinct

* fix(sources): enforce GitHub source item cap

* fix(sources): track GitHub comment purge paths

* fix(sources): paginate GitHub discussion comments
2 participants