feat(sources): add Discord source indexing#750
Conversation
Assisted-by: Codex:gpt-5
Assisted-by: Codex:gpt-5
|
Hi @aaf2tbz - I'm taking a look at the feature work in This comment is updated in place by pr-reviewer. |
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
a61abdd9
I found two issues worth fixing: the daemon records a Discord source as indexed even when sync was skipped, and the token reference field can persist an obvious raw Discord token into source config.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both findings are directly visible in the changed code: syncDiscordSource returns 0 on missing tokens, while daemon.ts marks the source indexed for any resolved sync promise; sources-config.ts only trims tokenRef before saving it into sources.json.
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
b7fade31
I found two data-lifecycle issues in the Discord source indexing path. The new startup sync can outlive source removal, and source-owned participant entities can be reassigned across Discord sources in a way that makes purge behavior unreliable.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The issues are visible directly in the changed daemon startup sync, delete/purge route, and Discord graph upsert code. I did not need a live Discord repro to see that the sync is not tracked as in-flight and that entity upserts key by canonical_name plus agent_id while later purges delete by source_id.
Since my last review:
- Fixed or likely addressed: 2
- Still blocking: 0
- Newly found: 2
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
56b4571a
I found a couple of correctness issues in the Discord indexing path: one can leave stale graph rows behind after channels or participants disappear, and one can fail ordinary Discord API calls when the response body is empty or non-JSON.
Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The stale graph behavior follows directly from discord-source-graph.ts only upserting current entities/dependencies and only purging on full source removal; there is no per-sync deletion for channels, threads, conversations, or participants that disappear. The Discord response parsing issue is visible in discord-source-fetch.ts where every non-204 response is parsed with response.json(), including error responses. I did not have a live Discord runtime trace, so confidence is medium rather than high.
Since my last review:
- Fixed or likely addressed: 2
- Still blocking: 0
- Newly found: 2
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
b1a9d633
I found a data-integrity problem in the Discord graph upsert path: overlapping Discord source configs can make one source take ownership of another source's graph rows. That should be fixed before merge because purge/reconcile can then delete or mutate rows that still belong to another configured source.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The changed graph code explicitly looks up existing entities by canonical_name and agent_id only, while several Discord canonical names are not source-scoped. The PR also allows distinct Discord source IDs for different guild sets, so overlapping guild configs are reachable from the added source config path.
Since my last review:
- Fixed or likely addressed: 2
- Still blocking: 0
- Newly found: 1
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
01c3db69
I found one correctness issue in the Discord message fetch path: a malformed or unexpected message id can abort the whole channel sync instead of being reported as a fetch error. The rest of the implementation broadly matches the PR description from the provided diff.
Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The issue is directly visible in platform/daemon/src/discord-source-fetch.ts where BigInt(msg.id) is called inside the message loop without validation or containment. I did not have a live Discord/runtime repro, so confidence is medium rather than high.
Since my last review:
- Fixed or likely addressed: 1
- Still blocking: 0
- Newly found: 1
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
349f6f6f
I found one lifecycle bug in the new Discord startup sync: it is not tracked as an in-flight source index job, so removal can race with indexing and leave supposedly purged Discord rows behind.
Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The diff shows Discord sync is launched directly from daemon startup without source-index in-flight tracking, while DELETE only keeps a tombstone when isSourceIndexInFlight(source.id) is true. I did not run a live daemon reproduction, but the race follows from the changed daemon startup path and existing deletion logic in sources-routes.ts.
Since my last review:
- Fixed or likely addressed: 1
- Still blocking: 0
- Newly found: 1
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
996dcbd7
I found two lifecycle bugs in the Discord source path: adding a source does not actually kick off indexing, and refreshes can leave stale Discord chunks searchable after the current fetch returns no messages.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The add route returns queued:false after only writing the source config, and the only sync wiring in daemon.ts runs during startup. The stale-chunk issue follows directly from discord-source-bridge.ts only calling indexDiscordSourceEmbeddings when msgResult.data.length > 0, while stale cleanup lives inside that embedding function.
Since my last review:
- Fixed or likely addressed: 1
- Still blocking: 0
- Newly found: 2
- Suppressed due to human rebuttal/out-of-scope: 0
PR-Reviewer-Ant
left a comment
There was a problem hiding this comment.
Review metadata
- Reviewer: pr-reviewer
- Model:
gpt-5.5 - Commit:
0edad3fe
I found one data-integrity bug in the Discord sync path: transient Discord fetch failures can be interpreted as an authoritative empty state and purge previously indexed graph rows.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The new Discord bridge logs fetch errors but still feeds empty/partial results into reconciliation. The provided diff includes the relevant sync and reconcile code, but the full routes file and existing source-index conventions were truncated in the prompt.
Since my last review:
- Fixed or likely addressed: 2
- Still blocking: 0
- Newly found: 1
- Suppressed due to human rebuttal/out-of-scope: 0
| } | ||
| if (isSourceActive()) { | ||
| reconcileDiscordGuildStructure({ | ||
| agentId, |
There was a problem hiding this comment.
This reconciliation runs even when fetchGuildChannels() failed and returned data: [] with errors. In that case currentChannelIds is empty and reconciledChannels is empty, so reconcileDiscordGuildStructure() will treat a transient Discord/API/permission failure as an authoritative empty guild and delete previously indexed channel, conversation, and participant graph rows. The archived-thread path has the same shape: fetchPublicArchivedThreads() errors are ignored, so previously indexed archived thread conversations can be dropped from conversationPaths and purged. Please only reconcile after a successful authoritative listing, or preserve existing rows when fetches are partial/failed.
|
Thanks for the work on this. This PR did several things well: it proved the Discord source shape in Signet, added useful config validation and token-reference guardrails, exercised Discord REST API v10 fetching, wired CLI/API entry points, and surfaced the right lifecycle problems around source removal, stale rows, and partial fetches. After comparing this with Discrawl and the existing Obsidian Sources architecture, we are going to close this PR and replace it with a cleaner stacked implementation. The main issue is not that the code is unusable; it is that the abstraction boundary is too provider-specific. We do not want Discord to grow a second source pipeline beside Obsidian. The replacement work will create a unified Sources substrate first, then move Obsidian and Discord onto that substrate. The replacement direction is:
We will reuse the good pieces from this PR where they fit: token-ref validation, REST fetch helpers/tests, CLI/API ergonomics, and lifecycle regression cases. The Discord-specific bridge/embedding/graph split will not be the foundation for the replacement. |
Summary
Rebuilds #728 as a clean Discord source implementation on current
main. Adds Discord guild/channel/thread indexing through the live Discord REST API v10, source config/CLI/API support, direct embeddings, knowledge graph structure, and docs.Changes
discordsource config support in@signet/core, including guild ID validation, token references, channel filters, message bounds, thread inclusion, and ISOsincenormalization./api/sources/discord.signet sources add discordCLI support.sincefiltering, and stable user-ID participant graph keys.docs/SOURCES.mdanddocs/API.md.Type
feat— new user-facing feature (bumps minor)fix— bug fixrefactor— restructure without behavior changechore— build, deps, config, docsperf— performance improvementtest— test coveragePackages affected
@signet/core@signet/daemon@signet/cli/ dashboard@signet/sdk@signet/connector-*@signet/webpredictorScreenshots
N/A — no UI changes.
PR Readiness (MANDATORY)
INDEX.md+dependencies.yaml)Migration Notes (if applicable)
Rollback / compatibility: no database migration or persisted schema change. Discord source entries are stored in existing
sources.json; rollback is removing the Discord source config entry and reverting these commits. Rust daemon parity is N/A because this extends the TypeScript daemon source indexing path only.Testing
bun testpassesbun run typecheckpassesbun run lintpassesValidated with:
bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/discord-source-fetch.ts platform/daemon/src/discord-source-fetch.test.ts platform/daemon/src/discord-source-bridge.ts platform/daemon/src/discord-source-embeddings.ts platform/daemon/src/discord-source-embeddings.test.ts platform/daemon/src/discord-source-graph.ts platform/daemon/src/discord-source-graph.test.ts platform/daemon/src/routes/sources-routes.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/commands/sources.ts surfaces/cli/src/features/sources.tscd platform/core && bun test src/sources-config.test.tscd platform/daemon && bun test src/discord-source-fetch.test.ts src/discord-source-embeddings.test.ts src/discord-source-graph.test.ts src/routes/sources-routes.test.tscd platform/core && bun run buildcd platform/daemon && bun run build.tsKnown local validation limits:
cd platform/daemon && bun run build:typesstill fails on existing broad reporootDir/SQLite typing issues outside this Discord source change.cd surfaces/cli && bun run build:clirequires connector workspace packages to be built/linked first and failed resolving those connector workspaces in this fresh worktree.AI disclosure
Assisted-bytags in commits)Notes
Replaces #728. Old PR #728 is closed in favor of this clean replacement.