feat(sources): add Discord source with live API indexing + knowledge graph #728
aaf2tbz wants to merge 3 commits into
…graph

Adds a 'discord' source kind that connects to Discord's REST API v10 using a bot token stored in Signet Secrets. Indexes opted-in guilds, channels, and threads as source-labeled conversation context with server/channel progression in the knowledge graph.

Architecture follows the Obsidian source pattern (direct embedding into embeddings table + knowledge graph construction) and mirrors the GitHub source PR #727 structure:

- discord-source-fetch.ts: Discord REST API client (guilds, channels, messages, threads) with rate limit handling via X-RateLimit-* headers. No external discord.js dependency; raw fetch() only.
- discord-source-embeddings.ts: Chunks conversations into embeddable segments grouped by reply chains and time proximity. Source type 'source_discord_chunk'.
- discord-source-graph.ts: Knowledge graph hierarchy: source -> guild (community) -> channel -> thread/conversation. Dependencies for containment and participant cross-references.
- discord-source-bridge.ts: Sync orchestration, resolves bot token from Signet Secrets, walks channels/threads, indexes on daemon startup.

Config: DiscordSourceSettings stored in settings field on SignetSourceEntry.
CLI: signet sources add discord --guild-id ID --token-ref REF
API: POST /api/sources/discord, DELETE /api/sources/:sourceId (with purge)
Tests: 8 new tests (config validation, chunking, graph structure)
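For readers unfamiliar with header-driven rate limiting, here is a minimal sketch of how a raw fetch() client might decide how long to pause between requests. It is not the code in discord-source-fetch.ts: the function name `rateLimitDelayMs` and the `HeaderReader` interface are hypothetical, and the sketch assumes only Discord's documented `X-RateLimit-Remaining`, `X-RateLimit-Reset-After`, and `Retry-After` headers, all expressed in seconds.

```typescript
// Hypothetical helper, not taken from the PR: compute a wait time in
// milliseconds from a Discord REST response's status and headers.

// Minimal shape shared by fetch()'s Headers object and simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

export function rateLimitDelayMs(status: number, headers: HeaderReader): number {
  // Read a header as a positive number of seconds; anything else counts as 0.
  const seconds = (name: string): number => {
    const raw = headers.get(name);
    const value = raw === null ? Number.NaN : Number(raw);
    return Number.isFinite(value) && value > 0 ? value : 0;
  };

  // A 429 response says how long to back off before retrying.
  if (status === 429) {
    return Math.ceil(seconds("Retry-After") * 1000);
  }

  // Otherwise, pause only when the current bucket is exhausted.
  const remaining = headers.get("X-RateLimit-Remaining");
  if (remaining !== null && Number(remaining) === 0) {
    return Math.ceil(seconds("X-RateLimit-Reset-After") * 1000);
  }
  return 0;
}
```

A caller would sleep for the returned number of milliseconds before issuing the next request, and retry the same request after a 429.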
- Remove unused purgeDiscordSource import from daemon.ts
- Remove unused DISCORD_CHUNK_SOURCE_TYPE import from test
- Change let guildName to declaration without initializer in bridge
All three CodeQL findings resolved in
Hi @aaf2tbz - I'm taking a look at the feature work in
This comment is updated in place by pr-reviewer.
PR-Reviewer-Ant left a comment
Review metadata
- Reviewer: pr-reviewer
- Model: gpt-5.5
- Commit: e925ff83
I found two correctness issues that undermine the claimed live Discord indexing behavior: active threads use a decommissioned API route, and the advertised since filter is never applied. There is also a data-quality issue where participant entities are keyed by mutable display names rather than stable Discord user IDs.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The active-thread URL is visible in discord-source-fetch.ts and conflicts with Discord API v10 docs, which list /guilds/{guild.id}/threads/active and note /channels/{channel.id}/threads/active was decommissioned. The unused since setting is directly visible because syncDiscordSource parses settings but calls fetchChannelMessages without passing any since-related bound.
}

export async function fetchActiveThreads(
	config: DiscordFetchConfig,
This active-thread endpoint is wrong for Discord API v10. Discord decommissioned GET /channels/{channel.id}/threads/active in favor of GET /guilds/{guild.id}/threads/active (see the official Discord threads docs: https://docs.discord.com/developers/topics/threads). As written, active threads will fail to fetch, so the PR's claim that it indexes threads via live REST API is only partially true.
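A minimal sketch of what the guild-level lookup could look like, assuming the documented `GET /guilds/{guild.id}/threads/active` route and that each returned thread carries a `parent_id` pointing at its channel. The names `activeThreadsUrl`, `threadsForChannels`, and `ActiveThread` are hypothetical and not taken from the PR.

```typescript
// Hypothetical sketch: build the guild-level active-threads URL and narrow
// the guild-wide result down to the channels a source has opted in to.

const DISCORD_API_BASE = "https://discord.com/api/v10";

// Only the fields this sketch needs from a thread channel object.
interface ActiveThread {
  id: string;
  parent_id: string | null;
  name?: string;
}

export function activeThreadsUrl(guildId: string): string {
  return `${DISCORD_API_BASE}/guilds/${encodeURIComponent(guildId)}/threads/active`;
}

// The guild endpoint returns threads from every channel, so filter by parent.
export function threadsForChannels(
  threads: ActiveThread[],
  channelIds: Set<string>,
): ActiveThread[] {
  return threads.filter(
    (thread) => thread.parent_id !== null && channelIds.has(thread.parent_id),
  );
}
```

Because the guild route is called once per guild rather than once per channel, a sync loop built this way would fetch active threads up front and then distribute them to channels by `parent_id`.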
for (const channel of filteredChannels) {
	const channelName = channel.name ?? channel.id;
	try {
settings.since is never used during sync. The CLI exposes --since <date> as "Only index messages after this ISO date" and the source config stores since, but the bridge always calls fetchChannelMessages(config, channel.id, settings.maxMessagesPerChannel) with no lower bound. That makes the option a silent no-op and can unexpectedly index the full channel history.
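One way to give `since` a real effect is to translate the ISO date into a snowflake and pass it as the `after` query parameter on the channel messages request. The sketch below assumes Discord's documented snowflake layout (milliseconds since the Discord epoch, 2015-01-01T00:00:00Z, shifted left by 22 bits) and the documented 1-100 range for `limit`. The helper names `sinceToSnowflake` and `messagesQuery` are hypothetical, and this is not necessarily how the replacement PR implements the filter.

```typescript
// Hypothetical sketch: turn an ISO `since` date into a lower bound for
// GET /channels/{channel.id}/messages via the `after` snowflake parameter.

// Discord epoch: first millisecond of 2015 (UTC).
const DISCORD_EPOCH_MS = BigInt("1420070400000");
const TIMESTAMP_SHIFT = BigInt(22);

export function sinceToSnowflake(sinceIso: string): string {
  const ms = Date.parse(sinceIso);
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid since date: ${sinceIso}`);
  }
  const offset = BigInt(ms) - DISCORD_EPOCH_MS;
  // Dates before the Discord epoch clamp to the smallest possible snowflake.
  const snowflake = offset < BigInt(0) ? BigInt(0) : offset << TIMESTAMP_SHIFT;
  return snowflake.toString();
}

export function messagesQuery(limit: number, sinceIso?: string): string {
  // Clamp the page size to the 1-100 range the endpoint accepts.
  const pageSize = Math.min(Math.max(Math.trunc(limit), 1), 100);
  const params = new URLSearchParams({ limit: String(pageSize) });
  if (sinceIso) {
    params.set("after", sinceToSnowflake(sinceIso));
  }
  return params.toString();
}
```

A client-side check on each message's timestamp is still a reasonable second guard, since pagination with `before` or a missing `since` would bypass the server-side bound.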
} catch (err) {
	logger.warn("discord-source", "Failed to sync thread", {
		threadId: thread.id,
		error: err instanceof Error ? err.message : String(err),
Participant identity is reduced to global_name ?? username before graph indexing, even though each message has a stable author.id. That will merge distinct Discord users who share a display name and split the same user if they rename themselves, corrupting the source graph over time. Please carry the user ID through and use display names only as labels.
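As an illustration of the requested change, the sketch below keys participants by `author.id` and keeps the display name only as a label. The `DiscordAuthor` fields match Discord's user object (`id`, `username`, `global_name`); the `Participant` shape and the `collectParticipants` function are hypothetical and stand in for whatever the graph layer actually stores.

```typescript
// Hypothetical sketch: aggregate message authors by stable Discord user ID.

interface DiscordAuthor {
  id: string;
  username: string;
  global_name?: string | null;
}

interface Participant {
  id: string; // stable key for the graph entity
  label: string; // most recently seen display name, for presentation only
  messageCount: number;
}

export function collectParticipants(authors: DiscordAuthor[]): Map<string, Participant> {
  const byId = new Map<string, Participant>();
  for (const author of authors) {
    const label = author.global_name ?? author.username;
    const existing = byId.get(author.id);
    if (existing) {
      // Same user, possibly renamed: refresh the label, keep the identity.
      existing.label = label;
      existing.messageCount += 1;
    } else {
      byId.set(author.id, { id: author.id, label, messageCount: 1 });
    }
  }
  return byId;
}
```

With this keying, two users who share a display name stay separate entities, and a user who renames themselves remains a single entity with an updated label.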
Closing in favor of the clean single-commit replacement PR #750, which rebuilds this Discord source work on current main and addresses the reviewer findings: guild-level active thread endpoint, real since filtering, stable Discord user IDs for participants, REST since passthrough, and bounds validation.
Summary

- Adds a "discord" source kind that indexes Discord guilds, channels, and threads into Signet's recall system via live Discord REST API v10.
- Direct embedding into the embeddings table + knowledge graph construction, not the connector/document pipeline pattern.

Validation

- bun test platform/core/src/sources-config.test.ts platform/daemon/src/discord-source-embeddings.test.ts platform/daemon/src/discord-source-graph.test.ts
- bunx biome check platform/daemon/src/discord-source-fetch.ts platform/daemon/src/discord-source-embeddings.ts platform/daemon/src/discord-source-graph.ts platform/daemon/src/discord-source-bridge.ts platform/core/src/sources-config.ts surfaces/cli/src/commands/sources.ts surfaces/cli/src/features/sources.ts platform/daemon/src/routes/sources-routes.ts platform/daemon/src/daemon.ts
- 7ec5e51

Notes

- sources.json config with a settings field.
- No external discord.js dependency; raw fetch() against Discord REST v10.
- root field left empty for Discord sources (no filesystem root, like GitHub sources).
- discord-parser.ts left untouched; this is a live API path, not static DiscordChatExporter parsing.

PR Readiness (MANDATORY)

- …INDEX.md + dependencies.yaml)

Migration Notes (if applicable)

Rollback / compatibility: no migration or persisted data change; rollback is removing the source config entry and reverting the commit.