Skip to content

feat(sources): add GitHub source provider#764

Merged
NicholaiVogel merged 21 commits into
mainfrom
codex/github-source-provider
May 24, 2026
Merged

feat(sources): add GitHub source provider#764
NicholaiVogel merged 21 commits into
mainfrom
codex/github-source-provider

Conversation

@NicholaiVogel
Copy link
Copy Markdown
Collaborator

@NicholaiVogel NicholaiVogel commented May 24, 2026

Summary

Reimplements the GitHub source work from #749 on top of the provider-neutral source substrate from #757. GitHub now plugs into the shared Sources provider/job/purge path instead of adding a parallel daemon bridge and poller, and it is exposed from the dashboard Sources setup flow.

Changes

  • Adds github source config support in @signet/core using providerSettings, with validation for repo patterns, resource types, token/discussion requirements, Markdown doc paths, labels, state, and per-repo bounds.
  • Adds a GitHub fetcher for repo expansion, issues, pull requests, discussions, selected Markdown docs, and comments with the feat(sources): add GitHub source indexing #749 hardening carried forward.
  • Adds a githubSourceProvider that writes source-owned memory_artifacts with provenance and failure artifacts, purges through shared source-owned purge, and records unmatched wildcard repo patterns as source failures instead of silently indexing nothing.
  • Adds POST /api/sources/github, signet sources add github, source listing support, and API/Sources docs.
  • Surfaces GitHub in the dashboard Sources tab with a setup form for repositories, optional secret reference, resource toggles, state, labels, doc paths, comments, and item cap.
  • Tightens shared provider source jobs so provider-reported failures mark the job failed instead of silently completing.

Type

  • feat — new user-facing feature (bumps minor)
  • fix — bug fix
  • refactor — restructure without behavior change
  • chore — build, deps, config, docs
  • perf — performance improvement
  • test — test coverage

Packages affected

  • @signet/core
  • @signet/daemon
  • @signet/cli / dashboard
  • @signet/sdk
  • @signet/connector-*
  • @signet/web
  • predictor
  • Other: docs

Screenshots

Dashboard Sources tab visual preview was captured locally with Playwright at /tmp/signet-pr-764/github-source-dashboard.png. The preview shows GitHub as an available source with copy scoped to issues, PRs, discussions, and docs. The GitHub CLI cannot attach binary images to this PR body from the local environment, but the UI path was smoke-tested through the dashboard dev server.

PR Readiness (MANDATORY)

  • Spec alignment validated (INDEX.md + dependencies.yaml)
  • Agent scoping verified on all new/changed data queries
  • Input/config validation and bounds checks added
  • Error handling and fallback paths tested (no silent swallow)
  • Security checks applied to admin/mutation endpoints
  • Docs updated for API/spec/status changes
  • Regression tests added for each bug fix
  • Lint/typecheck/tests pass locally

Migration Notes (if applicable)

  • Migration is idempotent
  • Daemon Rust parity reviewed or explicitly N/A
  • Rollback / compatibility note included in PR description

No schema migration. GitHub sources use the existing sources.json provider settings and source-owned artifact provenance from #757. Rollback removes GitHub source config entries and purges source-owned artifacts through DELETE /api/sources/:sourceId. Rust daemon parity is N/A because this extends the TypeScript daemon source provider path only.

Testing

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check surfaces/dashboard/src/lib/api.ts surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte platform/core/src/sources-config.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.ts surfaces/cli/src/features/sources.ts
  • bun run --filter '@signet/core' build
  • bun run --filter '@signet/daemon' build
  • bun run --filter '@signet/cli' build
  • cd surfaces/dashboard && bun run build
  • git diff --check
  • Dashboard dev server smoke at http://127.0.0.1:5173/#sources
  • Tested against running daemon
  • N/A

Note: cd surfaces/dashboard && bun run check currently fails on an existing duplicate @codemirror/view KeyBinding type mismatch in CodeEditor.svelte; the production dashboard build passes.

AI disclosure

  • No AI tools were used in this PR
  • AI tools were used (see Assisted-by tags in commits)

Notes

Supersedes #749, which was closed because it conflicted with #757/#759 and used a bespoke GitHub bridge/poller. This PR keeps the useful review hardening from #749 while adopting the shared provider substrate.

@PR-Reviewer-Ant
Copy link
Copy Markdown
Collaborator

Hi @NicholaiVogel - I'm taking a look at the feature work in feat(sources): add GitHub source provider (commit b3dede29) and will follow up shortly.

This comment is updated in place by pr-reviewer.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: b3dede29

I found two correctness issues in the GitHub source indexing path: discussion comments can overwrite each other, and comment-fetch failures are not propagated to the shared job failure state.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly visible in the new GitHub source files. The discussion comment id conversion is in github-source-fetch.ts, and the swallowed job failure is in github-source-provider.ts where a failure artifact is written but the returned failures array is not updated.

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
Comment thread platform/daemon/src/github-source-provider.ts Outdated
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the two reviewer findings in caf36a7:

  • Preserved opaque GraphQL discussion comment IDs as strings instead of coercing them to numbers, with a regression test for duplicate-safe IDs.
  • Propagated comment-fetch failures through the provider failures array so the shared source job reports failure, with a provider regression test.

Validation run:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: caf36a71

I found one security gap: the new GitHub source path documents token references, but the implementation will accept and persist raw GitHub tokens as tokenRef.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The PR description and docs claim GitHub tokens are secret references and not stored in source config. The changed GitHub config builder trims and persists any non-empty tokenRef string without the raw-token rejection already used for Discord.

Since my last review:

  • Fixed or likely addressed: 2
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/core/src/sources-config.ts
Copy link
Copy Markdown
Collaborator Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline replies posted after addressing the reviewer findings in caf36a7.

Comment thread platform/daemon/src/github-source-provider.ts Outdated
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest reviewer finding in 394c6da:

  • Added a GitHub raw-token guard before tokenRef is persisted to source config.
  • Covered common raw GitHub token prefixes and auth-header forms.
  • Added regression coverage in core source config and /api/sources/github route tests.

Validation run:

  • bun run --filter '@signet/core' build
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 394c6da5

I found a pagination gap in the new GitHub fetcher: some configured resource classes silently stop at GitHub's first 100 results even though the PR advertises maxItemsPerRepo bounds and defaults that are higher. I also found one CLI validation edge case worth tightening.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The missing pagination is visible directly in the new fetch functions: fetchDiscussions sends only one GraphQL request with first: Math.min(maxItems, 100), and the label-filtered PR search sends one REST search request with per_page capped at 100. I did not run this against a live repo with more than 100 matching items, so the review is based on API behavior and the changed code.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 3
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
Comment thread platform/daemon/src/github-source-fetch.ts
Comment thread surfaces/cli/src/features/sources.ts
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest reviewer findings in 9c04c98:

  • Paginated GraphQL discussion fetches with pageInfo/endCursor until maxItems or the final page.
  • Paginated the label-filtered PR search path with GitHub search page parameters up to maxItems.
  • Replaced permissive CLI parseInt handling with a strict integer parser before core validation, including --max-items 25oops coverage.

Validation run:

  • bun test platform/daemon/src/github-source-fetch.test.ts surfaces/cli/src/features/sources.test.ts
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts surfaces/cli/src/features/sources.ts surfaces/cli/src/features/sources.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build
  • bun run --filter '@signet/cli' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 9c04c98f

I found one correctness issue in the GitHub discussion sync path: a failed GraphQL comment response can be silently treated as an empty comment list. That undercuts the PR's stated failure-artifact behavior for partial GitHub failures.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The changed GraphQL discussion fetch path is included in the diff. fetchDiscussionComments only checks HTTP status, while fetchDiscussions explicitly handles GraphQL errors; writeResourceWithComments depends on thrown errors to record provider failures.

Since my last review:

  • Fixed or likely addressed: 3
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
Copy link
Copy Markdown
Collaborator Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline replies posted after addressing GitHub pagination and CLI validation findings in 9c04c98.

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
Copy link
Copy Markdown
Collaborator Author

@NicholaiVogel NicholaiVogel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline replies posted after addressing GitHub pagination and CLI validation findings in 9c04c98.

Comment thread platform/daemon/src/github-source-fetch.ts
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest reviewer finding in c38ecf2:

  • fetchDiscussionComments now treats HTTP 200 GraphQL errors as failures and throws, so the provider comment-failure path records source_github_failure and returns a failed source job result.
  • Added regression coverage for discussion comment GraphQL errors.

Validation run:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts surfaces/cli/src/features/sources.ts surfaces/cli/src/features/sources.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: c38ecf2e

I found one correctness issue in the GitHub docs fetch path: configured nested Markdown docs are requested with encoded slashes, so the advertised docs/**/*.md / docs/API.md support will miss normal repository files.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new docs provider builds the GitHub contents URL directly in platform/daemon/src/github-source-fetch.ts, and the changed code encodes the whole path segment with encodeURIComponent. That turns docs/API.md into docs%2FAPI.md, which does not match the GitHub contents API path shape used for nested files. The PR description and docs explicitly claim selected Markdown docs and Markdown globs are supported.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest reviewer finding in ca07621:

  • GitHub content paths are now encoded segment-by-segment, preserving / for nested repository paths while still escaping individual path segments.
  • Added regression coverage for docs/API.md so the request uses /contents/docs/API.md and not docs%2FAPI.md.

Validation run:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts surfaces/cli/src/features/sources.ts surfaces/cli/src/features/sources.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: ca076213

I found one correctness/scope issue in the GitHub doc glob handling. The provider is otherwise wired through the shared source config/job path as described.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The issue is visible directly in platform/daemon/src/github-source-fetch.ts: globToRegexSource translates every '' to '.', so the matching behavior for configured doc paths does not match the documented path semantics. I did not run the tests or a live daemon in this read-only review context.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

Comment thread platform/daemon/src/github-source-fetch.ts Outdated
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest reviewer finding in 0bd6ebc:

  • Doc glob translation now keeps single * and ? within one path segment.
  • ** / **/ remain recursive, with docs/**/*.md matching both direct and nested Markdown files.
  • Added regression coverage for docs/*.md excluding nested files and docs/**/*.md including direct plus nested docs.

Validation run:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check platform/core/src/sources-config.ts platform/core/src/sources-config.test.ts platform/daemon/src/routes/sources-routes.test.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts surfaces/cli/src/features/sources.ts surfaces/cli/src/features/sources.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 0bd6ebc5

I found one bounds bug in the GitHub discussions fetch path. The implementation adds the provider-neutral source path, but the configured cap does not actually bound GraphQL discussion scanning for filtered states.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The PR description and docs claim GitHub discussion fetches honor maxItemsPerRepo. In platform/daemon/src/github-source-fetch.ts, fetchDiscussions only applies maxItems to accepted resources while filtering state client-side, so a repository with many non-matching discussions can paginate far beyond the configured cap.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

}`;
let cursor: string | null = null;
while (resources.length < maxItems) {
const response = await githubRequest(GRAPHQL_URL, config.token, "POST", {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is not actually bounded by maxItems when state filters out most discussion nodes. Because the GraphQL query fetches all discussions and applies state client-side, resources.length can stay below maxItems while the loop keeps following pageInfo.hasNextPage across the entire repository. That diverges from the PR/docs claim that discussion fetches honor maxItemsPerRepo, and it also makes a configured small cap ineffective on large repos. Please push the state filter into the GraphQL query if GitHub supports it, or add a separate scanned-page/node cap tied to maxItemsPerRepo.

@NicholaiVogel NicholaiVogel force-pushed the codex/github-source-provider branch 2 times, most recently from 77f440d to 5e83fe5 Compare May 24, 2026 18:14
@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Rebased the branch onto current origin/main and added the dashboard path for the GitHub source setup in 5e83fe54:

  • GitHub now shows as an available source in the dashboard instead of planned.
  • Added a setup form for repositories, optional secret reference, issue/PR/docs/discussion toggles, comments, labels, doc paths, state, and item cap.
  • Kept public-repo setup low-friction: repo + defaults is enough; token refs are only needed for private repos, higher rate limits, and discussions.
  • Dashboard connected-source counts, scan labels, and disconnect copy now include GitHub sources.

Validation run on the rebased head:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check surfaces/dashboard/src/lib/api.ts surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte platform/core/src/sources-config.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/routes/sources-routes.ts surfaces/cli/src/features/sources.ts
  • git diff --check
  • cd surfaces/dashboard && bun run build
  • bun run --filter '@signet/daemon' build
  • Dashboard dev server smoke: http://127.0.0.1:5173/ served the app successfully.

Note: cd surfaces/dashboard && bun run check still hits the existing unrelated CodeMirror duplicate @codemirror/view KeyBinding type mismatch in CodeEditor.svelte; the dashboard production build passes.

Reviewer, please re-check the latest head.

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the outstanding discussion pagination bound finding in a51d9a42:

  • fetchDiscussions now counts scanned GraphQL discussion nodes against maxItemsPerRepo, not only accepted resources.
  • A state filter like open can no longer page through unlimited closed discussions looking for matches.
  • Added regression coverage where a rejected discussion node plus hasNextPage: true still stops after the configured cap.

Additional validation on the latest head:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 5e83fe54

I found two non-blocking issues: wildcard GitHub sources can silently index nothing, and the dashboard now exposes GitHub as available while still implying code indexing. The backend direction mostly matches the PR description, but those edges are worth tightening before a human final pass.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The findings are directly visible in platform/daemon/src/github-source-provider.ts and surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte. I did not run the PR tests or a live daemon sync, so runtime behavior around GitHub API failures is inferred from the changed control flow.

Note: This PR touches UI files but no screenshots were referenced in the description. Consider adding visual previews for reviewers.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 3
  • Suppressed due to human rebuttal/out-of-scope: 0

async function resolveRepos(settings: GitHubSourceSettings, token?: string): Promise<ResolvedRepo[]> {
const resolved: ResolvedRepo[] = [];
for (const pattern of settings.repos) {
const [owner, repoPart] = pattern.split("/");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A wildcard repo pattern that expands to zero repositories currently just leaves resolved empty, and syncGitHubSource will return total: 0, no failures, and let the shared job mark the source indexed. For owner/* typos, auth visibility problems, or a too-narrow glob, that makes the source look successfully indexed while it produced no data. Since the docs advertise wildcard repos and the PR calls out provider-reported failures, this should record a provider failure when any configured pattern resolves no repos.

indexes: ["Repository files", "Issues and PRs", "Project documentation"],
tags: ["code", "cloud", "issues", "pull-requests", "docs"],
status: "available",
indexes: ["Issues and pull requests", "Discussion and issue comments", "README and docs paths"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the GitHub connector available in the dashboard, but the same connector card still advertises Issues, PRs, code, docs. The provider and docs only support issues, pulls, discussions, comments, and selected Markdown docs; they intentionally do not index repo code. Once the connector is available, that stale code claim becomes user-facing scope drift.

indexes: ["Repository files", "Issues and PRs", "Project documentation"],
tags: ["code", "cloud", "issues", "pull-requests", "docs"],
status: "available",
indexes: ["Issues and pull requests", "Discussion and issue comments", "README and docs paths"],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says screenshots are N/A because there are no UI changes, but this diff adds the dashboard GitHub connector form and marks the connector available. That needs at least a screenshot or an explicit note of the checked dashboard state so reviewers can validate the new flow.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: a51d9a42

I found one review concern: the PR changes the dashboard Sources UI while the description says there are no UI changes and provides no screenshot evidence.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The diff clearly changes surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte and surfaces/dashboard/src/lib/api.ts to expose GitHub source connection from the dashboard, while the PR description marks screenshots as N/A because there are no UI changes. I did not have a runtime screenshot or browser check to assess layout behavior.

Note: This PR touches UI files but no screenshots were referenced in the description. Consider adding visual previews for reviewers.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

name: "GitHub Repositories",
detail: "Issues, PRs, code, docs",
description: "Bring repository context, issues, pull requests, and project docs into recall.",
description:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does include user-visible dashboard changes: GitHub is now marked available and the Sources tab gets a GitHub connection flow. The PR description says “N/A — no UI changes,” so reviewers have no screenshot or runtime evidence for this new UI path. Please update the PR description with screenshots or equivalent visual verification for the changed Sources tab.

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Addressed the latest non-blocking reviewer cleanup in 74ba4c49 and updated the PR description:

  • Wildcard GitHub repo patterns that match nothing now record a source_github_failure instead of silently indexing zero repositories.
  • Dashboard GitHub source copy now says Issues, PRs, discussions, docs and no longer implies general source-code indexing.
  • PR description now documents the dashboard UI change and the local Playwright screenshot/smoke path instead of saying screenshots are N/A.

Additional validation:

  • bun test platform/daemon/src/github-source-provider.test.ts platform/daemon/src/github-source-fetch.test.ts
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts
  • bunx biome check surfaces/dashboard/src/lib/api.ts surfaces/dashboard/src/lib/components/tabs/SourcesTab.svelte platform/core/src/sources-config.ts platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.ts surfaces/cli/src/features/sources.ts
  • git diff --check
  • bun run --filter '@signet/daemon' build
  • Playwright screenshot smoke for http://127.0.0.1:5173/#sources captured locally at /tmp/signet-pr-764/github-source-dashboard.png.

Reviewer, please re-check the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 74ba4c49

I found one correctness issue in the GitHub provider's stale purge path: a partial failure in one repository prevents stale-artifact cleanup for later repositories in the same source sync. The rest of the changed surface generally matches the PR description and includes targeted validation and tests.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The provider code shows a single source-wide failures array being used to decide per-repo purge behavior, and the condition is directly visible in platform/daemon/src/github-source-provider.ts. I did not run the daemon or reproduce with a persisted database, so the confidence is medium rather than high.

Since my last review:

  • Fixed or likely addressed: 4
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

context.source,
agentId,
repo.fullName,
config,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skips stale-artifact purging for every repo after any earlier repo records a failure, even if the current repo synced successfully. failures is source-wide, so a harmless failure like one unmatched wildcard or one comment fetch error can leave stale issue/PR/doc rows undeleted for other repositories in the same GitHub source. The purge decision needs to be scoped to failures for this specific repo/sync segment, otherwise multi-repo sources can accumulate stale source-owned rows indefinitely after a partial failure.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9a514f2.

The stale purge decision is now scoped to the current repository segment by snapshotting the failure count before each repo sync and only skipping that repo's purge when a new failure was recorded while syncing that repo. A prior unmatched wildcard or another repo failure no longer blocks stale cleanup for later successful repos.

I added a regression test that preloads a stale issue for Signet-AI/signetai, runs a source with one failing wildcard plus one successful repo, and verifies the successful repo purges the stale row while still preserving the provider failure artifact.

Validation:

  • bun test platform/daemon/src/github-source-provider.test.ts platform/daemon/src/github-source-fetch.test.ts -> 18 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 61 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Latest reviewer finding is addressed on the new head, 9a514f2.

What changed:

  • Scoped GitHub stale purging to the current repository sync segment, so a partial failure in one configured repo no longer prevents stale-artifact cleanup for later successful repos.
  • Added a regression test covering an unmatched wildcard plus a successful repo with a stale preexisting artifact.

Validation on the pushed head:

  • bun test platform/daemon/src/github-source-provider.test.ts platform/daemon/src/github-source-fetch.test.ts -> 18 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 61 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed, with existing dashboard Svelte warnings outside this change

Requesting another reviewer pass on 9a514f2.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 9a514f2e

I found one blocking correctness issue in the new GitHub PR fetch path: the default source configuration can crash on real pull request responses.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The default GitHub resource set includes pulls, fetchPullRequests maps real /pulls responses through pullResource, and pullResource unconditionally dereferences pull.labels. GitHub's List pull requests REST response schema does not include labels on pull request objects; labels are available via issue/search-style responses instead.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

body: pull.body ?? "",
state: pull.state,
url: pull.html_url,
labels: pull.labels.map((label) => label.name),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw for the normal /repos/{owner}/{repo}/pulls response because pull request objects do not include a labels array in GitHub's REST API response. Since pulls are enabled by default for tokenless GitHub sources, a default signet sources add github --repo ... can fail as soon as it indexes PRs. Either fetch PRs through the issues/search surface when labels are needed, tolerate pull.labels ?? [], or separately hydrate issue metadata for labels. Reference: https://docs.github.com/en/rest/pulls/pulls#list-pull-requests

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 202d603.

GitHubPullRequest.labels, comments, and review_comments are now optional in the REST pull-list shape, and pullResource normalizes missing labels to [] and missing comment counts to 0. That keeps the default tokenless pull sync path from throwing on real /repos/Signet-AI/signetai/pulls responses.

I added a regression test that mocks a pull-list response without labels and verifies it maps to a pull resource with empty labels and zero comment count.

Validation:

  • bun test platform/daemon/src/github-source-fetch.test.ts -> 13 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 62 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Latest reviewer finding is addressed on the new head, 202d603.

What changed:

  • Normalized real GitHub /pulls responses that omit issue labels so default PR sync no longer crashes.
  • Added a regression test for pull-list responses without labels.

Validation on the pushed head:

  • bun test platform/daemon/src/github-source-fetch.test.ts -> 13 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 62 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed, with existing dashboard Svelte warnings outside this change

Requesting another reviewer pass on 202d603.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 202d6034

I found one reliability issue in the new GitHub fetcher: failed or aborted requests leave their timeout timer behind. The rest of the provider/config path looks consistent with the stated shared Sources integration from the diff provided.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The timer cleanup issue is directly visible in platform/daemon/src/github-source-fetch.ts: the timeout is only cleared after fetch resolves successfully. I did not run the daemon or the PR tests in this read-only review context, so I am not claiming runtime reproduction.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

const MAX_RETRIES = 3;
const RETRY_BASE_DELAY_MS = 1_000;
const MAX_ISSUE_SCAN_MULTIPLIER = 5;
const MAX_ISSUE_SCAN_FLOOR = PER_PAGE * 5;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fetch() rejects or the abort signal fires, this timeout is never cleared because clearTimeout(timeout) only runs on the success path. In a source sync with many network failures, that leaves a pile of pending timers/controllers for up to 30 seconds per retry. Wrap the fetch in a try/finally inside the attempt so the timeout is always cleared before retrying or surfacing the error.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8859a02.

githubRequest now clears the per-attempt abort timeout on both successful and failed fetch attempts. The timeout handle is nulled after successful fetch completion and cleared in finally for rejected/aborted attempts, so retrying or surfacing an error no longer leaves pending request timers behind.

I added a regression test that forces fetch attempts to reject, tracks the request timeout handles, and verifies each one is cleared.

Validation:

  • bun test platform/daemon/src/github-source-fetch.test.ts -> 14 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 63 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Latest reviewer finding is addressed on the new head, 8859a02.

What changed:

  • Cleared GitHub request abort timeouts on rejected/aborted fetch attempts as well as successful responses.
  • Added a regression test that verifies failed request attempts clear their timeout handles.

Validation on the pushed head:

  • bun test platform/daemon/src/github-source-fetch.test.ts -> 14 pass, 0 fail
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 63 pass, 0 fail
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> passed
  • git diff --check -> passed
  • bun run --filter '@signet/daemon' build -> passed, with existing dashboard Svelte warnings outside this change

Requesting another reviewer pass on 8859a02.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: c627e07f

I found one correctness issue in the GitHub provider's purge path: comment artifacts are not tracked as seen during a successful sync, so unchanged comments can be deleted immediately after being refreshed if their artifact timestamp remains older than the sync start.

Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The provider only adds parent resource paths to seenPaths in platform/daemon/src/github-source-provider.ts, while comment artifacts are written under child source paths. The purge query deletes all older rows under the repo prefix that are not in seenPaths. I did not have the implementation of indexExternalMemoryArtifact in this context, so confidence depends on whether it always updates memory_artifacts.updated_at to wall-clock write time rather than preserving artifact timestamps.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

if (!context.shouldContinue()) break;
const written = await writeResourceWithComments(
context.source,
agentId,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stale purge only marks the parent resource path as seen. writeResourceWithComments can also write source_github_comment rows at ${resourcePath(...)}#comment-${commentId}, but those paths never enter seenPaths. On a successful repo sync, purgeStaleGitHubArtifacts scans the whole github://${repo}/ prefix and deletes older rows not in seenPaths, so existing comment artifacts can be purged even when their parent issue/PR/discussion was successfully refreshed. Please track the written comment paths too, or otherwise exclude/refine comment purge semantics so comment rows are not treated as unseen just because the parent resource path was seen.

@NicholaiVogel NicholaiVogel force-pushed the codex/github-source-provider branch from c627e07 to 4fec2f9 Compare May 24, 2026 18:51
Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 4fec2f90

I found one correctness issue in the GitHub source fetcher: label-filtered PR indexing currently drops PR labels from the indexed artifacts, which makes the new provider diverge from its own metadata contract.

Confidence: Medium [sufficient_diff_evidence, missing_runtime_repro] - The issue is visible directly in platform/daemon/src/github-source-fetch.ts: label-filtered PRs come from the search/issues endpoint and are mapped through searchPullResource, but that mapper reuses issueResource without preserving PR-specific fields and sets the type to pull. I did not run the PR against a live GitHub repository, so confidence is medium rather than high.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

if (response.status === 404) return null;
if (response.status !== 200) throw new Error(`Doc fetch failed for ${path}: ${response.status}`);
const body = response.body as { content?: string; encoding?: string; sha?: string; html_url?: string };
if (body.encoding !== "base64" || !body.content) return null;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label-filtered PR indexing goes through fetchPullRequestsBySearch, which receives search/issues items and then maps them with searchPullResource. Those items do include labels, but the resulting artifact no longer has real PR metadata like base/head/draft/merged status, and mergedAt is always forced to null. Since this PR claims GitHub pull requests are indexed as first-class source artifacts with provenance, label-filtered PR sources silently get lower-fidelity and partly wrong PR rows than unfiltered PR sources. The provider should either hydrate each search hit through the pulls API before writing it, or document and encode this as a distinct limited search result shape rather than pretending it is the same pull artifact.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 274df5dd.

The label-filtered PR path now hydrates each search result through /repos/{owner}/{repo}/pulls/{number} before writing a pull artifact. The artifact keeps the search result labels and uses the hydrated PR fields for title/body/state, merged timestamp, comment counts, draft, base, and head metadata. If the detail fetch fails, the provider records an explicit fetch error instead of silently pretending the search-shaped data is complete.

Added regression coverage for the labeled PR search path to verify labels are preserved and PR-specific metadata is populated.

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 75 pass
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> pass
  • git diff --check -> pass
  • bun run --filter '@signet/daemon' build -> pass

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Rebased on current origin/main and pushed 4fec2f90.

Latest reviewer finding addressed:

  • narrowed GitHub repo stale cleanup to the slash-delimited repo path prefix so sibling repositories with shared name prefixes are not purged
  • added a regression covering Signet-AI/signetai plus Signet-AI/signetai-extra in one source

Validation on the rebased head:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 74 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.ts surfaces/cli/src/features/sources.ts surfaces/cli/src/features/sources.test.ts -> pass
  • git diff --check -> pass
  • bun run --filter '@signet/daemon' build -> pass

@PR-Reviewer-Ant please re-review the latest head.

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Pushed 274df5dd for the latest reviewer finding.

Change:

  • label-filtered PR search results are now hydrated via the PR detail endpoint before indexing
  • search labels are preserved on the hydrated pull artifact
  • added regression coverage for labels plus PR-specific metadata (mergedAt, comment counts, draft/base/head)

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 75 pass
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts -> pass
  • git diff --check -> pass
  • bun run --filter '@signet/daemon' build -> pass

@PR-Reviewer-Ant please re-review the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 274df5dd

The implementation generally matches the PR description and stays on the shared source-provider path. I found one persistence issue in the failure-artifact path that can collapse multiple GitHub failures into one stored artifact.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The finding is directly visible in platform/daemon/src/github-source-provider.ts: each failure artifact path is keyed only by failedAt, while multiple failureState calls can happen in the same sync loop. I did not run the PR tests or inspect the live daemon job table in this read-only review environment.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

sourceId: source.id,
sourceRoot: source.root,
sourceExternalId: `failure:${failure.failedAt}:${failure.message}`,
sourcePath: `github://source/${source.id}/failures/${encodeURIComponent(failure.failedAt)}`,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure artifacts use only failure.failedAt in sourcePath. Several failures can be created in the same millisecond during one sync, especially when a repo returns multiple provider errors or several wildcard patterns miss quickly. Since source artifacts are normally upserted by source path, those failures can overwrite/collapse each other even though the PR explicitly promises source-owned failure artifacts for partial GitHub failures. Include a stable discriminator such as phase/repo plus an index or a hash of the failure message/metadata in the path.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3aa115de.

GitHub failure artifacts now use a source path with the failure timestamp plus a SHA-256-derived fingerprint of the failure message and metadata. That keeps multiple failures created in the same millisecond from upserting over one another while preserving the existing github://source/{source.id}/failures/ purge prefix.

Added a regression that freezes Date so two wildcard misses share the exact same failedAt, then verifies two distinct source_github_failure rows are stored.

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 76 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts -> pass
  • git diff --check -> pass
  • bun run --filter '@signet/daemon' build -> pass

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Pushed 3aa115de for the latest reviewer finding.

Change:

  • GitHub failure artifact paths now include timestamp plus a deterministic failure fingerprint, so same-millisecond failures do not collapse into one artifact
  • added a regression that freezes Date and verifies two wildcard-miss failures produce two distinct source_github_failure rows

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 76 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts -> pass
  • git diff --check -> pass
  • bun run --filter '@signet/daemon' build -> pass

@PR-Reviewer-Ant please re-review the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 3aa115de

I found one bounds issue worth discussing: the advertised per-repo item cap is applied once per resource type, so a configured repo can index several times the requested cap before comments are counted.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The provider passes settings.maxItemsPerRepo independently to issues, pulls, discussions, and docs in platform/daemon/src/github-source-provider.ts. I did not run a live GitHub sync, so the finding is based on the changed control flow and the PR/docs wording around maxItemsPerRepo bounds.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

if (settings.resourceTypes.includes("issues")) {
const result = await fetchIssues(config, undefined, settings.state, settings.maxItemsPerRepo, settings.labels);
resources.push(...result.resources);
writeFetchFailures(source, failures, repo.fullName, "issues", result.errors);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxItemsPerRepo is being used as a per-resource-type cap here and again for pulls, discussions, and docs below. With all four resource types enabled, a source configured with maxItemsPerRepo: 500 can index up to roughly 2,000 top-level artifacts for one repo before comment artifacts are added, which does not match the PR's “per-repo bounds” / dashboard “item cap” behavior. This also weakens the abuse-control value of the cap for large repos. Consider tracking a remaining per-repo budget across fetchRepoResources calls, or rename/document the setting as a per-resource-type cap if that is intentional.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1f581211.

maxItemsPerRepo is now enforced as one per-repo artifact budget across enabled primary resource types and fetched comments, instead of being applied once per resource type. The sync loop also stops writing once the per-repo budget is exhausted, and comment indexing is sliced to the remaining artifact budget.

I added regression coverage for both parts of the bound:

  • primary resource types share one cap, so issues can consume the repo budget before pulls/docs are requested
  • comments count against the same cap, so a repo capped at 2 writes one issue plus one comment, not all comments

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 78 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Pushed 1f581211 for the latest reviewer finding.

What changed:

  • maxItemsPerRepo is now a single per-repo artifact budget shared across issues, pulls, discussions, docs, and comments.
  • The provider stops requesting later resource types after the repo budget is exhausted.
  • Comment indexing is bounded to the remaining artifact budget.
  • Added regressions for shared primary-resource caps and comments counting against the cap.

Validation:

  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 78 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@PR-Reviewer-Ant please re-review the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 1f581211

I found one correctness issue in the GitHub sync purge bookkeeping: comment artifacts are written but never marked as seen for the successful repo pass, so a later successful sync can delete still-current comments.

Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The changed provider writes comment artifacts in writeResourceWithComments, but the sync loop only records the parent resource path in seenPaths before calling purgeStaleGitHubArtifacts for successful repos. The exact delete predicate is in the truncated portion of github-source-provider.ts, so confidence is medium rather than high, but the provider-local bookkeeping is visibly incomplete.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

const yielder = yieldEvery(5);
let repoIndexed = 0;

for (const resource of await fetchRepoResources(context.source, settings, config, repo, failures)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only marks the issue/PR/discussion/doc path as seen, but writeResourceWithComments can also write one or more source_github_comment artifacts under resourcePath(...)#comment-.... On a successful repo sync, purgeStaleGitHubArtifacts(...) receives a seenPaths set that omits every current comment, so stale cleanup can remove valid comment artifacts even though they were fetched and rewritten in this pass. The provider should add the written comment paths to the seen set, or have the write helper return the concrete paths it persisted.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6ab56267.

writeResourceWithComments now returns the concrete artifact paths it persisted, and the sync loop adds all of them to seenPaths before repo-scoped stale purge runs. That includes the top-level issue/PR/discussion/doc path plus each bounded comment path, so purge bookkeeping no longer relies on memory_artifacts.updated_at behavior to keep refreshed comments alive.

I also added a regression that forces freshly written artifacts to have an older updated_at than the sync start. Under the previous bookkeeping, the refreshed comment would be purged; with this change both the parent issue and #comment-* row survive.

Validation:

  • bun test platform/daemon/src/github-source-provider.test.ts -> 12 pass
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 79 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Pushed 6ab56267 for the latest reviewer finding.

What changed:

  • writeResourceWithComments now returns every artifact path it persisted.
  • The sync loop adds parent and comment paths to seenPaths before repo-scoped stale purge.
  • Added a regression that forces freshly written rows to be purge candidates unless their paths are explicitly marked seen.

Validation:

  • bun test platform/daemon/src/github-source-provider.test.ts -> 12 pass
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 79 pass
  • bunx biome check platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@PR-Reviewer-Ant please re-review the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: 6ab56267

I found one concrete correctness bug in the GitHub discussion comment path. The provider can discover discussions, but comment indexing for them will fail against GitHub's GraphQL API as written.

Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The GraphQL variable passed by fetchDiscussionComments is visible in platform/daemon/src/github-source-fetch.ts and uses MAX_COMMENTS_PER_RESOURCE, which is 200. GitHub GraphQL connection page sizes are capped at 100, so this path fails whenever discussion comments are fetched. I did not have the full repository checkout available in the task cwd, so route/UI wiring was reviewed from the provided diff rather than directly inspected.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 1
  • Suppressed due to human rebuttal/out-of-scope: 0

comments { totalCount }
}
pageInfo { hasNextPage endCursor }
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes first: MAX_COMMENTS_PER_RESOURCE, and MAX_COMMENTS_PER_RESOURCE is 200. GitHub GraphQL connections reject first values above 100, so discussion comment indexing will fail whenever includeComments is enabled for discussions. The provider catches that as a failure artifact, but the PR claims discussion comments are indexed. Use a page size of at most 100 and paginate if the intended cap remains 200.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fc39c33b.

fetchDiscussionComments now uses GitHub GraphQL-safe 100-item pages and paginates with pageInfo.endCursor until the existing 200-comment per-resource cap is reached. That keeps the intended overall cap without sending an invalid first: 200 connection request.

I added a regression that inspects the GraphQL variables and requires discussion comment requests to use first: 100, then follow the returned cursor for the next page.

Validation:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts -> 29 pass
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 80 pass
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@NicholaiVogel
Copy link
Copy Markdown
Collaborator Author

Pushed fc39c33b for the latest reviewer finding.

What changed:

  • Discussion comment GraphQL requests now use first <= 100.
  • fetchDiscussionComments paginates with pageInfo.endCursor up to the existing 200-comment per-resource cap.
  • Added a regression that verifies the request variables and cursor pagination.

Validation:

  • bun test platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts -> 29 pass
  • bun test platform/core/src/sources-config.test.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.test.ts platform/daemon/src/routes/sources-routes.test.ts surfaces/cli/src/features/sources.test.ts -> 80 pass
  • bunx biome check platform/daemon/src/github-source-fetch.ts platform/daemon/src/github-source-fetch.test.ts platform/daemon/src/github-source-provider.ts platform/daemon/src/github-source-provider.test.ts && git diff --check -> clean
  • bun run --filter '@signet/daemon' build -> passes, with existing unrelated Svelte warnings

@PR-Reviewer-Ant please re-review the latest head.

Copy link
Copy Markdown
Collaborator

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review metadata
  • Reviewer: pr-reviewer
  • Model: gpt-5.5
  • Commit: fc39c33b

I did not find a concrete correctness, security, data integrity, or breaking-change issue worth flagging from the supplied diff. The GitHub source work appears to follow the shared provider path described in the PR, with validation, bounded fetches, provenance-backed artifacts, failure artifacts, and focused tests covering the high-risk fetch and purge cases.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_cross_module_context] - The supplied context includes the PR description plus substantial changed contents for core config, GitHub fetch/provider implementation, docs, and targeted tests, which was enough to review the main source-indexing behavior. The prompt did not include full changed contents for the route, CLI, dashboard, and source-provider registration files, so I could not independently verify every exposed surface beyond the PR description and test references.

Since my last review:

  • Fixed or likely addressed: 1
  • Still blocking: 0
  • Newly found: 0
  • Suppressed due to human rebuttal/out-of-scope: 0

@NicholaiVogel NicholaiVogel dismissed stale reviews from PR-Reviewer-Ant, PR-Reviewer-Ant, and PR-Reviewer-Ant May 24, 2026 19:20

Superseded by PR-Reviewer-Ant review 4353290812 on fc39c33, which found no concrete correctness, security, data integrity, or breaking-change issue and reported newly found 0 / still blocking 0.

@NicholaiVogel NicholaiVogel merged commit 48ae6da into main May 24, 2026
17 checks passed
@NicholaiVogel NicholaiVogel deleted the codex/github-source-provider branch May 24, 2026 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants