refactor(web): migrate to TanStack Query + add WS event accounting (Phase 1)#1130
refactor(web): migrate to TanStack Query + add WS event accounting (Phase 1)#1130carlosflorencio wants to merge 56 commits into
Conversation
|
Important Review skippedToo many files! This PR contains 299 files, which is 149 over the limit of 150. To get a review, narrow the scope: ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (299)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Claude finished @carlosflorencio's task in 5m 43s —— View job FindingsSuggestion (recommended, doesn't block)
Summary
Verdict: Ready with suggestions. The accounting architecture is solid — monotonic seq stamping, per-connection ring buffers, a clean |
|
Claude finished @carlosflorencio's task in 5m 15s —— View job Claude finished reviewing PR #1130 — Tasks
FindingsBlockerNone. Suggestion (4 total — 2 from prior review still open, 2 new)Prior review (still outstanding):
New from this pass: Verdict: Ready with suggestions. Findings 3 and 4 are inexpensive to fix now and will block the Phase 2 |
FindingsBlocker (must fix before merge)None. Suggestion (recommended, doesn't block)Previous review (still open — not yet addressed):
New findings from this review pass:
Summary
Verdict: Ready with suggestions. The accounting architecture is sound — monotonic seq on every outbound envelope, broadcast-clone isolation verified by test, bounded ring buffers on both sides, and a clean pure-function diff. Findings 3 and 4 are latent blockers for Phase 2's global |
|
Claude finished @carlosflorencio's task in 6m 47s —— View job Claude finished reviewing PR #1130 Tasks
FindingsBlockerNone. Suggestion (5 total — 4 carried from prior passes, 1 new)Carried from prior reviews (still not addressed):
New from this review pass:
Architecture notes on new commitsThe per-session sequencing logic (
Summary
Verdict: Ready with suggestions. Findings 1 and 4 are latent blockers for the Phase 2 |
|
Claude finished @carlosflorencio's task in 5m 50s —— View job Claude finished reviewing PR #1130 Tasks
FindingsBlockerNone. Suggestion (7 total — 5 carried, 2 new)Carried from prior reviews (all still unaddressed):
New from this review pass (
Architecture notes on
|
| Severity | Count |
|---|---|
| Blocker | 0 |
| Suggestion | 7 (5 carried + 2 new) |
Verdict: Ready with suggestions. Findings 1 and 4 are latent blockers for Phase 2's WS_ASSERT_ENABLED flip.
|
Claude finished @carlosflorencio's task in 5m 9s —— View job Tasks
FindingsBlocker (must fix before merge)New from
Suggestion (recommended, doesn't block)
Carried from prior passes (7 total — none yet addressed):
Summary
Verdict: Blocked — fix the two missing-test blockers. The |
|
Claude finished reviewing PR #1130 Tasks
FindingsBlocker (must fix before merge)New from
Suggestion (recommended, doesn't block)New from
Carried from prior review passes (all still unaddressed — 7 total):
Summary
Verdict: Blocked — fix the two missing-test blockers first. The |
|
Claude finished @carlosflorencio's task in 3m 18s —— View job Claude finished reviewing PR #1130 Tasks
FindingsBlocker (must fix before merge)Carried from prior reviews (both still unaddressed):
Suggestion (recommended, doesn't block)New from
Carried from prior reviews (all still unaddressed — 8 total):
Architecture note on
|
| Severity | Count |
|---|---|
| Blocker | 2 (carried) |
| Suggestion | 9 (1 new + 8 carried) |
Verdict: Blocked — the two missing-test blockers from c6b202a are still open. The new commit is a correct one-line fix; the only new finding is a suggestion to pin it with a regression test.
|
Claude finished @carlosflorencio's task in 8m 50s —— View job Claude finished reviewing PR #1130 Tasks
Resolved since last review pass✅ ✅ Control-plane ack false-positives — closed by both the structural ✅ ✅ ✅ FindingsBlocker (2 carried — still unaddressed)
Suggestion (2 new + 8 carried)
Summary
Verdict: Blocked — the two missing-test blockers from |
|
Claude finished reviewing PR #1130 Tasks
Resolved since last review pass✅ Finding #4 (blocker-level for Phase 2): ✅ Control-plane ack false-positives: The ✅ ✅ ✅ FindingsBlocker (must fix before merge — both carried from
|
| Severity | Count |
|---|---|
| Blocker | 2 (carried) |
| Suggestion | 10 (2 new + 8 carried) |
Verdict: Blocked — the two missing-test blockers from c6b202a remain open.
The 8f8d849 commit is architecturally sound. The dual approach (structural type skip + exhaustive static allowlist) is a solid Phase 2 foundation, and the applyIfNewer primary-session fix is a meaningful correctness improvement. New findings 3 and 4 are pre-emptive gaps that are inexpensive to close before WS_ASSERT_ENABLED=1 is flipped globally.
|
Claude finished @carlosflorencio's task in 3m 33s —— View job Tasks
FindingsBlocker (must fix before merge — both carried from
|
| Severity | Count |
|---|---|
| Blocker | 2 (carried) |
| Suggestion | 10 (9 carried + settleDrops/sinceSeq amplification noted) |
Verdict: Blocked — the two missing-test blockers from c6b202a remain open.
f429612 itself is correct and well-commented. The settleDrops polling approach is sound for absorbing genuinely in-flight events — the only new concern is that it amplifies teardown latency for the existing sinceSeq false-positive case (suggestion #1). Closing the two blockers is the prerequisite for merge.
|
Claude finished @carlosflorencio's task in 3m 9s —— View job Tasks
Claude finished reviewing PR #1130 Resolved in
|
| Severity | Count |
|---|---|
| Blocker | 2 (carried) |
| Suggestion | 10 (all carried) |
Verdict: Blocked — the two missing-test blockers from c6b202a remain open.
d86e613 is a correct and well-targeted fix. The method-spy approach for wrapBridgeHandler is the right solution to the false-negative problem: it captures handler intent (the call to invalidateQueries) independently of whether matching cache entries exist. The implementation is safe because all bridge handlers are synchronous — the spy count is captured before async work starts, and the method restoration in finally runs after the synchronous call completes.
|
Claude finished @carlosflorencio's task in 4m 38s —— View job Tasks
Claude finished reviewing PR #1130 Resolved in
|
| Severity | Count |
|---|---|
| Blocker | 2 (carried) |
| Suggestion | 11 (1 new + 10 carried) |
Verdict: Blocked — the two missing-test blockers from c6b202a remain open. The three new commits are all correct: the mergeAuthoritativeFields expansion is well-reasoned and tested, the E2E spec rewrite is sound, and enabling strict-mode accounting in CI is the right milestone to land with this PR.
Additive D4+D6 Stage 1 infra: qk.taskSession.byId key, sessionByIdQueryOptions (observe-only), bridge/session-state.ts mirroring session.state_changed / agentctl_starting / agentctl_ready / agentctl_error into TQ by-id + by-task caches, SSR seed in StateHydrator, and D6 Zustand side-effects (env mapping + prepare-progress) kept populated in parallel during the staged migration. Zustand slices are not deleted yet (Phase A additive).
…nd mirror Remove `worktrees` / `sessionWorktreesBySessionId` Zustand slices and their setters (`setWorktree` / `setSessionWorktrees`). Rewrite `useSessionWorktrees` and `useWorktree` to derive worktree view-models from `qk.taskSession.byId` (worktree_id / worktree_path / worktree_branch fields), eliminating the agentctl_ready WS handler that kept the mirror in sync.
…us flicker, health poll)
…status, empty-turn notice)
…rminal on switch)
a6595c3 to
50ee027
Compare
|
Claude finished @carlosflorencio's task in 7m 26s —— View job Claude finished reviewing PR #1130 Tasks
Resolved in this batch✅ Voice-mode TQ migration ( ✅ Idle-input + terminal-busy guards ( FindingsBlocker (must fix before merge)
Suggestion (recommended, doesn't block)
Carried from prior reviews (3 suggestions, all still unaddressed):
Summary
Verdict: Blocked — finding #1 ( |
| const routine = useAppStore((s) => | ||
| run.routine_id ? s.office.routines.find((r) => r.id === run.routine_id) : undefined, | ||
| ); | ||
| const candidates = useTaskCandidates(); |
There was a problem hiding this comment.
useTaskCandidates() is the wrong data source for a display component — tasks outside the 50-result window silently show "Task"
useTaskCandidates() calls searchTasks(wsId, "", 50) — a picker endpoint capped at 50 results. The hook's own docstring says: "The pickers don't need the full paginated/filtered list." LinkedEntity is not a picker — it's a display cell that resolves any historical run's linked task. Any run referencing a task ranked >50 in the search will silently fall back to displaying "Task" (line 194 in the current file) instead of the task's identifier and title.
The original code used s.office.tasks.items which was populated from listTasks (the full workspace task list). Replace with the equivalent TQ read:
const workspaceId = useAppStore((s) => s.workspaces.activeId);
const { data: allTasks = [] } = useQuery({
...officeQueryOptions.tasks(workspaceId ?? ""),
enabled: !!workspaceId,
});
const task = run.task_id ? allTasks.find((t) => t.id === run.task_id) : undefined;officeQueryOptions.tasks(wsId) is already subscribed by the tasks list page, so this is a zero-cost cache read on any page that mounts after the office layout has loaded. Drop the useTaskCandidates import from this file.
| n := v.(*atomic.Int64).Add(-1) | ||
| if n <= 0 { | ||
| h.sessionSubscriberCounts.Delete(sessionID) | ||
| h.sessionSeqs.Delete(sessionID) |
There was a problem hiding this comment.
Seq counter deleted when subscriber count → 0 but focus clients still receive session events
decSessionSubscribers deletes sessionSeqs when the ref-count reaches zero. But BroadcastToSession routes events to focus clients AND subscribed clients (see getSessionRecipientsLocked). If a client is focused but has unsubscribed (the "subscriber count churns to 0 during task-switch/resume" case documented in the getSessionRecipientsLocked comment), the seq counter is deleted here, then lazily re-created at 0 in nextSessionSeq. The first post-deletion event arrives with session_seq: 1, but the FE's WsAccountBucket already holds entries with higher seqs — producing a false gap that KANDEV_E2E_WS_ASSERT=1 will flag.
Fix: only delete the seq counter when no focus clients remain for the session. Since decSessionSubscribers is called outside the hub lock, the simplest approach is to skip the sessionSeqs.Delete here and instead delete it from recomputeSessionMode once both subscriber count and focus client count are zero:
if n <= 0 {
h.sessionSubscriberCounts.Delete(sessionID)
// sessionSeqs is deleted from recomputeSessionMode once both
// subscriber count AND focus-client count are zero — deleting here
// would reset the seq for focus-only recipients, producing false gaps.
}Then in recomputeSessionMode (where you already hold h.mu), after confirming both maps are empty for sessionID, call h.sessionSeqs.Delete(sessionID).
| const agent = agents.find((a) => a.id === id); | ||
| const activeSlug = activeSlugFromPath(pathname, id); | ||
|
|
||
| if (isPending) { |
There was a problem hiding this comment.
isPending on a disabled query shows an infinite spinner when workspaceId is empty
When workspaceId is null/"", the query is enabled: false. TanStack Query sets isPending: true for queries that are disabled and have never fetched (the query has no cached data). The if (isPending) guard therefore renders the spinner forever for any user whose workspaceId is not yet available.
Use isLoading instead (isPending && fetchStatus === "fetching") or gate on workspaceId:
| if (isPending) { | |
| if (workspaceId && isPending) { |
isLoading is equivalent and safe here because it's false when the query is disabled.
| // to all-off when the backend is unreachable. See | ||
| // docs/decisions/0007-runtime-feature-flags.md. | ||
| const ssrQueryClient = makeQueryClient(); | ||
| await ssrQueryClient.prefetchQuery(featuresQueryOptions()); |
There was a problem hiding this comment.
Add retry: false for the SSR feature-flag prefetch
TanStack Query's default retry: 3 applies to SSR prefetchQuery calls. If the backend is unreachable, this blocks layout rendering for multiple retry attempts before surfacing the default feature flags. Feature flags should fail fast to defaults during SSR so a backend blip doesn't degrade page load time.
| await ssrQueryClient.prefetchQuery(featuresQueryOptions()); | |
| await ssrQueryClient.prefetchQuery({ ...featuresQueryOptions(), retry: false }); |

Summary
Supersedes #1052. Bundles the TanStack Query server-state migration (waves 0–5) with the first phase of WebSocket event accounting so the two ship as one coherent reliability unit. The migration alone introduces a parallel write path that can silently desync from Zustand during the transition; the accounting alone leaves the FE on a Zustand model we're trying to retire. Together they deliver "every server-state event the backend emits is observably received, parsed, and applied to the FE cache."
Important Changes
b95b168b) — server state for features, comments, integrations-shell, automations, workspace, settings, jira, linear, gitlab, github, kanban, office, session, session-runtime, and session-runtime-streams now flows throughlib/query/{keys,query-options,bridge}with a WS → query bridge so partially-migrated screens keep working during the transition.seq+connection_idon every outbound WS envelope, bounded backend ring buffer (wsSentLog, 5000 entries), FE receive-sideWsAccountring buffer (5000), a new/api/v1/e2e/ws-sentendpoint, and a Playwright fixture hook that diffs "what the backend sent" vs "what the FE processed" at end-of-test. Strict mode viaKANDEV_E2E_WS_ASSERT=1turns any drop into a test failure that names the missing event.docs/specs/ws-event-accounting/plan.md(force-added sincedocs/specs/*/plan.mdis gitignored) describes the per-session sequencing, ack channel + production metrics, bridge-handler audit (wrapBridgeHandler+BRIDGE_SKIPPED_ACTIONS), and audit/migration workstreams that finish the reliability guarantee.Validation
pnpm --filter @kandev/web typecheck— pending re-run in CI (localpnpm installhit a half-broken state with the rebased lockfile; CI runs in a clean container).pnpm --filter @kandev/web lint --max-warnings 0— pending CI.pnpm --filter @kandev/web test— Phase 1 adds 23 unit tests forWsAccount/detectGaps; bridge tests fromb95b168bcover the WS → query mirroring.go test ./internal/gateway/websocket/... ./cmd/kandev/...— covers seq monotonicity, broadcast clone isolation, ring-buffer eviction,/api/v1/e2e/ws-senthappy + error paths.KANDEV_E2E_WS_ASSERT=1left off by default in CI; Workstream 3 of the Phase 2 plan flips it on after the bridge instrumentation and per-session work land.Possible Improvements
The accounting layer currently records at the parse seam (
handleParsedMessage), which catches "envelope never arrived" but not "envelope arrived and didn't mutate state." Phase 2'swrapBridgeHandlercloses that gap by asserting that every observed WS event with asession_idcorresponds to a bridge audit entry withcacheChanged === true(or is on the documented Zustand-only allowlist). Without that follow-up, a bridge wiring bug — wrong query key, unmounted slice — can still silently break a UI screen during the migration window. Phase 2 is scoped indocs/specs/ws-event-accounting/plan.md.Checklist
Related issues
Supersedes #1052.
Preview Environment
50ee027