From 59c72ca4928a46fbb109ab9eed22b37b16c349d8 Mon Sep 17 00:00:00 2001
From: Cursor Agent <cursoragent@cursor.com>
Date: Wed, 3 Jun 2026 05:20:56 +0000
Subject: [PATCH] docs(specs): add prioritized plan for highest-value designs

Co-authored-by: glassBead <glassBead-tc@proton.me>
---
 .specs/PRIORITIZED-IMPLEMENTATION-PLAN.md | 490 ++++++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 .specs/PRIORITIZED-IMPLEMENTATION-PLAN.md

diff --git a/.specs/PRIORITIZED-IMPLEMENTATION-PLAN.md b/.specs/PRIORITIZED-IMPLEMENTATION-PLAN.md
new file mode 100644
index 00000000..3825a0f7
--- /dev/null
+++ b/.specs/PRIORITIZED-IMPLEMENTATION-PLAN.md
@@ -0,0 +1,490 @@
+# Prioritized Implementation Plan — Highest-Value Designs in `.specs/`
+
+> **Status**: Proposal (review artifact)
+> **Created**: 2026-06-03
+> **Author**: Cloud agent review of the `.specs/` corpus (171 files)
+> **Scope**: This document does **not** change product behavior. It reviews the
+> spec corpus, selects the highest-value ideas, and proposes a sequenced plan
+> with explicit reasoning for each choice. It is intended as input to the HDD
+> lifecycle (`.adr/staging/`) — each accepted initiative below should graduate
+> into its own staging ADR + spec pair before implementation.
+
+---
+
+## 1. Purpose
+
+The `.specs/` directory contains a large, valuable, but uneven body of design
+work: ~171 files spanning ~30 spec suites. Some specs are already implemented,
+some are half-built, many are prose-only research, and a few are stale relative
+to the current Code Mode architecture.
+
+This plan answers one question: **if we want the most product and trust value
+per unit of implementation risk, what should we build next, and why?**
+
+It is deliberately opinionated. Selection reasoning is given for every pick, and
+— just as importantly — for the high-profile ideas that are **deliberately
+deferred**.
+
+---
+
+## 2. Grounding: what Thoughtbox is meant to be
+
+Choices below are anchored to settled product intent
+(`.specs/product-shape/PRODUCT-INTENT-AND-DIVERGENCE.md`) and to verified
+production reality (`.specs/production-overview/PRODUCTION-SYSTEM-MAP.md`):
+
+- **Thoughtbox is a reasoning workstation for agents.** The primary user is an
+  autonomous agent, not a human.
+- **The core promise is durable, versioned reasoning** over long, complex
+  problems — a "Git-like reasoning repository."
+- **Code Mode is the public interaction model.** The live MCP surface is exactly
+  three tools: `thoughtbox_search`, `thoughtbox_execute`, `thoughtbox_peer_notebook`
+  (verified: `src/server-factory.ts:581-583`).
+- **North star**: a single mandatory, intelligent control plane through which
+  agents act, plus "stateless ML" — the environment learns to shape itself for
+  future agents between sessions.
+- **The web app should make multi-agent reasoning legible** as a structure that
+  grows visibly.
+
+A recurring pattern in this corpus, confirmed by code inspection, is that
+**backends are frequently ahead of their surfaces**: the audit engine, knowledge
+graph, evaluation harness, hub profiles, and peer-notebook control plane all have
+working server-side code that is not yet exposed, wired, or made legible. The
+cheapest high-value wins therefore tend to be *finishing the last mile* of
+already-built capabilities, not greenfield builds.
+
+---
+
+## 3. Selection rubric
+
+Every candidate was scored on four axes. The reasoning column in each initiative
+references these.
+
+| Axis | Question | Why it matters |
+| --- | --- | --- |
+| **Intent fit** | Does it advance the settled product center of gravity (durable reasoning, Code Mode, legibility, control-plane learning)? | Avoids polishing accidental surfaces. |
+| **Leverage on existing code** | Is the hard part already built and merely unexposed/unwired? | Last-mile work is the cheapest, lowest-risk value. |
+| **Friction / risk** | How invasive is the diff? New subsystem vs wiring/copy? | Lower friction ships sooner with fewer regressions. |
+| **Trust / correctness floor** | Does it fix data loss, security exposure, or actively misleading agent guidance? | Correctness and safety dominate feature polish. |
+
+**Decision rule**: prioritize high *intent fit* × high *leverage* × low *friction*,
+and treat any **trust/correctness floor** issue (data loss, security, misleading
+agents) as non-negotiable and top of queue regardless of feature appeal.
+
+---
+
+## 4. Corpus landscape at a glance
+
+What the field looks like after cross-referencing each suite against `/workspace/src`:
+
+| Suite / idea | Headline status | Notes |
+| --- | --- | --- |
+| Code Mode public surface | **Shipped** | 3-tool surface live; discoverability copy partly stale. |
+| Protocol tools (Theseus/Ulysses) | **Shipped via `tb.*`** | Handlers + HTTP enforcement + Supabase live. |
+| Auditability (AUD-001..010) | **Backend ahead of UI** | Structured fields, gap/critique engine exist; UI partial. |
+| Cognitive harness (CHX-01..11, HARNESS T1-T3) | **Mostly absent / partial** | Several are docs+schema-only changes. |
+| Knowledge graph | **Backend live, no web UI** | Populated via MCP, invisible in app. |
+| Peer notebooks (ADR-022) | **Control plane live, runtime mocked** | Durable plane done; lifecycle + inspection + real runtime absent. |
+| Notebooks / Agentic Runbooks | **Scaffold + stub runtime** | Domain model real; execution in-memory stub. |
+| Eval system (SPEC-EVAL-001) | **Coded, loop open** | LangSmith path real; gatekeeper pass-through, no baselines. |
+| Sleep-time / evolution-check | **Prompt + queue exist, classifier absent** | Worker skeleton present. |
+| Hub roles (HUB-002) | **Phase 1 done, priming unwired** | `getProfilePriming` used only in tests. |
+| Srcbook port (SRC-001..006) | **Mostly not started** | SRC-006 is a data-loss fix; rest is notebook DX. |
+| Observability sidecar (OBS-001) | **Not started** | Infra-heavy; partial `tb.observability()` exists. |
+| Security audits | **Partially remediated, open gaps** | Unauthenticated hub/SSE paths still open. |
+| Canonical IR / TBX-C1, RLM, MAP-Elites, Unified Autonomy Loop | **Prose / research** | High ambition, high friction, speculative. |
+
+---
+
+## 5. The plan — four waves
+
+Initiatives are grouped into waves by dependency and risk, not calendar time.
+Within each wave they are roughly ordered by priority.
+
+### Wave 0 — Trust & safety floor (do first, non-negotiable)
+
+These are not features; they are correctness and security obligations for a
+product whose entire value proposition is *durable, trustworthy* reasoning.
+
+---
+
+#### 0.1 — Lock down unauthenticated workspace surfaces
+
+- **Source**: `.specs/workspace-isolation-audit.md`, `.specs/security/identity-binding-audit.md`
+- **Idea**: The local/non-multi-tenant HTTP surfaces accept client-controlled
+  workspace identity with no auth.
+- **Why high value (reasoning)**: Trust/correctness floor. Verified directly:
+  `GET /events` defaults `workspace_id` to `"*"` (`src/http/event-stream.ts:39-44`)
+  and `POST /hub/api` reads `agentId` from the request body with no
+  authentication (`src/http/hub-http.ts:12-37`). These mount in non-multi-tenant
+  mode; any deployment that binds `0.0.0.0` without the multi-tenant path leaks
+  cross-workspace hub state and event streams. For a product selling auditable,
+  isolated reasoning, an "any workspace" wildcard is a credibility-ending bug.
+- **Status**: Open. `/mcp` and OTLP are correctly scoped; hub/SSE are not.
+- **Scope**: Require an auth/workspace assertion on `/hub/api` and `/events`;
+  reject the `"*"` wildcard from untrusted callers; align with the `/mcp`
+  multi-tenant auth pattern. Add regression tests.
+- **Friction**: Low–Medium. Localized to `src/http/*` and mount conditions in
+  `src/index.ts`.
+- **Companion fixes (same wave, from the identity audit)**: OAuth callback open
+  redirect via unvalidated `next` (`apps/web/src/app/api/auth/callback/route.ts`);
+  finish the C6 workspace-scoping sweep across knowledge/persistence/otel
+  handlers (a `WorkspaceScopedClient` wrapper) rather than relying on per-call
+  `.eq("workspace_id", …)` discipline; defend `updateProfileAction` at the action
+  layer.
+- **Acceptance**: No unauthenticated path accepts a client-supplied or wildcard
+  workspace id; tests prove cross-workspace reads/writes are rejected.
+
+---
+
+#### 0.2 — Session recovery via MCP root (data-loss prevention)
+
+- **Source**: `.specs/SPEC-SRC-006-session-recovery-via-mcp-root.md`
+- **Idea**: Bind sessions to the MCP root URI so `load_context`/resume without a
+  `sessionId` auto-recovers the most recent session for that project directory.
+- **Why high value (reasoning)**: Trust/correctness floor + maximum intent fit.
+  The core product promise is *durable* reasoning. Today, when an MCP client
+  session times out (~15 min idle), the agent loses the session id and the
+  thought chain is effectively orphaned — the agent must remember an id it has no
+  reliable way to hold. Verified: `Session`/`SessionFilter` carry no
+  `mcpRootUri` field (`src/persistence/types.ts`, grep: 0 matches). This is a
+  small server-only diff that directly protects the thing the product exists to
+  protect.
+- **Status**: Not implemented.
+- **Scope**: Add `mcpRootUri` to the session model and storage filter; persist it
+  on session create from the bound root; add auto-recovery-by-root to the
+  resume/load path.
+- **Friction**: Medium, server-only. No web or infra changes.
+- **Acceptance**: After a simulated client timeout, an agent resumes the correct
+  session by root with no id supplied; wrong-root never recovers another
+  project's session.
+
+---
+
+### Wave 1 — Cheap correctness & ergonomics on the shipped surface
+
+These make the **already-public Code Mode surface** stop misleading agents and
+stop charging token tax. All are low-friction, high-leverage.
+
+---
+
+#### 1.1 — Code Mode discoverability cleanup
+
+- **Source**: `.specs/code-mode/cleanup-review-notes.md`, `.specs/code-mode/target-state.md`
+- **Idea**: Purge stale `init` / `gateway` / `THOUGHTBOX_PROJECT` / `tb.hub`
+  guidance from the content that `thoughtbox_search` ingests and from resource
+  text agents read.
+- **Why high value (reasoning)**: Intent fit + trust. The public surface already
+  shipped, but discoverability still teaches dead workflows
+  (`src/resources/server-architecture-content.ts` references `THOUGHTBOX_PROJECT`;
+  catalogs reference removed `init`). Misleading guidance on a live agent-facing
+  surface causes failed tool calls and wasted context — pure negative value that
+  costs almost nothing to remove.
+- **Status**: Partial. `init`/`gateway` tools are unregistered, but their copy
+  lingers.
+- **Scope**: Edit resource/prompt/search-index content; align example payloads
+  with the real `tb.*` SDK; remove `tb.hub` references the surface test already
+  asserts are absent.
+- **Friction**: Low (copy + metadata).
+- **Acceptance**: No discoverability content references unregistered tools or env
+  vars; search results describe only the real surface.
+
+---
+
+#### 1.2 — Finish the project-scoping contract (ADR-013)
+
+- **Source**: `.specs/knowledge-storage-project-scoping.md`
+- **Idea**: Resolve project scope from `workspaceId` / MCP roots at runtime;
+  remove the `THOUGHTBOX_PROJECT` startup path and the silent `"default"`
+  fallback; make pre-scope operations fail loudly with accurate messages.
+- **Why high value (reasoning)**: Trust/correctness + intent fit (multi-tenant).
+  The dangerous failure mode is silent: a mis-scoped session returns an **empty
+  knowledge graph** instead of an error, so agents conclude "nothing is known"
+  when the data is simply in another scope. `resolveProject()` already exists in
+  `server-factory.ts`; the work is removing fallbacks and fixing error copy that
+  still says "Call bind_root or start_new" (no longer public tools).
+- **Status**: Partial.
+- **Scope**: Remove `THOUGHTBOX_PROJECT`/`default` fallbacks; correct error
+  strings; confirm Supabase construction-time scoping is consistent.
+- **Friction**: Medium, mostly wiring/copy.
+- **Acceptance**: A mis-scoped operation errors loudly; no silent empty-graph
+  result; no references to removed tools in error text.
+
+---
+
+#### 1.3 — Cognitive-harness quick wins (CHX-01, HARNESS-T2 defaults, CHX-03, CHX-07/08)
+
+- **Source**: `.specs/cognitive-harness-improvements/*`, `.specs/harness-optimization/SPEC-HARNESS-T2.md`, `.specs/SPEC-CHX-001-…md`
+- **Idea**: A bundle of small ergonomics/correctness fixes derived from a real
+  146-thought session:
+  - **CHX-01 + T2 defaults**: stop documenting/requiring fields the server
+    already auto-assigns (`thoughtNumber`) or defaults (`thoughtType`,
+    `nextThoughtNeeded`). The handler already auto-numbers and defaults
+    `thoughtType` to `reasoning`; only the Zod schema/SDK/docs disagree.
+  - **CHX-03 mid-session recall**: expose `getThought(N)` / `recent(n)` /
+    `searchWithin` on `tb.session`. `ThoughtboxStorage.getThought()` already
+    exists; only the SDK wiring is missing.
+  - **CHX-07/08 deliberation + typed audit**: allow a `decision_frame` to record
+    "leaning toward X" without forcing exactly one selection, and route gap
+    detection by `sessionType` so research sessions stop getting false
+    "decision-without-action" flags. Verified: the handler hard-requires exactly
+    one `selected: true` (`src/thought-handler.ts:482-485`) and `detectGaps`
+    hardcodes the decision→action window (`src/audit/manifest-generator.ts:178-195`).
+- **Why high value (reasoning)**: Leverage + intent fit. These are the friction
+  points agents hit *while doing the core activity*. The hard logic is already in
+  the handler/storage; the changes are schema, SDK surface, and audit routing.
+  CHX-02's estimate alone is ~17k tokens saved per long session. The false audit
+  flags actively erode trust in the audit engine we want agents to rely on.
+- **Status**: Partial/absent at the SDK & schema layer; foundations exist.
+- **Scope**: Schema relaxations + SDK helpers + audit routing + doc/example
+  updates. Each item is independently shippable.
+- **Friction**: Low–Medium per item; no new subsystems.
+- **Acceptance**: Required fields drop where the server already supplies them;
+  `tb.session.getThought/recent/searchWithin` work; deliberating decision frames
+  persist; research sessions no longer raise action-gap false positives.
+- **Note**: CHX-04/11 (structured subagent attachment) are **excluded** here —
+  high impact but high friction (multi-agent runtime contract). Defer to Wave 3
+  thinking.
+
+---
+
+#### 1.4 — Citation fields on thoughts
+
+- **Source**: `.specs/architecture-exa-composition-and-thoughtbox-upgrades.md` (punch-list #2)
+- **Idea**: Add `citesThoughtIds` / `citesEntityIds` to the thought schema.
+- **Why high value (reasoning)**: Intent fit + leverage. The flagship vision is a
+  *versioned reasoning graph*. Citations are the edges that make reasoning
+  provenance a real DAG instead of a flat list, and they are the prerequisite for
+  later groundedness tooling and the "merge evidence" model. It is a small,
+  additive schema change with no new subsystem — a foundational primitive bought
+  cheaply now.
+- **Status**: Not started.
+- **Scope**: Schema + validation in `src/thought/tool.ts` / handler; persist and
+  surface in export/UI.
+- **Friction**: Low.
+- **Acceptance**: Thoughts can reference prior thoughts/entities; citations
+  round-trip through storage and export.
+
+---
+
+### Wave 2 — Make built capabilities legible (product surface)
+
+The product intent says the web app should make reasoning structure *visible*.
+Several powerful backends are currently invisible. These waves turn existing data
+into product.
+
+---
+
+#### 2.1 — Surface the audit engine in the web session UI
+
+- **Source**: `.specs/auditability/SPEC-AUD-003`, `SPEC-AUD-004`; supported by `SPEC-AUD-001/010`
+- **Idea**: Render the audit manifest (decision/action gaps, unaddressed
+  critiques, action blast-radius, confidence trail) that is already generated at
+  session close — plus a dedicated Actions view and inline critique display.
+- **Why high value (reasoning)**: Maximum leverage. `src/audit/manifest-generator.ts`
+  already computes gaps, critique overrides, and action stats at session close,
+  and `critique` is already passed through web view-models — but **nothing
+  renders it**. This is the difference between "we have an audit engine" and
+  "agents and operators can actually answer *where did it go wrong?*" The web app
+  is already ahead of Observatory on filters and the fork rail, so it is the right
+  home.
+- **Status**: Backend ~50%+, UI ~15%. Observatory structured rendering (AUD-010)
+  is ~90% done and can be the reference.
+- **Scope**: Web components reading already-persisted audit data; Actions list
+  with decision-proximity linking; collapsible critique sections with a
+  "not addressed" badge.
+- **Friction**: Medium, mostly UI on existing data.
+- **Acceptance**: A finished session shows its audit manifest, action log, and
+  unaddressed critiques in the web UI without new backend work.
+
+---
+
+#### 2.2 — Knowledge Graph web UI
+
+- **Source**: `.specs/monorepo-merge/SPEC-KNOWLEDGE-GRAPH-UI.md`, `INTEGRATION-MAP.md`
+- **Idea**: A `/w/[slug]/knowledge` page rendering `entities`/`relations` as an
+  interactive graph with an observations detail panel.
+- **Why high value (reasoning)**: Intent fit (the "graph that grows" vision) +
+  leverage. The knowledge graph is written live via MCP
+  (`src/knowledge/supabase-storage.ts`) and workspace RLS policies already exist,
+  but the graph is **invisible** in the product — the single most under-marketed
+  capability. This is read-only UI over existing, already-secured data.
+- **Why after 2.1**: 2.1 reuses an existing engine with near-zero new surface;
+  2.2 adds a new page and a graph dependency (Cytoscape), slightly more.
+- **Status**: Not started in web; backend live.
+- **Scope**: Web page + RLS-scoped reads + Cytoscape. **Security note**: every
+  query must enforce `workspace_member_access` — treat as part of Wave 0's
+  isolation discipline.
+- **Friction**: Medium (~8 web files + dep).
+- **Acceptance**: A workspace's KG renders interactively, scoped strictly to that
+  workspace.
+
+---
+
+#### 2.3 — Wire hub profile priming into the thought path (HUB-002 Phase 3)
+
+- **Source**: `.specs/SPEC-HUB-002-hierarchical-agent-roles.md`
+- **Idea**: Inject `getProfilePriming()` into the thought/branch response path so
+  agent roles (MANAGER/ARCHITECT/DEBUGGER/…) become behavioral, not documentary.
+- **Why high value (reasoning)**: Leverage + intent fit (multi-agent Hub). The
+  registry, `register`/`whoami`, priming builder, and tests already exist
+  (`src/hub/*`); `getProfilePriming` is currently **only called in tests**.
+  Wiring it into the runtime is a bounded change that turns a built-but-dormant
+  feature into the scaffolding that makes multi-agent teams adhere to process via
+  infrastructure rather than memory.
+- **Status**: Phase 1 done; Phase 3 unwired.
+- **Scope**: Hub/thought-handler integration; optional per-profile critique
+  validation.
+- **Friction**: Medium, contained to `src/hub` + thought handler.
+- **Acceptance**: A registered profile measurably changes thought-loop priming at
+  runtime, not just in tests.
+
+---
+
+#### 2.4 — Coherent "one run" view (SHIP-CHECKLIST / Usability cutline P0)
+
+- **Source**: `.specs/thoughtbox-v1-finalstretch/SHIP-CHECKLIST.md`, `SPEC-USABILITY-CUTLINE.md`
+- **Idea**: A single work-period view joining reasoning thoughts + OTEL tool
+  events + files changed + cost for one real coding session.
+- **Why high value (reasoning)**: Intent fit (legibility) + leverage. Run binding
+  (`runs` table), OTLP ingestion, auto-bind, and the session detail page already
+  exist and join the data; the remaining work is an end-to-end real-run
+  verification pass plus explicit empty/error states. This is the lowest-friction
+  path to a *demoable* product narrative ("here is exactly what the agent did").
+- **Status**: ~70% per checklist.
+- **Scope**: E2E verification with a fresh real run; explicit failure/empty states
+  when OTEL or session data is missing.
+- **Friction**: Low–Medium (verification + UX polish).
+- **Acceptance**: A fresh Claude Code work period renders as one coherent,
+  unambiguous run view, including graceful degradation.
+
+---
+
+### Wave 3 — Strategic bets toward the north star
+
+Higher friction, higher ambition, but each builds on a real foundation rather
+than starting from prose. These should each get their own staging ADR.
+
+---
+
+#### 3.1 — Close the sleep-time learning loop (automate evolution-check)
+
+- **Source**: `.specs/agent-governance-substrate/SPEC-THOUGHTBOX-SLEEP-TIME.md`, `SPEC-EVOLUTION-CHECK-GENERALIZED.md`
+- **Idea**: Run the evolution-check classifier asynchronously in
+  `supabase/functions/process-thought-queue` so prior thoughts/knowledge are
+  revised between sessions — the concrete instance of "stateless ML / the
+  environment learns."
+- **Why high value (reasoning)**: Highest intent fit (this *is* the north star)
+  with surprisingly contained scope. The queue, the worker skeleton, and the
+  evolution-check prompt content already exist; what is missing is the classifier
+  wiring (~150–250 lines). It converts the most strategic product thesis from
+  prose into a running loop.
+- **Status**: Scaffolding only; classifier absent.
+- **Scope**: Implement the classifier in the queue worker; apply UPDATE/NO_UPDATE
+  revisions; guard with idempotency and cost limits.
+- **Friction**: Medium; depends on Anthropic API in edge functions.
+- **Acceptance**: New significant thoughts trigger durable, bounded revision of
+  prior reasoning, observably and idempotently.
+
+---
+
+#### 3.2 — Peer notebooks: manifest lifecycle + web inspection (ADR-022 Parts 2 & web)
+
+- **Source**: `.specs/mcp-peer-notebooks/PROMPT-PART-2-MANIFEST-LIFECYCLE.md`, `SPEC-CONTROL-PLANE.md`, `NEXT-IMPLEMENTATION-HANDOFF.md`
+- **Idea**: Compile `peer.manifest.json` from real notebook sources (parse-only),
+  add explicit draft→approve→activate→retire transitions, enforce the active hash,
+  and add read-only web views over the existing Supabase peer tables.
+- **Why high value (reasoning)**: Intent fit (control plane + notebooks-as-peers)
+  + leverage. The durable control plane (Part 1) is **shipped**: broker,
+  persistence, trace/artifact readback, and rejection of non-active manifests all
+  exist and are tested. The next units are the governance boundary (manifest
+  lifecycle) and legibility (web inspection) — both build directly on shipped
+  infrastructure without requiring the hard, deferred parts (real runtime,
+  smolvm isolation).
+- **Status**: Part 1 done; Part 2 + web inspection absent; runtime still mocked.
+- **Scope**: Manifest compiler wiring + lifecycle transitions + broker guards
+  (Part 2); Next.js read-only peer routes (invocations, denied-outbound traces,
+  artifacts).
+- **Friction**: Medium each; **explicitly excludes** the real runtime
+  (`thoughtbox-s7f`) and smolvm isolation (`thoughtbox-vdw`), which are the
+  high-friction tail.
+- **Acceptance**: A notebook can be graduated to a governed peer only via explicit
+  manifest approval; peer activity is inspectable in the web app.
+- **Process note**: Follow `peer-notebook-delivery-guard` skill — no mock
+  substitution for acceptance evidence.
+
+---
+
+## 6. Deliberately deferred (and why)
+
+Choosing what *not* to do is half of prioritization. These ideas are valuable but
+fail the rubric for *next* — almost always on friction/risk or speculativeness,
+not on ambition.
+
+| Idea | Source | Why deferred |
+| --- | --- | --- |
+| **Canonical IR + TBX-C1 codec** | `SPEC-CORE-002` (layers B–D) | Large new persistence + codec layer with dual-write risk; the pragmatic near-term slice (the `tb` authoring API) is already shipped. High friction, low marginal user value now. |
+| **Standalone `rlm_repl` tool** | `SPEC-RLM-001` | Explicitly superseded by `SPEC-NOTEBOOK-RLM` (durable notebooks over an ephemeral tool); zero code exists. Build the notebook-native version later if needed. |
+| **Agentic Runbooks full engine + Cloud Run runner** | `.specs/agentic-runbooks.md` | Domain model is real but the runtime is an in-memory stub; the full engine is a large async-execution build. Peer-notebook lifecycle (3.2) is the higher-leverage notebook investment first. |
+| **Srcbook preview lifecycle (SRC-001/002/003)** | `SPEC-SRC-001..003` | Arbitrary process spawn is a HIGH security surface and a notebook-DX feature, not core to durable reasoning. SRC-006 (data loss) is pulled forward; the rest waits. |
+| **OBS-001 sidecar stack (Prometheus/Grafana)** | `SPEC-OBS-001` | Infra-heavy operator tooling; the agent-facing `tb.observability()` already covers health/sessions. Operator dashboards are valuable but not the agent-user center of gravity. |
+| **MAP-Elites + Unified Autonomy Loop** | `quality-diversity/*`, `unified-autonomy-loop/*` | Mostly prose; depends on a `research-workflows/workflows.db` and orchestration that do not exist. ADR-017 itself marks the loop folder non-authoritative. High ambition, very high friction; revisit after 3.1 proves the simplest learning loop. |
+| **Eval flywheel closure (SPEC-EVAL-001 loop)** | `SPEC-EVAL-001`, eval strategy | Infrastructure is coded but the gatekeeper is pass-through and there are no baselines/datasets. Valuable, but needs dataset engineering investment; pairs naturally with 3.1 once a learning signal exists. |
+| **Theseus v0.2 reference monitor** | `theseus-improvements.md` | v1 protocol works; v0.2 is a large security/governance build (LLM tollbooth, expiring visas, Supabase write mediator). Defer until protocol usage justifies it. |
+| **Connection tracking / Failure state machine** | `SPEC-CONNECTION-TRACKING`, `SPEC-FAILURE-STATE-MACHINE` | New tables + lifecycle hooks; partly overlaps the cheaper "one run view" (2.4). Do 2.4 first, then reassess whether connection granularity is still needed. |
+| **Control-plane CI gate** | `unified-autonomy-control-plane` | Worth doing (generator/checker exist; just add `check:control-plane` to CI) — a cheap governance win, but it is repo-tooling rather than product. Fold into Wave 1 housekeeping if capacity allows. |
+
+---
+
+## 7. Sequencing and dependencies
+
+```text
+Wave 0  (safety/correctness floor)
+  0.1 isolation hardening ─────────────┐ (also gates 2.2 KG UI security)
+  0.2 session recovery by MCP root      │
+                                        │
+Wave 1  (cheap correctness on shipped surface)
+  1.1 discoverability cleanup           │
+  1.2 project-scoping contract ─────────┘
+  1.3 harness quick wins
+  1.4 citation fields ───────────────► enables future provenance/merge-evidence
+                                        │
+Wave 2  (legibility of built backends)
+  2.1 audit UI         (reuses audit engine)
+  2.2 knowledge graph UI   (needs 0.1 isolation discipline)
+  2.3 hub priming wiring
+  2.4 coherent run view
+                                        │
+Wave 3  (strategic, ADR each)
+  3.1 sleep-time evolution loop  ──► later pairs with eval flywheel
+  3.2 peer-notebook lifecycle + web inspection (builds on shipped Part 1)
+```
+
+Independence: every Wave 0/1 item is independently shippable. Wave 2.2 should not
+ship before 0.1's isolation discipline is in place. Wave 3 items each warrant a
+staging ADR before code.
+
+---
+
+## 8. Recommended immediate next step
+
+Per repo workflow (`AGENTS.md` → HDD is mandatory for architectural decisions and
+new features), the next action is **not** to start coding. It is to graduate the
+Wave 0 and Wave 1 picks into staging ADR + spec pairs in `.adr/staging/`, since
+they are the highest-confidence, lowest-friction value and several are correctness
+obligations. Wave 0.1 (isolation) and 0.2 (session recovery) should be staged
+first.
+
+---
+
+## 9. Sources reviewed
+
+This plan synthesizes the full `.specs/` corpus. Primary anchors:
+`product-shape/PRODUCT-INTENT-AND-DIVERGENCE.md`,
+`production-overview/PRODUCTION-SYSTEM-MAP.md`, the Code Mode suite, the
+auditability suite, the cognitive-harness suite, the MCP peer-notebooks suite,
+the agent-governance-substrate suite, `monorepo-merge/INTEGRATION-MAP.md`, the
+security audits, and the v1 finalstretch suite. Implementation-status claims were
+cross-referenced against `/workspace/src` and `/workspace/apps/web`; the
+load-bearing security, session-model, and surface claims were verified directly
+against source.