Your local-first, multi-repo, 24/7 autonomous coding coworker. The Mac stays on; a locally authenticated Claude Code drives Docker-isolated workers across every repo in your registry, opens draft PRs on GitHub, runs external validators, scores risk, and merges low-risk changes automatically. Anything bigger is gated by an approval on your phone.
v1.0.0 GA shipped 2026-05-25 (with an owner-waived 24h soak gate). v2.2.0-rc2 is the current TypeScript production-grade tag for this single-operator system. Per ADR-0013 Path A, no v2.2.0 GA tag is created under the current release policy.

Latest: conversational-cockpit-v1. The P0-P7 workbook is complete: a chat-first Operator Cockpit now uses local Claude Code for clarification and planning, local Codex for implementation, and an isolated Gemini validator as the evidence-only PR hard gate. The strict real smoke, browser quality smoke, in-app browser validation, full test gate, and external Gemini evidence-only validator all passed on 2026-06-03. See WORKBOOK_v3.md and docs/SESSION_LOG_v3.md.

The architecture has converged to a TypeScript-only, event-sourced, three-plane design per ADR-0010; the dual-kernel section further down is retained as v1.0.0 history.
The fastest way to try the current product surface is the Operator Cockpit.
It starts the TypeScript daemon on 7247 and the web app on 7248.
# 1. Install workspace dependencies
pnpm install
# 2. Start the local daemon + Operator Cockpit web UI
pnpm cockpit:dev
# 3. Open the chat-first cockpit
open http://127.0.0.1:7248

Then run the core operator flow:
- Describe the mission in the chat composer.
- Let Claude ask as many clarification questions as needed; the server will not unlock roadmap or coding until confidence is at least 95%.
- Generate PRD / ADR / Roadmap once the clarification gate unlocks.
- Approve the roadmap only when it matches the intended outcome.
- Start execution; Codex performs the implementation in the repo-bound worker.
- Read progress, evidence, Gemini verdicts, and PR-gate status inline in the same conversation thread.
- Re-check the Draft PR gate only after Gemini returns PASS and repo policy allows it.
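The clarification gate in the flow above can be read as a small predicate. What follows is a minimal sketch, not the server's actual implementation: the `PlannerState` shape and the `canUnlockRoadmap` name are hypothetical. Only the 95% threshold and the "no pending questions" condition come from this README.

```typescript
// Hypothetical shape of the planner's latest state (field names are assumptions).
interface PlannerState {
  confidence: number;       // 0..1, as reported by the planner
  pendingQuestions: number; // clarification questions still unanswered
}

// "At least 95%" as stated in this README.
const UNLOCK_CONFIDENCE = 0.95;

// Returns true only when roadmap generation and coding may be unlocked.
function canUnlockRoadmap(state: PlannerState): boolean {
  // Fail closed on NaN or out-of-range values instead of treating them as confident.
  if (!Number.isFinite(state.confidence) || state.confidence < 0 || state.confidence > 1) {
    return false;
  }
  return state.confidence >= UNLOCK_CONFIDENCE && state.pendingQuestions === 0;
}
```

The contract is the confidence value, not the number of questions asked, so a mission that is already clear can unlock with zero questions.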
The cockpit defaults to local CLI / subscription-style usage: claude-cli for
planner/clarifier and codex-cli for coding. It must not silently fall back to a
paid API. If either local engine cannot run, the UI shows a current HOLD with a
recovery action rather than fake success. Gemini is the only default external
validator, and it sees evidence only.
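The "no silent fallback" rule can be illustrated with a resolver whose only outcomes are "ready" or "hold". This is a sketch under stated assumptions: `resolveEngine`, `EngineResolution`, and the recovery text are hypothetical names, and the availability probe is passed in rather than implemented. The engine identifiers `claude-cli` and `codex-cli` are the ones named above.

```typescript
type LocalEngine = "claude-cli" | "codex-cli";
type Role = "planner" | "coder";

// Hypothetical result type: there is deliberately no "paid-api" variant.
type EngineResolution =
  | { kind: "ready"; role: Role; engine: LocalEngine }
  | { kind: "hold"; role: Role; engine: LocalEngine; recoveryAction: string };

// Defaults described in this README: Claude plans, Codex codes.
const DEFAULT_ENGINE: Record<Role, LocalEngine> = {
  planner: "claude-cli",
  coder: "codex-cli",
};

// `isAvailable` is an assumed probe (for example, "is the CLI installed and logged in").
function resolveEngine(
  role: Role,
  isAvailable: (engine: LocalEngine) => boolean,
): EngineResolution {
  const engine = DEFAULT_ENGINE[role];
  if (isAvailable(engine)) {
    return { kind: "ready", role, engine };
  }
  // The only alternative to "ready" is a HOLD with a recovery hint.
  return {
    kind: "hold",
    role,
    engine,
    recoveryAction: `Install or re-authenticate ${engine}, then retry.`,
  };
}
```

Because the result type has no paid-API branch, a caller cannot accidentally route work to one; it has to surface the HOLD.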
The new cockpit is intentionally simple: one conversation stream plus a thin status strip. Clarification, roadmap generation, worker progress, evidence, Gemini decisions, and PR-gate results all appear inline as chat bubbles or cards. The old multi-panel cockpit surface is no longer the product default.
The P0-P7 workbook closed with these verified properties:
- Local engine split — Claude Code is the clarifier/planner; Codex is the coder. Evidence records planner=claude-cli / coder=codex-cli and local subscription auth modes.
- 95% understanding gate — roadmap and coding are server-blocked until the planner reaches at least 95% confidence and there are no pending questions. Claude may ask zero, one, or many follow-up questions; the contract is confidence, not a fixed question count.
- Conversation UI — the cockpit renders as a single chat thread and thin status strip. Legacy sidebar, inspector, tabs, and Project Pulse panels are no longer part of the default flow.
- Gemini hard gate — Gemini is the default evidence-only validator. Draft PR creation is blocked unless the latest Gemini verdict is PASS, and remote writes are still blocked unless repo policy explicitly enables them.
- Team memory — repo/operator Tier 1 memory and event-derived Tier 2 memory are injected into worker context; repeated Gemini rejection lessons can be compiled back into repo memory.
- Hybrid execution — small missions use a single worker run; large roadmaps can execute through a per-node DAG with node evidence and honest failure handling.
- Real end-to-end validation — strict real smoke, browser quality smoke,
full tests, in-app browser checks, and an independent Gemini evidence-only
validator all passed. Latest strict report:
evidence/launch/operator-cockpit-real-smoke-2026-06-03T17-03-35-037Z.md.
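For the per-node DAG mentioned under hybrid execution, one common way to derive a safe run order is a topological sort. The sketch below uses Kahn's algorithm and is only an illustration: the `RoadmapNode` shape and `executionOrder` name are hypothetical, and the README does not say which ordering algorithm the daemon uses.

```typescript
// Hypothetical roadmap node: an id plus the ids it must wait for.
interface RoadmapNode {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: repeatedly emit nodes whose dependencies are all done.
function executionOrder(nodes: RoadmapNode[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    indegree.set(n.id, 0);
    dependents.set(n.id, []);
  }
  for (const n of nodes) {
    for (const dep of n.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`unknown dependency: ${dep}`);
      list.push(n.id);
      indegree.set(n.id, (indegree.get(n.id) ?? 0) + 1);
    }
  }
  const ready = nodes.filter((n) => n.dependsOn.length === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any node left unscheduled means the graph has a cycle; report it rather than guess.
  if (order.length !== nodes.length) throw new Error("roadmap contains a cycle");
  return order;
}
```

Throwing on a cycle or an unknown dependency matches the "honest failure handling" goal: a malformed roadmap stops instead of being partially executed.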
The core value loop has now run end-to-end on real LLM work for the first time: a subscription Claude coder running inside Docker writes real code → evidence is collected → two independent validator families (OpenAI + Gemini) judge it on evidence only → a real draft PR is opened on GitHub → token/cost usage is persisted. Architecture decision: ADR-0019.
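The dual-family requirement can be sketched as a check over validator verdicts. This is an illustration under assumptions: the `FamilyVerdict` shape and `dualFamilyPass` name are hypothetical, and the rule that a family passes only if all of its verdicts are PASS is a design choice made here, not something this README specifies.

```typescript
// Hypothetical verdict record; "family" would be e.g. "openai" or "gemini".
interface FamilyVerdict {
  family: string;
  verdict: "PASS" | "FAIL";
}

// True when at least `requiredFamilies` distinct families passed.
function dualFamilyPass(verdicts: FamilyVerdict[], requiredFamilies = 2): boolean {
  // Assumption: a family counts as passing only if every verdict it returned is PASS.
  const familyOk = new Map<string, boolean>();
  for (const v of verdicts) {
    const soFar = familyOk.get(v.family) ?? true;
    familyOk.set(v.family, soFar && v.verdict === "PASS");
  }
  let passing = 0;
  familyOk.forEach((ok) => {
    if (ok) passing += 1;
  });
  return passing >= requiredFamilies;
}
```

Counting distinct families rather than verdicts means two PASS results from the same family do not satisfy the independence requirement.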
Historical stage: ProductionHardened_v2.4_Ready (superseded by WORKBOOK_v3.md). This is a single-operator system; system.allow_remote_writes defaults to false and gates every outward write — git push, PR creation, and merge alike.
- claude-in-Docker runner — packages/runner/src/claude-docker-runner.ts runs the subscription Claude CLI inside a container against a per-task git worktree, honoring the image's /entrypoint.sh contract (writes /workspace/prompt.txt; sets CLAUDE_ROLE / CLAUDE_MODEL / CLAUDE_PERMISSION_MODE / CLAUDE_ALLOWED_TOOLS; reads back /workspace/result.json). A static + runtime preflight (preflightClaudeDockerEnvironment / preflightRuntime) fails fast with a HOLD-CLAUDE-DOCKER-IMAGE or HOLD-CLAUDE-AUTH-IN-DOCKER reason rather than ever falling back silently to the paid API.
- runner:e2e1 derived image — packages/runner/docker/Dockerfile.e2e1 and entrypoint-e2e1.sh. The patched entrypoint writes /workspace/cli-envelope.json (the raw CLI usage envelope) before normalization, so authoritative token counts survive (the stock result.json reported 0/0).
- Subscription auth via OAuth token — inject AEDEV_CLAUDE_OAUTH_TOKEN (from claude setup-token) → CLAUDE_CODE_OAUTH_TOKEN inside the container. The macOS keychain credential is host-bound and 401s inside a Linux container, so the token path is the proven, keychain-free option. All ANTHROPIC_* paid-API env vars are stripped from the container.
- model_usage accounting + live cost roller — insertModelUsage persists input/output tokens + cost per run and emits a model.usage.recorded event. Local subscription usage is tracked by run count + cost, never reported as $0. The daemon now feeds a long-lived CostRoller (seeded from model_usage on boot so spend survives a restart) and exposes cost_total_usd / cost_per_pr_usd_7d / cost_event_count on /metrics. Only known costs are summed — subscription-unknown stays 0, never fabricated.
- Dual-family validators — OpenAI- and Gemini-family judges score the evidence package only (never the coder's conversation or chain-of-thought). The merge policy requires two independent families to pass.
- Structured ClarificationGate (ADR-0020) — packages/daemon/src/clarification-gate.ts scores mission ambiguity deterministically (no LLM, no token spend) over four signals; above the threshold (trigger_threshold: 50 in config/policies.yaml) it asks ≤4 questions before any coder runs and writes a verifiable clarified-spec.md. Decision: ADR-0020.
- Autonomous draft-PR closure — the daemon's mission loop now opens a real draft PR on an AUTO_MERGE decision via DraftPrGate over GhGitRemoteWriter / GhDraftPrCreator (runner plane), instead of stopping at a mock merge. The gate fail-closes on allow_remote_writes, repo.enabled, and forbidden paths, so the no-push default is preserved — with the flag false (default) the loop opens nothing. This folds the proven scripts/e2e1-real-loop.ts path into the loop.
- Real-diff forbidden-path gate — forbidden-path detection reads the runner's changed-paths.json (the actual git diff file list) rather than regexing evidence prose, and feeds the merge policy's hard BLOCK (mission-runner.ts).
- /github/sync is gated — the GitHub PR-sync route now fails closed with REMOTE_WRITES_DISABLED unless system.allow_remote_writes is true (it was previously guarded only by the presence of a GitHub token).
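The subscription-auth item in the list above (inject the OAuth token, strip every ANTHROPIC_* variable) can be sketched as a pure function over an environment map. This is not the runner's code: `buildContainerEnv` and `ContainerEnvResult` are hypothetical names, and mapping a missing token to HOLD-CLAUDE-AUTH-IN-DOCKER is an assumption about when that reason applies. The variable names and the HOLD reason string are taken from this README.

```typescript
// Hypothetical result type for the sketch.
type ContainerEnvResult =
  | { ok: true; env: Record<string, string> }
  | { ok: false; holdReason: "HOLD-CLAUDE-AUTH-IN-DOCKER" };

// Builds the env handed to the container from a candidate env map.
function buildContainerEnv(
  candidateEnv: Record<string, string | undefined>,
): ContainerEnvResult {
  const token = candidateEnv["AEDEV_CLAUDE_OAUTH_TOKEN"];
  if (token === undefined || token.trim() === "") {
    // Assumption: no token means HOLD, never a fallback to a paid API key.
    return { ok: false, holdReason: "HOLD-CLAUDE-AUTH-IN-DOCKER" };
  }
  const env: Record<string, string> = {};
  for (const key of Object.keys(candidateEnv)) {
    const value = candidateEnv[key];
    if (value === undefined) continue;
    if (key.startsWith("ANTHROPIC_")) continue;       // strip paid-API variables
    if (key === "AEDEV_CLAUDE_OAUTH_TOKEN") continue; // host-side name stays on the host
    env[key] = value;
  }
  // Inside the container the token is exposed under the name the CLI reads.
  env["CLAUDE_CODE_OAUTH_TOKEN"] = token;
  return { ok: true, env };
}
```

Keeping this as a pure function makes the stripping rule easy to unit-test without starting Docker.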
Operator Cockpit is the human control plane for the local-first coding coworker. It is intended to feel more like Claude Code Desktop than a passive dashboard: chat first, explicit clarification, visible execution progress, and safety gates that stay obvious.
The current conversational surface includes:
- Single chat workspace — one conversation thread for clarification, planning, execution, evidence, Gemini verdicts, and PR-gate outcomes.
- Thin status strip — the only persistent chrome is stage, current action, progress, and pending approval count.
- Structured clarification cards — Claude's follow-up questions are answerable through choices and free-form replies, with the original question and answer transcript sent back to Claude on follow-up.
- Provider and token transparency — major planner/worker/validator actions expose whether they used claude-cli, codex-cli, mock/test mode, or Gemini, plus token/cost data when available.
- Current-only HOLDs — active blockers are shown prominently, while superseded historical HOLDs remain in logs/events instead of stale top banners.
- Safety-preserving PR gate — draft PR creation remains blocked unless the latest Gemini verdict is PASS, system.allow_remote_writes is true, and repo policy explicitly permits outward writes.
- Repo-bound worker (trust model) — when you select a repo and press Start, the worker executes inside an isolated git worktree of that repo (checked out at the committed HEAD, so your working tree and branches are untouched), never an empty scratch directory. If the selected repo is missing, disabled, or not a git repository, the mission HOLDs (HOLD-TARGET-REPO-UNAVAILABLE) rather than writing throwaway files and reporting "done". Evidence records the real changed-paths.json, repo path, and worktree path; touching a forbidden path (.env*, secrets/**, .github/**, AGENTS.md, CLAUDE.md) blocks the merge gate.
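The PR gate and forbidden-path rules in the list above combine into a fail-closed check. The sketch below is an assumption-laden illustration: `draftPrGate`, `GateInput`, and `globToRegExp` are hypothetical names, the glob semantics (patterns matched against repo-relative paths, `*` staying within one segment) are guessed, and verdict values other than PASS are placeholders. The forbidden patterns themselves are copied from this README.

```typescript
// Forbidden patterns as listed in this README.
const FORBIDDEN_GLOBS = [".env*", "secrets/**", ".github/**", "AGENTS.md", "CLAUDE.md"];

// Minimal glob translation (assumed semantics): "**" spans directories,
// "*" stays within a single path segment.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*\*|\*/g, (m) => (m === "**" ? ".*" : "[^/]*"));
  return new RegExp("^" + body + "$");
}

const FORBIDDEN_PATTERNS = FORBIDDEN_GLOBS.map(globToRegExp);

type GateDecision = { allowed: true } | { allowed: false; reason: string };

// Hypothetical input; changedPaths would come from changed-paths.json.
interface GateInput {
  latestGeminiVerdict: "PASS" | "FAIL" | null;
  allowRemoteWrites: boolean; // system.allow_remote_writes
  repoAllowsWrites: boolean;  // per-repo policy
  changedPaths: string[];
}

function draftPrGate(input: GateInput): GateDecision {
  const hit = input.changedPaths.find((p) => FORBIDDEN_PATTERNS.some((re) => re.test(p)));
  if (hit !== undefined) return { allowed: false, reason: `forbidden path touched: ${hit}` };
  if (input.latestGeminiVerdict !== "PASS") {
    return { allowed: false, reason: "latest Gemini verdict is not PASS" };
  }
  if (!input.allowRemoteWrites) return { allowed: false, reason: "remote writes disabled" };
  if (!input.repoAllowsWrites) return { allowed: false, reason: "repo policy forbids outward writes" };
  return { allowed: true };
}
```

Every branch except the last one denies, so a missing verdict or an unset flag blocks the PR rather than letting it through.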
For the detailed UX v2 implementation brief, see
docs/handoff/operator-cockpit-ux-v2-prd-2026-05-31.md.
# 0. One-time: capture a keychain-free subscription token
claude setup-token # store the sk-ant-oat... value where your secrets live
# 1. Build the runner:e2e1 image (authoritative token counts)
docker build -f packages/runner/docker/Dockerfile.e2e1 \
-t claude-code-247/runner:e2e1 packages/runner/docker
# 2. Real end-to-end loop: docker Claude coder → dual-family → draft PR → model_usage
# (draft-only; never merges. Needs the OAuth token + OPENAI/GEMINI keys in env.)
node_modules/.bin/tsx scripts/e2e1-real-loop.ts
# 3. ClarificationGate shadow walk (deterministic; spends no LLM tokens)
node_modules/.bin/tsx scripts/e2e2-clarification-shadow-walk.ts

Safety model: these scripts pass allowRemoteWrites: true in-process to a draft-only PR gate; the global system.allow_remote_writes stays false. Because they pre-approve the mission, they deliberately bypass the daemon's approval path — so no ntfy phone approval is requested. To exercise the real approval flow (medium/high-risk merge, API fallback, etc.), run a mission through the daemon's IntakeService, which pushes an ntfy notification to your phone for approve/reject.
The dual-kernel layout below is the current state as of v1.0.0 GA. v2.0 collapses it to a single TypeScript control plane and removes the Python tree entirely. See V2_ARCHITECTURE.md for the target architecture and the stage-by-stage plan.
claude-code-247 is one product OS with two cooperating kernels:
| Layer | Implementation | Role |
|---|---|---|
| Control plane | TypeScript aedev (pnpm monorepo) | Primary CLI, daemon, dashboard, state machine, mission intake, roadmap, task graph, approvals, memory, risk, preview/deploy orchestration, evidence bundle. |
| Execution kernel | Python claude247 (v1.0.0 GA) | Mature Docker worker runtime, headless claude --print invocation, Gemini + OpenAI judges, GitHub PR creation. Invoked by aedev during the parity window. |
| Bridge | @aedev/claude247-bridge | Enqueues tasks into the Python state DB, polls status, imports evidence back into aedev's SQLite. |
This dual-kernel design is recorded in
ADR-0009, which supersedes
ADR-0008.
aedev is the primary entry point for new product-OS work; the Python kernel
continues to drive worker execution and validator orchestration until the
TypeScript runtime reaches parity (see
docs/aedev-prototype-status.md for the
parity gate list). Both ADRs will be superseded by ADR-0010 in Stage A of
the v2.0 plan.
- Multi-repo from day one. One registry, many repos. Per-repo budget, risk policy, allowed/forbidden paths.
- Local-first execution. Mac + Docker. Your authenticated Claude Code session is the default; the paid API is opt-in.
- Mobile control. claude247 status --plain and claude247 status-board --plain are built for SMS-sized output. ntfy.sh pushes for approvals and stuck tasks.
- External validator isolation. Gemini 2.5 Pro and an OpenAI-compatible judge see only the evidence package — never the Coder's conversation.
- Low-risk auto-merge with score 0–100; medium asks your phone, high blocks.
- Long-term memory that compiles failures, lessons, and decisions back into per-repo .agent/*.md files.
- Failure replay for any task.
- Live read-only watchdog dashboard (new in v1.0.0 / M22b) — see below.
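The risk-scored merge behavior in the list above (low auto-merges, medium asks your phone, high blocks) can be sketched as a three-way decision over the 0–100 score. The band boundaries are not given in this README, so they are parameters here; `decideMerge`, `RiskThresholds`, and the `ASK_PHONE` and `BLOCK` labels are hypothetical, while `AUTO_MERGE` is the decision name used elsewhere in this document.

```typescript
type MergeAction = "AUTO_MERGE" | "ASK_PHONE" | "BLOCK";

// Hypothetical thresholds: scores up to lowMax are low risk,
// scores up to mediumMax are medium risk, anything above is high risk.
interface RiskThresholds {
  lowMax: number;
  mediumMax: number;
}

function decideMerge(score: number, thresholds: RiskThresholds): MergeAction {
  // Fail closed on scores outside the documented 0..100 range.
  if (!Number.isFinite(score) || score < 0 || score > 100) return "BLOCK";
  if (score <= thresholds.lowMax) return "AUTO_MERGE";
  if (score <= thresholds.mediumMax) return "ASK_PHONE";
  return "BLOCK";
}

// Example values only; the real per-repo policy may use different bands.
const exampleThresholds: RiskThresholds = { lowMax: 29, mediumMax: 69 };
const exampleAction = decideMerge(12, exampleThresholds);
```

Passing the thresholds in, rather than hard-coding them, mirrors the per-repo risk policy described at the top of this list.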
aedev is the primary control plane. The Python claude247 kernel is
installed alongside it during the parity window and handles worker execution
underneath.
# 1. Install the Python execution kernel (mature, GA v1.0.0)
make install # creates venv + installs deps + launchd plists
claude247 doctor # verify kernel environment
# 2. Install the TypeScript control plane
pnpm install
pnpm -r build
# 3. Initialize aedev home (~/.aedev/)
aedev init
# 4. Start the aedev daemon (port 7247) — control plane + dashboard
aedev daemon start
open http://localhost:7247
# 5. Submit a mission via the control plane (two-step approval)
aedev intake "refactor the auth middleware in repo my-repo"
aedev mission list # find the mission id
aedev mission approve <id> # explicit approval — no self-approve
# 6. Inspect status / tasks via the control plane
aedev status --plain
aedev task list
# 7. Read-only watchdog (Python kernel) — phone-friendly
claude247 status-board --plain
claude247 watchdog --plain
claude247 status-board --json
claude247 status-board --write-md M22_WATCHDOG_DASHBOARD.md

During the parity window, some kernel-level operations are still invoked
directly via claude247 (worker launch, validator orchestration, GitHub
PR creation). The @aedev/claude247-bridge package routes aedev missions
through the Python kernel automatically — see
ADR-0009 and
docs/aedev-prototype-status.md.
A read-only operations dashboard for "is the 24/7 daemon actually OK
right now?" Designed to be safe to run from a phone while the
dispatcher is mid-tick — the SQL is SELECT-only and the contract is
asserted by a regression test
(tests/unit/test_status_board.py::test_read_only_does_not_mutate_db).
Web (Apple-style): http://127.0.0.1:8423/status-board
- Activity-ring soak progress (recolors green / blue / red by state) using only inline SVG + CSS — no charting library
- Auto-refresh every 15s (configurable 5 / 15 / 30 / 60s / off); fetches /status-board.json, updates DOM in place, briefly tints cards that changed — no full reload, no flicker
- EN ↔ 中文 (English / Chinese) language toggle with localStorage persistence
- Dark mode follows prefers-color-scheme
- Live indicator dot in the top bar — pulsing green when live, amber when paused, red when a fetch fails
- Pause / resume / refresh-now controls with a morphing play/pause SVG button
- Zero external dependencies — no CDN, no font files, no JS library; the whole page is ~25KB inline
CLI:
claude247 status-board --plain
# Claude247 Watchdog Dashboard
# Generated: 2026-05-25T...
#
# Release State / Soak Progress / Runtime Health
# Queue / Task State / Recent Signals / GA Gates / Usage

JSON: http://127.0.0.1:8423/status-board.json
{
"generated_at": "...",
"release_state": { "main_sha": "...", "ga_status": "..." },
"soak": { "t0": "...", "progress_percent": 38, "result": "PARTIAL" },
"runtime_health":{ "launchd_loaded": 4, "dispatcher": "healthy", ... },
"queue": { "active_tasks": 0, "orphan_commands": 0, ... },
"signals": { "new_critical_errors": 0, "alert_storm": false, ... },
"ga_gates": { "passed": 18, "total": 19, "recommendation": "..." },
"usage": { "runs_total": 0, "active_workers": 0, ... }
}

The watchdog reads M20_SOAK_RESULT.md to auto-discover the
dispatcher T0; pass --t0 2026-05-24T21:46Z to override.
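The `soak` block in the JSON above can be derived from T0 and the current time. The following is a sketch, not the watchdog's implementation (which lives in the Python kernel): `soakStatus` and its `healthy` input are hypothetical, and reporting PARTIAL before the 24h window closes is inferred from the sample payload.

```typescript
// The 24h soak window, in milliseconds.
const SOAK_WINDOW_MS = 24 * 60 * 60 * 1000;

type SoakResult = "PARTIAL" | "PASS" | "FAIL";

interface SoakStatus {
  progress_percent: number;
  result: SoakResult;
}

// `healthy` is an assumed summary of the health signals; how it is
// computed is outside this sketch.
function soakStatus(t0Ms: number, nowMs: number, healthy: boolean): SoakStatus {
  const elapsed = Math.max(0, nowMs - t0Ms);
  const progress_percent = Math.min(100, Math.floor((elapsed / SOAK_WINDOW_MS) * 100));
  if (elapsed < SOAK_WINDOW_MS) {
    return { progress_percent, result: "PARTIAL" };
  }
  return { progress_percent, result: healthy ? "PASS" : "FAIL" };
}

// T0 from the --t0 example above, evaluated 9h 12m into the soak.
const soakT0 = Date.parse("2026-05-24T21:46:00Z");
const soakAtWaiver = soakStatus(soakT0, soakT0 + (9 * 60 + 12) * 60 * 1000, true);
// soakAtWaiver is { progress_percent: 38, result: "PARTIAL" }
```

With the T0 shown above, 9h 12m of soak works out to 38%, which lines up with the `progress_percent` in the sample payload.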
- v1.0.0 GA — released 2026-05-25 (Python claude247 kernel).
  - The first GA release. See RELEASE_NOTES_GA.md for the full notes, GA_GATE.md for the 19-gate GA contract, and M22_GA_DECISION_REPORT.md for the GA decision record.
  - Soak gate was explicitly waived by the owner after ~9h 12m of healthy soak evidence (4/4 launchd loaded, ~1182 dispatcher idle ticks, backup completed, 0 alerts, 0 orphan commands, $0 Anthropic worker spend). Final T+24h observation is a post-GA follow-up — the watchdog dashboard will auto-flip soak.result to PASS or FAIL once wall-clock crosses 2026-05-25T21:46Z.
  - Pre-release history (alpha.0 → beta.2) preserved on GitHub.
- v2.2.0-rc2 is production grade for the TypeScript line — single TypeScript daemon, Python tree removed, HOLD as first-class state, closed-loop approval (ntfy/Tailscale), push-time security gate, resumable moves, cross-platform supervisor, chaos drills, Agent Mesh, RoadmapAgent, and Sentinel. The formal policy is docs/operations/release-policy.md.
  - No v2.1.0 or v2.2.0 GA tag is expected under the current policy. The expected v2 release references are v2.1.0-rc1, v2.1.0-rc2, v2.2.0-rc1, and v2.2.0-rc2.
v2 TypeScript line:
- V2_ARCHITECTURE.md — full v2.0 architecture and stage-by-stage implementation plan (start here)
- docs/operations/release-policy.md — current release-grade and tag policy
v1.0.0 (current GA):
- RELEASE_NOTES_GA.md — v1.0.0 release notes
- GA_GATE.md — 19-gate GA contract + owner-waiver policy
- M22_GA_DECISION_REPORT.md — GA decision record
- M20_SOAK_RESULT.md — soak observation + waiver record
- DEFINITION_OF_DONE.md — DoD checklist
- CHANGELOG.md — release history
- docs/ARCHITECTURE.md — module map and data flow (v1.0.0)
- docs/INSTALL.md — full install + uninstall + doctor
- docs/REMOTE_DISPATCH.md — phone / Remote / Dispatch operating guide
- docs/SECURITY.md — secret hygiene, forbidden paths, approval flow
- docs/MEMORY.md — vector + .agent file architecture
- docs/AUTO_MERGE_POLICY.md — risk scoring and merge gates
- docs/VALIDATORS.md — Gemini + OpenAI judge contracts
- docs/REPO_ONBOARDING.md — adding repos
- docs/OPERATIONS.md — day-to-day operating playbook
# Install dependencies (Node.js ≥ 20, pnpm ≥ 10 required)
pnpm install
# Run all tests
pnpm test
# Type-check across the workspace
pnpm typecheck
# Lint
pnpm lint
# Opt-in real subprocess smoke tests (require `claude` and/or Docker on PATH)
AEDEV_SMOKE_CLAUDE=1 pnpm test --filter @aedev/runner
AEDEV_SMOKE_DOCKER=1 pnpm test --filter @aedev/runner
# Start the daemon (port 7247) — serves the dashboard + REST API
cd packages/daemon && pnpm start
open http://localhost:7247

Architecture decisions for aedev: docs/adr/ (ADR-0001 through ADR-0009).
TS runtime parity gates: docs/aedev-prototype-status.md.
Internal.