Chorus-AIDLC · ChenNima · Apr 20, 2026
diff --git a/docs/chorus-observability-design.md b/docs/chorus-observability-design.md
diff --git a/docs/design.pen b/docs/design.pen
diff --git a/docs/token-observability-bugfix-checklist.md b/docs/token-observability-bugfix-checklist.md
@@ -0,0 +1,61 @@
+# Token Observability — Bug Fix Checklist
+
+## Root Cause: Claude API usage fields have mixed semantics
+
+Claude API transcript `message.usage` per turn:
+- `input_tokens` — per-turn (incremental)
+- `output_tokens` — per-turn (incremental)
+- `cache_creation_input_tokens` — per-turn (incremental)
+- `cache_read_input_tokens` — **cumulative** across the session
+
+CC (Claude Code) calculates `total_tokens = input + output + cache_create + cache_read` from the **last turn only**, because `cache_read_input_tokens` in the last turn already contains the full session total.
+
+The old code sent ALL turns and summed them, so `cache_read_input_tokens` was counted N times (once per turn). The frontend showed only `input + output`, which hid the real magnitude.
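The gap between the two calculations can be sketched as follows. This is a minimal illustration under the semantics described above, not the plugin's code: the `Usage` type, the helper names, and the sample numbers are all made up for the example.

```typescript
// Field names follow the usage object described above.
// The Usage type, helper names, and sample numbers are hypothetical.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Sum of all four fields for a single turn.
function turnTotal(u: Usage): number {
  return (
    u.input_tokens +
    u.output_tokens +
    u.cache_creation_input_tokens +
    u.cache_read_input_tokens
  );
}

// Old behaviour: add up every turn, which re-counts cache_read each time.
function summedTotal(turns: Usage[]): number {
  return turns.reduce((acc, u) => acc + turnTotal(u), 0);
}

// New behaviour: use the last turn only; 0 for an empty transcript.
function lastTurnTotal(turns: Usage[]): number {
  const last = turns[turns.length - 1];
  return last ? turnTotal(last) : 0;
}

const turns: Usage[] = [
  { input_tokens: 10, output_tokens: 100, cache_creation_input_tokens: 500, cache_read_input_tokens: 1000 },
  { input_tokens: 12, output_tokens: 80, cache_creation_input_tokens: 200, cache_read_input_tokens: 1500 },
  { input_tokens: 8, output_tokens: 120, cache_creation_input_tokens: 100, cache_read_input_tokens: 1700 },
];

console.log(summedTotal(turns)); // 5330
console.log(lastTurnTotal(turns)); // 1928
```

With only three turns the summed figure is already almost three times the last-turn figure, and the ratio grows with the number of turns.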
+
+## Bug #1: Entity attribution — carry-forward wrong for sub-agents
+
+**Symptom**: Reviewer tokens attributed to elaboration (idea entity) instead of review (proposal entity).
+
+**Root cause**: carry-forward picks the last timeline entry before each turn. Reviewer reads idea for context before touching proposal → early turns attributed to idea. For sub-agents, ALL tokens belong to one primary entity regardless of which other entities they read.
+
+**Fix**: `findPrimaryEntity()` picks highest-priority entity from timeline (task > proposal > idea > document). Sub-agent turns all go to that entity. Main agent still uses carry-forward.
+
+- [x] `src/services/observability.service.ts` — replace sentinel with primary entity model
+- [x] `src/services/__tests__/token-attribution.test.ts` — 16 tests covering both models
+- [x] Verify: `pnpm test` — all pass
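A rough sketch of the priority rule, for readers who want to see it concretely. Only the ordering task > proposal > idea > document comes from the fix description. The `TimelineEntry` shape, the function name `findPrimaryEntitySketch`, and the tie-break (first entry of the winning type) are assumptions, not the actual `findPrimaryEntity()` implementation.

```typescript
type EntityType = "task" | "proposal" | "idea" | "document";

// Hypothetical timeline entry shape for illustration.
interface TimelineEntry {
  timestamp: string; // ISO 8601
  entityType: EntityType;
  entityUuid: string;
}

// Lower number = higher priority, mirroring task > proposal > idea > document.
const PRIORITY: Record<EntityType, number> = {
  task: 0,
  proposal: 1,
  idea: 2,
  document: 3,
};

// Returns the highest-priority entry, or null for an empty timeline.
// Assumption: among entries of the same type, the first one wins.
function findPrimaryEntitySketch(timeline: TimelineEntry[]): TimelineEntry | null {
  let best: TimelineEntry | null = null;
  for (const entry of timeline) {
    if (best === null || PRIORITY[entry.entityType] < PRIORITY[best.entityType]) {
      best = entry;
    }
  }
  return best;
}

// A reviewer that reads the idea first and then touches the proposal.
const reviewerTimeline: TimelineEntry[] = [
  { timestamp: "2026-04-20T10:00:00Z", entityType: "idea", entityUuid: "idea-1" },
  { timestamp: "2026-04-20T10:02:00Z", entityType: "proposal", entityUuid: "proposal-1" },
];

console.log(findPrimaryEntitySketch(reviewerTimeline)?.entityUuid); // proposal-1
```

Under this rule every reviewer turn, including the early ones spent reading the idea, goes to `proposal-1`.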
+
+## Bug #2: Shell scripts send all turns → cumulative cache_read over-counted
+
+**Symptom**: Token totals are 10-50x higher than CC reports.
+
+**Root cause**: All three shell scripts (`on-stop.sh`, `on-subagent-stop.sh`, `on-session-end.sh`) extracted every assistant turn's usage and sent them as separate records. Server summed all records. Since `cache_read_input_tokens` is cumulative, summing N turns counts the same cache tokens N times.
+
+**Fix**: Extract only the **last assistant turn** (which contains session totals). Also switch all scripts to temp files + `jq --slurpfile` + `curl -d @file`.
+
+- [x] `public/chorus-plugin/bin/on-stop.sh` — last turn only
+- [x] `public/chorus-plugin/bin/on-subagent-stop.sh` — last turn + temp files + sourceSessionId
+- [x] `public/chorus-plugin/bin/on-session-end.sh` — last turn + temp files + sourceSessionId
+- [x] Verify: `bash public/chorus-plugin/bin/test-syntax.sh` — all pass
+
+## Bug #3: Frontend tokensSum missing cache fields
+
+**Symptom**: Page shows ~1.9k when CC reports ~23k.
+
+**Root cause**: `tokensSum()` only summed `input_tokens + output_tokens`, missing `cache_creation_input_tokens` and `cache_read_input_tokens`. CC's formula includes all 4 fields.
+
+**Fix**: `tokensSum = input + output + cache_create + cache_read` in all 4 components.
+
+- [x] `tokens-view.tsx` — fixed
+- [x] `agent-observability.tsx` — fixed
+- [x] `task-tokens-view.tsx` — fixed
+- [x] `token-usage-card.tsx` — fixed
+- [x] Verify: `npx tsc --noEmit` — clean
+
+## Verification
+
+With a fresh CC session + new project:
+1. Run full yolo pipeline (idea → proposal → reviewer → approve → dev → verify)
+2. Check observability page: total tokens should match CC's reported total_tokens
+3. Review phase should show reviewer's tokens (~23k), not 0 or 3.6k
+4. Execution phase should show dev's tokens (~34k)
+5. No tokens should leak into elaboration phase from reviewers
diff --git a/docs/token-observability-changes.md b/docs/token-observability-changes.md
@@ -0,0 +1,89 @@
+# Token Observability Changes Summary
+
+## Architecture Overview
+
+Token usage is tracked per assistant turn in the `TokenUsageRecord` table (decoupled from `AgentSession`). The CC Stop hook fires on every assistant turn and uploads the full set of transcript turns plus the entity timeline to the server. The server handles attribution and dedup.
+
+## Changed Files
+
+### Shell Scripts (Plugin)
+
+**`public/chorus-plugin/bin/on-stop.sh`**
+- Stop hook fires every assistant turn (async)
+- Extracts turns (per-assistant-message usage with timestamp) from transcript
+- **NEW**: Extracts entity timeline from transcript's MCP tool_use blocks (was: tool-log.jsonl)
+- Builds payload via temp files + jq --slurpfile (avoids shell arg length limits)
+- POSTs to `/api/agent-report/token-usage` with `sourceSessionId` for dedup
+
+**`public/chorus-plugin/bin/on-subagent-stop.sh`**
+- Layer 3 added: parses sub-agent transcript for turns + timeline, POSTs to server
+- **NEW**: Extracts entity timeline from sub-agent's own transcript (was: tool-log.jsonl filtered by agent_id)
+- Passes `sessionUuid` (Chorus session) so server can distinguish sub-agent vs main agent records
+
+**`public/chorus-plugin/bin/on-session-end.sh`**
+- Same transcript-based timeline extraction as on-stop.sh
+- Final upload on session close
+
+**`public/chorus-plugin/hooks/hooks.json`**
+- Added Stop hook entry with nested format: `{matcher: "", hooks: [{type, command, async: true}]}`
+
+### Server (API + Service)
+
+**`src/app/api/agent-report/token-usage/route.ts`**
+- Accepts `{sourceSessionId?, sessionUuid?, turns[], timeline[]}`
+- Calls `attributeTokenUsage()` for per-turn entity attribution
+- **NEW**: Calls `resolveProjectUuids()` (plural) for per-record projectUuid resolution
+- Each record gets projectUuid based on its own entityUuid, not a single shared value
+
+**`src/services/observability.service.ts`**
+
+Key functions:
+- `attributeTokenUsage()` — per-turn records with timeline entity matching via `findActiveEntity()` (carry-forward: last timeline entry before turn timestamp)
+- **NEW**: `resolveProjectUuids()` — batch-queries all entity UUIDs, returns `Map<entityUuid, projectUuid>`. Replaces old `resolveProjectUuid()` which returned a single value for all records
+- `insertAttributedTokenUsage()` — uses Prisma `createMany` with `skipDuplicates: true`, relying on the `(sourceSessionId, turnTimestamp)` unique constraint
+- `getIdeaLifecycleTokens()` — **NEW phase token logic**: uses `sessionUuid` to split tokens between phases:
+  - proposal entity + sessionUuid → review (sub-agent reviewer)
+  - proposal entity + no sessionUuid → proposal drafting (main agent)
+  - task entity + sessionUuid → execution (sub-agent dev)
+  - task entity + no sessionUuid → verify (main agent admin)
+  - idea entity → elaboration
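The phase rules listed above can be written as a small decision function. This is a sketch only: the `RecordLike` shape and the name `classifyPhase` are hypothetical, and returning `null` for records with no matching entity type is an assumption, since the list does not say how such records are handled.

```typescript
// Phase keys match the five phases listed above.
type Phase = "elaboration" | "proposal" | "review" | "execution" | "verify";

// Hypothetical minimal view of a token usage record.
interface RecordLike {
  entityType: "idea" | "proposal" | "task" | null;
  sessionUuid: string | null; // set for sub-agents, null for the main agent
}

function classifyPhase(r: RecordLike): Phase | null {
  const fromSubAgent = r.sessionUuid !== null;
  switch (r.entityType) {
    case "idea":
      return "elaboration";
    case "proposal":
      return fromSubAgent ? "review" : "proposal";
    case "task":
      return fromSubAgent ? "execution" : "verify";
    default:
      return null; // assumption: unattributed records belong to no phase
  }
}

console.log(classifyPhase({ entityType: "proposal", sessionUuid: "sess-1" })); // review
console.log(classifyPhase({ entityType: "task", sessionUuid: null })); // verify
```

The same entity type maps to two phases, so a record that loses its `sessionUuid` silently moves from review to proposal drafting, or from execution to verify.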
+
+### Database
+
+**`prisma/schema.prisma`**
+- Added `TokenUsageRecord` model with `@@unique([sourceSessionId, turnTimestamp])` for dedup
+- Removed `tokenUsage` JSON field from `AgentSession`
+
+## Key Design Decisions
+
+1. **Timeline from transcript, not tool-log.jsonl** — Each agent's transcript is independent. tool-log.jsonl is shared and mixes agents. Transcript has MCP tool_use blocks with entity UUIDs in input params.
+
+2. **Per-record projectUuid** — A single CC session may span multiple projects. Each record's entityUuid resolves to its own projectUuid. Records without entity get null projectUuid.
+
+3. **sessionUuid distinguishes main agent vs sub-agent** — Sub-agents have Chorus sessions (sessionUuid set). Main agent records have sessionUuid null. This is used for phase attribution in lifecycle views.
+
+4. **Server-side dedup** — Stop hook uploads full transcript every turn. `skipDuplicates` on `(sourceSessionId, turnTimestamp)` prevents re-insertion. Only new turns get inserted.
+
+5. **Server resolves projectUuid** — Client doesn't need to track project. Server looks up entity → project via DB (task→proposal→project, idea→project, proposal→project).
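To make decision 4 concrete, here is an in-memory sketch of the dedup behaviour. It stands in for the database unique constraint and is not the service code: `TurnRecord`, `insertNew`, and the `Map`-based store are hypothetical.

```typescript
// Hypothetical record shape; only the two key fields come from the design.
interface TurnRecord {
  sourceSessionId: string;
  turnTimestamp: string;
  totalTokens: number;
}

// Inserts only records whose (sourceSessionId, turnTimestamp) pair is new.
// Returns the number of records actually inserted.
function insertNew(store: Map<string, TurnRecord>, incoming: TurnRecord[]): number {
  let inserted = 0;
  for (const rec of incoming) {
    const key = `${rec.sourceSessionId}|${rec.turnTimestamp}`;
    if (store.has(key)) continue; // already uploaded by an earlier hook run
    store.set(key, rec);
    inserted++;
  }
  return inserted;
}

const store = new Map<string, TurnRecord>();

// First Stop hook run: the transcript has two turns.
const firstUpload: TurnRecord[] = [
  { sourceSessionId: "cc-1", turnTimestamp: "2026-04-20T10:00:00Z", totalTokens: 1610 },
  { sourceSessionId: "cc-1", turnTimestamp: "2026-04-20T10:01:00Z", totalTokens: 1792 },
];

// Second run: the full transcript is uploaded again with one extra turn.
const secondUpload: TurnRecord[] = [
  ...firstUpload,
  { sourceSessionId: "cc-1", turnTimestamp: "2026-04-20T10:02:00Z", totalTokens: 1928 },
];

console.log(insertNew(store, firstUpload)); // 2
console.log(insertNew(store, secondUpload)); // 1
```

Re-uploading the whole transcript is therefore idempotent as long as turn timestamps are stable between uploads.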
+
+## Known Limitations
+
+- **Long CC sessions across projects**: The carry-forward entity attribution means turns between entity tool calls inherit the last entity. In a single-project session this is correct. In a multi-project session, turns after switching projects but before the first entity tool call in the new project may still be attributed to the previous project's entity.
+- **Stale data from old uploads**: Records inserted by older code versions (single projectUuid, tool-log.jsonl timeline) remain in the DB. They are dedup-protected and won't be overwritten. For clean testing, use a brand new project in a fresh CC session.
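The first limitation is easier to see with a sketch of carry-forward. This is an illustration under assumptions, not the real `findActiveEntity()`: the `Entry` shape and the function name are hypothetical, and treating an entry with the same timestamp as the turn as "before" is a guess.

```typescript
// Hypothetical timeline entry; timestamps are ISO 8601 strings.
interface Entry {
  timestamp: string;
  entityUuid: string;
}

// Carry-forward: the latest timeline entry at or before the turn timestamp.
// Returns null when the turn precedes every entry.
function findActiveEntitySketch(timeline: Entry[], turnTimestamp: string): Entry | null {
  const turnMs = Date.parse(turnTimestamp);
  let active: Entry | null = null;
  for (const e of timeline) {
    const ms = Date.parse(e.timestamp);
    if (ms <= turnMs && (active === null || ms >= Date.parse(active.timestamp))) {
      active = e;
    }
  }
  return active;
}

// The agent touches an idea in project A, then later an idea in project B.
const timeline: Entry[] = [
  { timestamp: "2026-04-20T10:00:00Z", entityUuid: "idea-in-project-A" },
  { timestamp: "2026-04-20T10:10:00Z", entityUuid: "idea-in-project-B" },
];

// A turn at 10:05 still inherits project A's entity, even if the user had
// already moved on to project B before the first entity tool call there.
console.log(findActiveEntitySketch(timeline, "2026-04-20T10:05:00Z")?.entityUuid); // idea-in-project-A
console.log(findActiveEntitySketch(timeline, "2026-04-20T10:11:00Z")?.entityUuid); // idea-in-project-B
```

Turns that precede every timeline entry get no entity in this sketch, which lines up with the null `projectUuid` case in the design decisions.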
+
+## Testing Checklist
+
+To verify E2E in a **new CC session** (important — avoids stale data):
+
+1. Create a new project
+2. Create idea, claim, skip elaboration (generates idea entity timeline entries)
+3. Create proposal with doc + task drafts, submit (generates proposal entity entries)
+4. Spawn proposal-reviewer sub-agent (generates reviewer token upload via on-subagent-stop)
+5. Approve proposal (materializes tasks)
+6. Spawn dev sub-agent to execute task (generates task entity + dev token upload)
+7. Verify task as admin
+8. Check observability page: project total should be in line with the totals CC reports (not millions)
+9. Check idea detail → Tokens tab: lifecycle phases should have distinct non-zero values
+10. Check proposal detail: Total Tokens and Tool Calls should reflect actual work
+11. Verify Review phase shows reviewer's tokens (non-zero), separate from Proposal drafting
+12. Verify Execution phase shows dev's tokens, separate from Verify phase
diff --git a/messages/en.json b/messages/en.json
@@ -132,6 +132,7 @@
     "proposals": "Proposals",
     "tasks": "Tasks",
     "activity": "Activity",
+    "observability": "Observability",
     "logout": "Sign out",
     "newProject": "New Project",
     "backToProjects": "Projects",
@@ -1126,6 +1127,7 @@
         "elaboration": "Elaboration",
         "proposal": "Proposal",
         "tasks": "Tasks",
+        "tokens": "Tokens",
         "activity": "Activity"
       },
       "timeline": {
@@ -1192,5 +1194,70 @@
       "project": "Project",
       "project_group": "Group"
     }
+  },
+  "observability": {
+    "title": "Agent Observability",
+    "subtitle": "Token usage and tool call metrics across all agents",
+    "range7d": "7d",
+    "range30d": "30d",
+    "range90d": "90d",
+    "totalTokens": "Total Tokens",
+    "toolCalls": "Tool Calls",
+    "sessions": "Sessions",
+    "cacheRead": "Cache Read",
+    "cacheWrite": "Cache Write",
+    "outputTokens": "Output",
+    "errorRate": "Error Rate",
+    "callsPerDay": "{count}/day",
+    "input": "Input",
+    "output": "Output",
+    "agents": "Agents",
+    "agentsCount": "{count} total",
+    "onlineCount": "{count} online",
+    "tokensSuffix": "tokens",
+    "dailyTokenUsage": "Daily Token Usage",
+    "legendInput": "Input",
+    "legendOutput": "Output",
+    "today": "Today",
+    "toolColumn": "Tool",
+    "callsColumn": "Calls",
+    "tokensColumn": "Tokens",
+    "avgMsColumn": "Avg ms",
+    "errorsColumn": "Errors",
+    "noData": "No agent activity yet",
+    "noDataDesc": "Tool usage data will appear once agents start working on this project.",
+    "noToolData": "No tool usage in this range",
+    "selectAgent": "Select an agent",
+    "selectAgentDesc": "Pick an agent on the left to see their daily tokens and tool usage.",
+    "loading": "Loading...",
+    "loadError": "Failed to load observability data",
+    "rolePm": "PM Agent",
+    "roleDeveloper": "Developer",
+    "roleAdmin": "Admin",
+    "agent": "Agent",
+    "loadFailed": "Failed to load token usage",
+    "lifecycle": "Lifecycle breakdown",
+    "phase": {
+      "elaboration": "Elaboration",
+      "proposal": "Proposal drafting",
+      "review": "Review",
+      "execution": "Execution",
+      "verify": "Verify"
+    },
+    "taskList": "Per-task usage",
+    "noTasks": "No tasks spawned from this idea yet",
+    "drafting": "Drafting",
+    "draftingBreakdown": "Drafting breakdown",
+    "draftingDocs": "Docs",
+    "draftingTasks": "Tasks",
+    "draftingValidate": "Validate",
+    "reviewRounds": "Review rounds",
+    "reviewPass": "PASS",
+    "reviewFail": "FAIL",
+    "toolTimeline": "Tool calls",
+    "toolCall": "{count, plural, one {# call} other {# calls}}",
+    "toolErrors": "{count, plural, one {# error} other {# errors}}",
+    "errors": "Errors",
+    "sessionInfo": "Session info"
   }
 }
diff --git a/messages/zh.json b/messages/zh.json
@@ -132,6 +132,7 @@
     "proposals": "提案",
     "tasks": "任务",
     "activity": "动态",
+    "observability": "可观测性",
     "logout": "退出登录",
     "newProject": "新建项目",
     "backToProjects": "项目",
@@ -1127,6 +1128,7 @@
         "elaboration": "需求细化",
         "proposal": "提案",
         "tasks": "任务",
+        "tokens": "Tokens",
         "activity": "动态"
       },
       "timeline": {
@@ -1193,5 +1195,70 @@
       "project": "项目",
       "project_group": "分组"
     }
+  },
+  "observability": {
+    "title": "智能体可观测性",
+    "subtitle": "所有智能体的 Token 使用与工具调用指标",
+    "range7d": "7 天",
+    "range30d": "30 天",
+    "range90d": "90 天",
+    "totalTokens": "Token 总量",
+    "toolCalls": "工具调用",
+    "sessions": "会话",
+    "cacheRead": "缓存读取",
+    "cacheWrite": "缓存写入",
+    "outputTokens": "输出",
+    "errorRate": "错误率",
+    "callsPerDay": "{count}/天",
+    "input": "输入",
+    "output": "输出",
+    "agents": "智能体",
+    "agentsCount": "共 {count} 个",
+    "onlineCount": "{count} 个在线",
+    "tokensSuffix": "tokens",
+    "dailyTokenUsage": "每日 Token 使用",
+    "legendInput": "输入",
+    "legendOutput": "输出",
+    "today": "今日",
+    "toolColumn": "工具",
+    "callsColumn": "调用数",
+    "tokensColumn": "Token",
+    "avgMsColumn": "平均耗时",
+    "errorsColumn": "错误",
+    "noData": "暂无智能体活动",
+    "noDataDesc": "当智能体开始在此项目上工作后，工具使用数据将在此显示。",
+    "noToolData": "此区间内暂无工具使用",
+    "selectAgent": "选择智能体",
+    "selectAgentDesc": "在左侧选择一个智能体以查看其每日 Token 与工具使用情况。",
+    "loading": "加载中...",
+    "loadError": "加载可观测性数据失败",
+    "rolePm": "PM 智能体",
+    "roleDeveloper": "开发者",
+    "roleAdmin": "管理员",
+    "agent": "智能体",
+    "loadFailed": "加载 Token 使用记录失败",
+    "lifecycle": "生命周期分布",
+    "phase": {
+      "elaboration": "需求细化",
+      "proposal": "提案起草",
+      "review": "评审",
+      "execution": "执行",
+      "verify": "验证"
+    },
+    "taskList": "按任务分布",
+    "noTasks": "该想法尚未派生任何任务",
+    "drafting": "起草",
+    "draftingBreakdown": "起草阶段分布",
+    "draftingDocs": "文档",
+    "draftingTasks": "任务",
+    "draftingValidate": "校验",
+    "reviewRounds": "评审轮次",
+    "reviewPass": "通过",
+    "reviewFail": "未通过",
+    "toolTimeline": "工具调用",
+    "toolCall": "{count} 次调用",
+    "toolErrors": "{count} 个错误",
+    "errors": "错误",
+    "sessionInfo": "会话信息"
   }
 }
diff --git a/prisma/migrations/20260419112021_add_tool_usage_event_and_token_usage/migration.sql b/prisma/migrations/20260419112021_add_tool_usage_event_and_token_usage/migration.sql
@@ -0,0 +1,42 @@
+-- AlterTable
+ALTER TABLE "AgentSession" ADD COLUMN     "tokenUsage" JSONB;
+
+-- CreateTable
+CREATE TABLE "ToolUsageEvent" (
+    "id" SERIAL NOT NULL,
+    "uuid" TEXT NOT NULL,
+    "companyUuid" TEXT NOT NULL,
+    "agentUuid" TEXT NOT NULL,
+    "sessionUuid" TEXT,
+    "toolName" TEXT NOT NULL,
+    "source" TEXT NOT NULL DEFAULT 'mcp',
+    "durationMs" INTEGER NOT NULL,
+    "inputSize" INTEGER NOT NULL,
+    "outputSize" INTEGER NOT NULL,
+    "isError" BOOLEAN NOT NULL DEFAULT false,
+    "errorText" TEXT,
+    "entityType" TEXT,
+    "entityUuid" TEXT,
+    "projectUuid" TEXT,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
+    CONSTRAINT "ToolUsageEvent_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE UNIQUE INDEX "ToolUsageEvent_uuid_key" ON "ToolUsageEvent"("uuid");
+
+-- CreateIndex
+CREATE INDEX "ToolUsageEvent_companyUuid_createdAt_idx" ON "ToolUsageEvent"("companyUuid", "createdAt");
+
+-- CreateIndex
+CREATE INDEX "ToolUsageEvent_agentUuid_createdAt_idx" ON "ToolUsageEvent"("agentUuid", "createdAt");
+
+-- CreateIndex
+CREATE INDEX "ToolUsageEvent_sessionUuid_idx" ON "ToolUsageEvent"("sessionUuid");
+
+-- CreateIndex
+CREATE INDEX "ToolUsageEvent_entityType_entityUuid_idx" ON "ToolUsageEvent"("entityType", "entityUuid");
+
+-- CreateIndex
+CREATE INDEX "ToolUsageEvent_projectUuid_createdAt_idx" ON "ToolUsageEvent"("projectUuid", "createdAt");