662 changes: 662 additions & 0 deletions docs/chorus-observability-design.md

Large diffs are not rendered by default.

4,707 changes: 4,706 additions & 1 deletion docs/design.pen

Large diffs are not rendered by default.

61 changes: 61 additions & 0 deletions docs/token-observability-bugfix-checklist.md
@@ -0,0 +1,61 @@
# Token Observability — Bug Fix Checklist

## Root Cause: Claude API usage fields have mixed semantics

Claude API transcript `message.usage` per turn:
- `input_tokens` — per-turn (incremental)
- `output_tokens` — per-turn (incremental)
- `cache_creation_input_tokens` — per-turn (incremental)
- `cache_read_input_tokens` — **cumulative** across the session

CC (Claude Code) calculates `total_tokens = input + output + cache_create + cache_read` from the **last turn only**, because `cache_read` in the last turn already contains the full session total.

The old code sent ALL turns and summed them, so the cumulative `cache_read` value was counted N times (once per turn). The front end showed only `input + output`, which hid the real magnitude.
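
To make the over-count concrete, here is a minimal sketch with made-up numbers. It takes the cumulative reading of `cache_read_input_tokens` described in this section as given. `Usage`, `turnTotal`, `naiveSum`, and `totalFromLastTurn` are hypothetical names for illustration, not functions from the codebase.

```typescript
// Hypothetical shape mirroring the four `message.usage` fields listed above.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Sum of all four usage fields for a single turn.
function turnTotal(u: Usage): number {
  return (
    u.input_tokens +
    u.output_tokens +
    u.cache_creation_input_tokens +
    u.cache_read_input_tokens
  );
}

// Old behaviour: add up every turn, so a cumulative field is counted once per turn.
function naiveSum(turns: Usage[]): number {
  return turns.reduce((acc, u) => acc + turnTotal(u), 0);
}

// Behaviour described above: take the total from the last turn only.
function totalFromLastTurn(turns: Usage[]): number {
  const last = turns[turns.length - 1];
  return last ? turnTotal(last) : 0;
}

// Illustrative numbers only (not measured data).
const exampleTurns: Usage[] = [
  { input_tokens: 10, output_tokens: 100, cache_creation_input_tokens: 500, cache_read_input_tokens: 1000 },
  { input_tokens: 12, output_tokens: 80, cache_creation_input_tokens: 300, cache_read_input_tokens: 1500 },
  { input_tokens: 8, output_tokens: 120, cache_creation_input_tokens: 200, cache_read_input_tokens: 1800 },
];

console.log(naiveSum(exampleTurns), totalFromLastTurn(exampleTurns));
```

With these inputs the naive sum is several times the last-turn total, and the gap grows with the number of turns.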

## Bug #1: Entity attribution — carry-forward wrong for sub-agents

**Symptom**: Reviewer tokens attributed to elaboration (idea entity) instead of review (proposal entity).

**Root cause**: carry-forward picks the last timeline entry before each turn. Reviewer reads idea for context before touching proposal → early turns attributed to idea. For sub-agents, ALL tokens belong to one primary entity regardless of which other entities they read.

**Fix**: `findPrimaryEntity()` picks highest-priority entity from timeline (task > proposal > idea > document). Sub-agent turns all go to that entity. Main agent still uses carry-forward.

- [x] `src/services/observability.service.ts` — replace sentinel with primary entity model
- [x] `src/services/__tests__/token-attribution.test.ts` — 16 tests covering both models
- [x] Verify: `pnpm test` — all pass
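
The two attribution models can be sketched side by side. This is an illustration under assumptions, not the code in `observability.service.ts`: the `TimelineEntry` shape, numeric timestamps, the `<=` comparison, and the tie-break (first entry of the highest priority wins) are all assumed here.

```typescript
type EntityType = "task" | "proposal" | "idea" | "document";

// Assumed timeline entry shape; the real one may differ.
interface TimelineEntry {
  timestamp: number; // epoch milliseconds (assumed representation)
  entityType: EntityType;
  entityUuid: string;
}

// Lower number means higher priority: task > proposal > idea > document.
const PRIORITY: Record<EntityType, number> = { task: 0, proposal: 1, idea: 2, document: 3 };

// Sub-agent model: one primary entity for the whole transcript.
function findPrimaryEntity(timeline: TimelineEntry[]): TimelineEntry | null {
  let best: TimelineEntry | null = null;
  for (const entry of timeline) {
    if (best === null || PRIORITY[entry.entityType] < PRIORITY[best.entityType]) {
      best = entry;
    }
  }
  return best;
}

// Main-agent model (carry-forward): latest entry at or before the turn timestamp.
function findActiveEntity(timeline: TimelineEntry[], turnTimestamp: number): TimelineEntry | null {
  let active: TimelineEntry | null = null;
  for (const entry of timeline) {
    if (entry.timestamp <= turnTimestamp && (active === null || entry.timestamp >= active.timestamp)) {
      active = entry;
    }
  }
  return active;
}

// A reviewer reads the idea first, then touches the proposal.
const reviewerTimeline: TimelineEntry[] = [
  { timestamp: 1, entityType: "idea", entityUuid: "idea-1" },
  { timestamp: 5, entityType: "proposal", entityUuid: "proposal-1" },
];

console.log(findActiveEntity(reviewerTimeline, 3)?.entityUuid); // carry-forward picks the idea
console.log(findPrimaryEntity(reviewerTimeline)?.entityUuid); // primary model picks the proposal
```

For the reviewer timeline, carry-forward assigns the early turn to the idea, which is the symptom described above, while the primary-entity model assigns every turn to the proposal.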

## Bug #2: Shell scripts send all turns → cumulative cache_read over-counted

**Symptom**: Token totals are 10-50x higher than CC reports.

**Root cause**: All three shell scripts (`on-stop.sh`, `on-subagent-stop.sh`, `on-session-end.sh`) extracted every assistant turn's usage and sent them as separate records. Server summed all records. Since `cache_read_input_tokens` is cumulative, summing N turns counts the same cache tokens N times.

**Fix**: Extract only the **last assistant turn** (which contains session totals). Also switch to temp files + `--slurpfile` + `curl -d @file` in all scripts.

- [x] `public/chorus-plugin/bin/on-stop.sh` — last turn only
- [x] `public/chorus-plugin/bin/on-subagent-stop.sh` — last turn + temp files + sourceSessionId
- [x] `public/chorus-plugin/bin/on-session-end.sh` — last turn + temp files + sourceSessionId
- [x] Verify: `bash public/chorus-plugin/bin/test-syntax.sh` — all pass
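
The scripts themselves use shell and `jq`. The sketch below shows the same "last assistant turn only" selection in TypeScript, purely as an illustration. It assumes a JSONL transcript in which assistant entries carry `type: "assistant"`, a `timestamp`, and `message.usage`; that layout and the name `lastAssistantTurn` are assumptions, not something confirmed by the scripts.

```typescript
// Assumed result shape: the usage object of one assistant turn plus its timestamp.
interface TurnUsage {
  timestamp: string;
  usage: Record<string, number>;
}

// Walk the transcript line by line and keep only the most recent assistant usage.
function lastAssistantTurn(jsonl: string): TurnUsage | null {
  let last: TurnUsage | null = null;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let entry: any;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines instead of failing the whole upload
    }
    if (entry?.type === "assistant" && entry.message?.usage) {
      last = { timestamp: String(entry.timestamp ?? ""), usage: entry.message.usage };
    }
  }
  return last;
}

// Illustrative transcript: one user line, two assistant lines, one broken line.
const sampleTranscript = [
  JSON.stringify({ type: "user", timestamp: "2024-01-01T00:00:00Z", message: { content: "hi" } }),
  JSON.stringify({ type: "assistant", timestamp: "2024-01-01T00:00:01Z", message: { usage: { input_tokens: 10, cache_read_input_tokens: 1000 } } }),
  "not json",
  JSON.stringify({ type: "assistant", timestamp: "2024-01-01T00:00:02Z", message: { usage: { input_tokens: 8, cache_read_input_tokens: 1800 } } }),
  "",
].join("\n");

console.log(lastAssistantTurn(sampleTranscript));
```

Only the final assistant entry survives, so the server receives a single record per upload instead of one per turn.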

## Bug #3: Frontend tokensSum missing cache fields

**Symptom**: Page shows ~1.9k when CC reports ~23k.

**Root cause**: `tokensSum()` only summed `input_tokens + output_tokens`, missing `cache_creation_input_tokens` and `cache_read_input_tokens`. CC's formula includes all 4 fields.

**Fix**: `tokensSum = input + output + cache_create + cache_read` in all 4 components.

- [x] `tokens-view.tsx` — fixed
- [x] `agent-observability.tsx` — fixed
- [x] `task-tokens-view.tsx` — fixed
- [x] `token-usage-card.tsx` — fixed
- [x] Verify: `npx tsc --noEmit` — clean

## Verification

With a fresh CC session + new project:
1. Run full yolo pipeline (idea → proposal → reviewer → approve → dev → verify)
2. Check observability page: total tokens should match CC's reported total_tokens
3. Review phase should show reviewer's tokens (~23k), not 0 or 3.6k
4. Execution phase should show dev's tokens (~34k)
5. No tokens should leak into elaboration phase from reviewers
89 changes: 89 additions & 0 deletions docs/token-observability-changes.md
@@ -0,0 +1,89 @@
# Token Observability Changes Summary

## Architecture Overview

Token usage is tracked per assistant turn in the `TokenUsageRecord` table (decoupled from `AgentSession`). The CC Stop hook fires on every assistant turn and uploads the full set of transcript turns plus the entity timeline to the server. The server handles attribution and dedup.

## Changed Files

### Shell Scripts (Plugin)

**`public/chorus-plugin/bin/on-stop.sh`**
- Stop hook fires every assistant turn (async)
- Extracts turns (per-assistant-message usage with timestamp) from transcript
- **NEW**: Extracts entity timeline from transcript's MCP tool_use blocks (was: tool-log.jsonl)
- Builds payload via temp files + jq --slurpfile (avoids shell arg length limits)
- POSTs to `/api/agent-report/token-usage` with `sourceSessionId` for dedup

**`public/chorus-plugin/bin/on-subagent-stop.sh`**
- Layer 3 added: parses sub-agent transcript for turns + timeline, POSTs to server
- **NEW**: Extracts entity timeline from sub-agent's own transcript (was: tool-log.jsonl filtered by agent_id)
- Passes `sessionUuid` (Chorus session) so server can distinguish sub-agent vs main agent records

**`public/chorus-plugin/bin/on-session-end.sh`**
- Same transcript-based timeline extraction as on-stop.sh
- Final upload on session close

**`public/chorus-plugin/hooks/hooks.json`**
- Added Stop hook entry with nested format: `{matcher: "", hooks: [{type, command, async: true}]}`

### Server (API + Service)

**`src/app/api/agent-report/token-usage/route.ts`**
- Accepts `{sourceSessionId?, sessionUuid?, turns[], timeline[]}`
- Calls `attributeTokenUsage()` for per-turn entity attribution
- **NEW**: Calls `resolveProjectUuids()` (plural) for per-record projectUuid resolution
- Each record gets projectUuid based on its own entityUuid, not a single shared value
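
Per-record project resolution can be sketched as follows. The real service queries the database; this sketch replaces those queries with in-memory maps. `AttributedRecord`, `Lookups`, `resolveProjectUuidsSketch`, and `attachProject` are hypothetical names, and the lookup chain (task to proposal to project, idea to project, proposal to project) is taken from the design notes in this document.

```typescript
// Hypothetical record shape after attribution.
interface AttributedRecord {
  entityType: "task" | "proposal" | "idea" | null;
  entityUuid: string | null;
}

// In-memory stand-ins for the database relations (assumed for illustration).
interface Lookups {
  taskToProposal: Map<string, string>;
  proposalToProject: Map<string, string>;
  ideaToProject: Map<string, string>;
}

// Build entityUuid -> projectUuid for every distinct entity in the batch.
function resolveProjectUuidsSketch(records: AttributedRecord[], db: Lookups): Map<string, string> {
  const result = new Map<string, string>();
  for (const r of records) {
    if (r.entityType === null || r.entityUuid === null || result.has(r.entityUuid)) continue;
    let projectUuid: string | undefined;
    if (r.entityType === "idea") {
      projectUuid = db.ideaToProject.get(r.entityUuid);
    } else if (r.entityType === "proposal") {
      projectUuid = db.proposalToProject.get(r.entityUuid);
    } else {
      // task -> proposal -> project
      const proposalUuid = db.taskToProposal.get(r.entityUuid);
      projectUuid = proposalUuid === undefined ? undefined : db.proposalToProject.get(proposalUuid);
    }
    if (projectUuid !== undefined) result.set(r.entityUuid, projectUuid);
  }
  return result;
}

// Each record takes the project of its own entity; records without an entity get null.
function attachProject(
  records: AttributedRecord[],
  projects: Map<string, string>,
): Array<AttributedRecord & { projectUuid: string | null }> {
  return records.map((r) => ({
    ...r,
    projectUuid: r.entityUuid === null ? null : (projects.get(r.entityUuid) ?? null),
  }));
}
```

Resolving the whole batch into one map first keeps the number of lookups proportional to the number of distinct entities rather than the number of records.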

**`src/services/observability.service.ts`**

Key functions:
- `attributeTokenUsage()` — per-turn records with timeline entity matching via `findActiveEntity()` (carry-forward: last timeline entry before turn timestamp)
- **NEW**: `resolveProjectUuids()` — batch-queries all entity UUIDs, returns Map<entityUuid, projectUuid>. Replaces old `resolveProjectUuid()` which returned a single value for all records
- `insertAttributedTokenUsage()` — uses `prisma.createMany` with `skipDuplicates` on `(sourceSessionId, turnTimestamp)` unique constraint
- `getIdeaLifecycleTokens()` — **NEW phase token logic**: uses `sessionUuid` to split tokens between phases:
  - proposal entity + sessionUuid → review (sub-agent reviewer)
  - proposal entity + no sessionUuid → proposal drafting (main agent)
  - task entity + sessionUuid → execution (sub-agent dev)
  - task entity + no sessionUuid → verify (main agent admin)
  - idea entity → elaboration
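
The phase rules above reduce to a small classification function. The following is a sketch under assumptions: `PhaseInput`, `phaseOf`, and `tokensByPhase` are hypothetical names, and leaving document or entity-less records unassigned is an assumption, since the rules do not cover them.

```typescript
type Phase = "elaboration" | "proposal" | "review" | "execution" | "verify";

// Assumed minimal record shape for phase attribution.
interface PhaseInput {
  entityType: "idea" | "proposal" | "task" | "document" | null;
  sessionUuid: string | null; // set for sub-agent records, null for the main agent
  totalTokens: number;
}

// Map one record to a lifecycle phase, or null if no rule applies.
function phaseOf(r: PhaseInput): Phase | null {
  const fromSubAgent = r.sessionUuid !== null;
  switch (r.entityType) {
    case "idea":
      return "elaboration";
    case "proposal":
      return fromSubAgent ? "review" : "proposal";
    case "task":
      return fromSubAgent ? "execution" : "verify";
    default:
      return null; // document or missing entity: not covered by the rules above
  }
}

// Aggregate token totals per phase.
function tokensByPhase(records: PhaseInput[]): Record<Phase, number> {
  const totals: Record<Phase, number> = {
    elaboration: 0,
    proposal: 0,
    review: 0,
    execution: 0,
    verify: 0,
  };
  for (const r of records) {
    const phase = phaseOf(r);
    if (phase !== null) totals[phase] += r.totalTokens;
  }
  return totals;
}
```

The phase keys match the `phase` entries in the locale files, so the totals can be rendered without an extra mapping step.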

### Database

**`prisma/schema.prisma`**
- Added `TokenUsageRecord` model with `@@unique([sourceSessionId, turnTimestamp])` for dedup
- Removed `tokenUsage` JSON field from `AgentSession`

## Key Design Decisions

1. **Timeline from transcript, not tool-log.jsonl** — Each agent's transcript is independent. tool-log.jsonl is shared and mixes agents. Transcript has MCP tool_use blocks with entity UUIDs in input params.

2. **Per-record projectUuid** — A single CC session may span multiple projects. Each record's entityUuid resolves to its own projectUuid. Records without entity get null projectUuid.

3. **sessionUuid distinguishes main agent vs sub-agent** — Sub-agents have Chorus sessions (sessionUuid set). Main agent records have sessionUuid null. This is used for phase attribution in lifecycle views.

4. **Server-side dedup** — Stop hook uploads full transcript every turn. `skipDuplicates` on `(sourceSessionId, turnTimestamp)` prevents re-insertion. Only new turns get inserted.

5. **Server resolves projectUuid** — Client doesn't need to track project. Server looks up entity → project via DB (task→proposal→project, idea→project, proposal→project).
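
The dedup decision (item 4) can be illustrated without a database. The sketch below emulates the effect of a unique constraint on `(sourceSessionId, turnTimestamp)` combined with skip-on-duplicate inserts, using an in-memory set. `TurnRecord` and `InMemoryTokenStore` are hypothetical names; the real implementation relies on `prisma.createMany` with `skipDuplicates`, as described earlier.

```typescript
// Assumed minimal record shape; the real table has more columns.
interface TurnRecord {
  sourceSessionId: string;
  turnTimestamp: string;
  totalTokens: number;
}

class InMemoryTokenStore {
  private readonly seen = new Set<string>();
  readonly rows: TurnRecord[] = [];

  // Insert records, silently skipping any whose composite key already exists.
  // Returns the number of rows actually inserted.
  insertSkippingDuplicates(records: TurnRecord[]): number {
    let inserted = 0;
    for (const r of records) {
      const key = JSON.stringify([r.sourceSessionId, r.turnTimestamp]);
      if (this.seen.has(key)) continue;
      this.seen.add(key);
      this.rows.push(r);
      inserted++;
    }
    return inserted;
  }
}

// First upload carries two turns; the next upload repeats them and adds one new turn.
const store = new InMemoryTokenStore();
const firstUpload: TurnRecord[] = [
  { sourceSessionId: "cc-1", turnTimestamp: "2024-01-01T00:00:01Z", totalTokens: 10 },
  { sourceSessionId: "cc-1", turnTimestamp: "2024-01-01T00:00:02Z", totalTokens: 20 },
];
const secondUpload: TurnRecord[] = [
  ...firstUpload,
  { sourceSessionId: "cc-1", turnTimestamp: "2024-01-01T00:00:03Z", totalTokens: 30 },
];

console.log(store.insertSkippingDuplicates(firstUpload)); // inserts both turns
console.log(store.insertSkippingDuplicates(secondUpload)); // inserts only the new turn
```

Because the key includes `sourceSessionId`, two different sessions can report turns with identical timestamps without colliding.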

## Known Limitations

- **Long CC sessions across projects**: The carry-forward entity attribution means turns between entity tool calls inherit the last entity. In a single-project session this is correct. In a multi-project session, turns after switching projects but before the first entity tool call in the new project may still be attributed to the previous project's entity.
- **Stale data from old uploads**: Records inserted by older code versions (single projectUuid, tool-log.jsonl timeline) remain in the DB. They are dedup-protected and won't be overwritten. For clean testing, use a brand new project in a fresh CC session.

## Testing Checklist

To verify E2E in a **new CC session** (important — avoids stale data):

1. Create a new project
2. Create idea, claim, skip elaboration (generates idea entity timeline entries)
3. Create proposal with doc + task drafts, submit (generates proposal entity entries)
4. Spawn proposal-reviewer sub-agent (generates reviewer token upload via on-subagent-stop)
5. Approve proposal (materializes tasks)
6. Spawn dev sub-agent to execute task (generates task entity + dev token upload)
7. Verify task as admin
8. Check observability page: project total should be reasonable (not millions)
9. Check idea detail → Tokens tab: lifecycle phases should have distinct non-zero values
10. Check proposal detail: Total Tokens and Tool Calls should reflect actual work
11. Verify Review phase shows reviewer's tokens (non-zero), separate from Proposal drafting
12. Verify Execution phase shows dev's tokens, separate from Verify phase
67 changes: 67 additions & 0 deletions messages/en.json
@@ -132,6 +132,7 @@
"proposals": "Proposals",
"tasks": "Tasks",
"activity": "Activity",
"observability": "Observability",
"logout": "Sign out",
"newProject": "New Project",
"backToProjects": "Projects",
@@ -1126,6 +1127,7 @@
"elaboration": "Elaboration",
"proposal": "Proposal",
"tasks": "Tasks",
"tokens": "Tokens",
"activity": "Activity"
},
"timeline": {
@@ -1192,5 +1194,70 @@
"project": "Project",
"project_group": "Group"
}
},
"observability": {
"title": "Agent Observability",
"subtitle": "Token usage and tool call metrics across all agents",
"range7d": "7d",
"range30d": "30d",
"range90d": "90d",
"totalTokens": "Total Tokens",
"toolCalls": "Tool Calls",
"sessions": "Sessions",
"cacheRead": "Cache Read",
"cacheWrite": "Cache write",
"outputTokens": "Output",
"errorRate": "Error Rate",
"callsPerDay": "{count}/day",
"input": "Input",
"output": "Output",
"agents": "Agents",
"agentsCount": "{count} total",
"onlineCount": "{count} online",
"tokensSuffix": "tokens",
"dailyTokenUsage": "Daily Token Usage",
"legendInput": "Input",
"legendOutput": "Output",
"today": "Today",
"toolColumn": "Tool",
"callsColumn": "Calls",
"tokensColumn": "Tokens",
"avgMsColumn": "Avg ms",
"errorsColumn": "Errors",
"noData": "No agent activity yet",
"noDataDesc": "Tool usage data will appear once agents start working on this project.",
"noToolData": "No tool usage in this range",
"selectAgent": "Select an agent",
"selectAgentDesc": "Pick an agent on the left to see their daily tokens and tool usage.",
"loading": "Loading...",
"loadError": "Failed to load observability data",
"rolePm": "PM Agent",
"roleDeveloper": "Developer",
"roleAdmin": "Admin",
"agent": "Agent",
"loadFailed": "Failed to load token usage",
"lifecycle": "Lifecycle breakdown",
"phase": {
"elaboration": "Elaboration",
"proposal": "Proposal drafting",
"review": "Review",
"execution": "Execution",
"verify": "Verify"
},
"taskList": "Per-task usage",
"noTasks": "No tasks spawned from this idea yet",
"drafting": "Drafting",
"draftingBreakdown": "Drafting breakdown",
"draftingDocs": "Docs",
"draftingTasks": "Tasks",
"draftingValidate": "Validate",
"reviewRounds": "Review rounds",
"reviewPass": "PASS",
"reviewFail": "FAIL",
"toolTimeline": "Tool calls",
"toolCall": "{count, plural, one {# call} other {# calls}}",
"toolErrors": "{count, plural, one {# error} other {# errors}}",
"errors": "Errors",
"sessionInfo": "Session info"
}
}
67 changes: 67 additions & 0 deletions messages/zh.json
@@ -132,6 +132,7 @@
"proposals": "提案",
"tasks": "任务",
"activity": "动态",
"observability": "可观测性",
"logout": "退出登录",
"newProject": "新建项目",
"backToProjects": "项目",
@@ -1127,6 +1128,7 @@
"elaboration": "需求细化",
"proposal": "提案",
"tasks": "任务",
"tokens": "Tokens",
"activity": "动态"
},
"timeline": {
@@ -1193,5 +1195,70 @@
"project": "项目",
"project_group": "分组"
}
},
"observability": {
"title": "智能体可观测性",
"subtitle": "所有智能体的 Token 使用与工具调用指标",
"range7d": "7 天",
"range30d": "30 天",
"range90d": "90 天",
"totalTokens": "Token 总量",
"toolCalls": "工具调用",
"sessions": "会话",
"cacheRead": "缓存读取",
"cacheWrite": "缓存写入",
"outputTokens": "输出",
"errorRate": "错误率",
"callsPerDay": "{count}/天",
"input": "输入",
"output": "输出",
"agents": "智能体",
"agentsCount": "共 {count} 个",
"onlineCount": "{count} 个在线",
"tokensSuffix": "tokens",
"dailyTokenUsage": "每日 Token 使用",
"legendInput": "输入",
"legendOutput": "输出",
"today": "今日",
"toolColumn": "工具",
"callsColumn": "调用数",
"tokensColumn": "Token",
"avgMsColumn": "平均耗时",
"errorsColumn": "错误",
"noData": "暂无智能体活动",
"noDataDesc": "当智能体开始在此项目上工作后,工具使用数据将在此显示。",
"noToolData": "此区间内暂无工具使用",
"selectAgent": "选择智能体",
"selectAgentDesc": "在左侧选择一个智能体以查看其每日 Token 与工具使用情况。",
"loading": "加载中...",
"loadError": "加载可观测性数据失败",
"rolePm": "PM 智能体",
"roleDeveloper": "开发者",
"roleAdmin": "管理员",
"agent": "智能体",
"loadFailed": "加载 token 使用记录失败",
"lifecycle": "生命周期分布",
"phase": {
"elaboration": "需求细化",
"proposal": "提案起草",
"review": "评审",
"execution": "执行",
"verify": "验证"
},
"taskList": "按任务分布",
"noTasks": "该想法尚未派生任何任务",
"drafting": "起草",
"draftingBreakdown": "起草阶段分布",
"draftingDocs": "文档",
"draftingTasks": "任务",
"draftingValidate": "校验",
"reviewRounds": "评审轮次",
"reviewPass": "通过",
"reviewFail": "未通过",
"toolTimeline": "工具调用",
"toolCall": "{count} 次调用",
"toolErrors": "{count} 个错误",
"errors": "错误",
"sessionInfo": "会话信息"
}
}
@@ -0,0 +1,42 @@
-- AlterTable
ALTER TABLE "AgentSession" ADD COLUMN "tokenUsage" JSONB;

-- CreateTable
CREATE TABLE "ToolUsageEvent" (
    "id" SERIAL NOT NULL,
    "uuid" TEXT NOT NULL,
    "companyUuid" TEXT NOT NULL,
    "agentUuid" TEXT NOT NULL,
    "sessionUuid" TEXT,
    "toolName" TEXT NOT NULL,
    "source" TEXT NOT NULL DEFAULT 'mcp',
    "durationMs" INTEGER NOT NULL,
    "inputSize" INTEGER NOT NULL,
    "outputSize" INTEGER NOT NULL,
    "isError" BOOLEAN NOT NULL DEFAULT false,
    "errorText" TEXT,
    "entityType" TEXT,
    "entityUuid" TEXT,
    "projectUuid" TEXT,
    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT "ToolUsageEvent_pkey" PRIMARY KEY ("id")
);

-- CreateIndex
CREATE UNIQUE INDEX "ToolUsageEvent_uuid_key" ON "ToolUsageEvent"("uuid");

-- CreateIndex
CREATE INDEX "ToolUsageEvent_companyUuid_createdAt_idx" ON "ToolUsageEvent"("companyUuid", "createdAt");

-- CreateIndex
CREATE INDEX "ToolUsageEvent_agentUuid_createdAt_idx" ON "ToolUsageEvent"("agentUuid", "createdAt");

-- CreateIndex
CREATE INDEX "ToolUsageEvent_sessionUuid_idx" ON "ToolUsageEvent"("sessionUuid");

-- CreateIndex
CREATE INDEX "ToolUsageEvent_entityType_entityUuid_idx" ON "ToolUsageEvent"("entityType", "entityUuid");

-- CreateIndex
CREATE INDEX "ToolUsageEvent_projectUuid_createdAt_idx" ON "ToolUsageEvent"("projectUuid", "createdAt");