Token accounting: dedupe fallback by message.id + fresh-vs-cached headline #11

Open
wan-huiyan wants to merge 2 commits into dioptx:main from wan-huiyan:upstream/token-accounting

@wan-huiyan

Two related token-reporting changes:

  1. Dedup fallback to message.id: `deduplicateAssistant` groups streaming chunks by requestId, but transcripts that omit requestId (older Claude Code versions or partial logs) bypassed the grouping and re-introduced the ~2-3× token inflation. Those chunks still share message.id, so dedup now keys on `requestId ?? message.id`.
  2. Decompose the "in" headline into fresh vs cached: the "NN in" total is ~97% cheap cache_read, so a new `Input  X new · Y cached` line separates freshly-billed input from cache reads and makes the cost interpretable. The headline total is unchanged.

91 tests pass, tsc clean. Supersedes #9 (consolidated with the dedup fix here).

🤖 Generated with Claude Code

wan-huiyan and others added 2 commits May 29, 2026 13:17
…absent (#5)

`deduplicateAssistant` collapses streaming assistant chunks (which share a
requestId and each report the same usage) so tokens aren't counted ~2-3x. But
assistant rows that OMIT requestId — older Claude Code versions or partial
transcripts — hit the `!msg.requestId` guard and passed straight through
un-grouped, re-introducing the exact inflation this function exists to prevent.

Those chunks still share `message.id`, so group by `requestId ?? message.id`.
Rows with neither key still pass through unchanged; behaviour for requestId-
bearing transcripts is identical.
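
The grouping rule described above can be sketched roughly as follows. This is an illustration, not the code in parser.ts: the `AssistantRow` and `Usage` shapes are hypothetical, and keeping the last chunk of each group is an assumption (the message only says the chunks report the same usage, so any one of them would carry the right totals).

```typescript
// Hypothetical row shapes for illustration; the real types live in types.ts.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface AssistantRow {
  requestId?: string;
  message: { id?: string; usage: Usage };
}

// Collapse consecutive rows that share a dedup key into a single row.
// Key is requestId when present, otherwise message.id.
function deduplicateAssistant(rows: AssistantRow[]): AssistantRow[] {
  const out: AssistantRow[] = [];
  let currentKey: string | undefined;
  let pending: AssistantRow | undefined;

  // Emit the group collected so far (if any) and reset the state.
  const flush = (): void => {
    if (pending !== undefined) out.push(pending);
    pending = undefined;
    currentKey = undefined;
  };

  for (const row of rows) {
    const key = row.requestId ?? row.message.id;
    if (key === undefined) {
      // Neither key is available: pass the row through unchanged.
      flush();
      out.push(row);
      continue;
    }
    if (key !== currentKey) {
      flush();
      currentKey = key;
    }
    // Assumption: a later chunk of the same response supersedes earlier ones.
    pending = row;
  }
  flush();
  return out;
}
```

With this shape, two rows that lack requestId but share a message.id collapse into one, while rows with different message.ids stay separate, which mirrors the two added test cases.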

- types.ts: add optional `message.id`
- parser.ts: key dedup on `requestId ?? message.id`; extract a `flush()` helper
- parser.test.ts: +2 cases (merge no-requestId chunks sharing message.id;
  do NOT merge no-requestId chunks with different message.ids)
- CHANGELOG: Unreleased entry

Defensive: current Claude Code transcripts carry requestId on every assistant
row (verified against a real 150-response session — dedup already correct
there), so this closes a latent edge rather than a live miscount. Full suite
89 passing; tsc clean.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The "NN in" headline sums input + cache_read + cache_creation, which on a
long multi-turn session is ~97% cheap cache_read — so a user sees an alarming
"37M in / $80" without realizing almost none of it is freshly-billed input.

Add an `Input  X new · Y cached` line that splits freshly-billed input
(input + cache_creation, at $15/M + $18.75/M) from cache reads ($1.50/M), so
the cost line is interpretable. Single-session and live views; +2 tests. 89 pass.
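
A minimal sketch of the split, assuming usage fields named as in the Anthropic API response (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) and using the per-million rates quoted above. The helper names `splitInput`, `compact`, and `formatInputLine` are hypothetical, and the project's actual number formatting may differ.

```typescript
// Assumed usage shape; field names follow the Anthropic API usage object.
interface TokenUsage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// USD per million tokens, as quoted in the commit message.
const RATE_INPUT = 15;
const RATE_CACHE_WRITE = 18.75;
const RATE_CACHE_READ = 1.5;

// Split the "in" total into freshly-billed input and cache reads.
function splitInput(u: TokenUsage) {
  const fresh = u.input_tokens + u.cache_creation_input_tokens;
  const cached = u.cache_read_input_tokens;
  const freshCost =
    (u.input_tokens * RATE_INPUT +
      u.cache_creation_input_tokens * RATE_CACHE_WRITE) / 1e6;
  const cachedCost = (cached * RATE_CACHE_READ) / 1e6;
  return { fresh, cached, total: fresh + cached, freshCost, cachedCost };
}

// Hypothetical compact formatter (1.1M, 35.9M, 4.2K).
function compact(n: number): string {
  if (n >= 1e6) return `${(n / 1e6).toFixed(1)}M`;
  if (n >= 1e3) return `${(n / 1e3).toFixed(1)}K`;
  return String(n);
}

// Build the extra line; the headline total itself is left untouched.
function formatInputLine(u: TokenUsage): string {
  const { fresh, cached } = splitInput(u);
  return `Input  ${compact(fresh)} new · ${compact(cached)} cached`;
}
```

For an illustrative session with 100,000 input, 1,000,000 cache-creation, and 35,900,000 cache-read tokens, the total is still 37M, but the line reads `Input  1.1M new · 35.9M cached`, which shows that about 97% of the headline is cache reads.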