DUUL v1.1 cost-reduction: byte budget, per-tool model, cost warning by devplanningo · Pull Request #11 · Planningo/duul

devplanningo · 2026-04-21T06:54:41Z

Summary

Implements Tasks 2–4 from plans/v1.1-cost-reduction.md plus two rounds of review polish. Each feat commit is independently shippable; the two trailing commits are review fixups.

Task 2 — Reviewer byte budget (c5bb4e4) — adds a DUUL_MAX_REVIEWER_BYTES opt-in cap threaded through executeFilesystemTool via a mutable ReviewerByteBudget counter. Short-circuits further file reads once exceeded, with a prompt addendum telling the reviewer to submit its verdict.
Task 3 — Per-tool model override (8091056) — reviewer_config.model now accepts either a string (unchanged) or { plan?, code?, partition? } so callers can downgrade code / partition without touching plan. Resolved model is included in the provider cache key to prevent collisions.
Task 4 — Iteration cost warning (2b6b159) — server populates cost_warning on the review response once iteration_count >= ceil(iteration_limit * 0.6), with the per-round cost estimate so the orchestrator can decide whether to keep iterating, accept a REVISE-with-minor-issues, or escalate to human.
Fix: byte-budget safety + default opt-in (1a821a1) — count UTF-8 bytes (not UTF-16 code units), make the default cap Infinity (early measurements showed 200KB tripped ~1/3 of code reviews into spurious REVISEs), standardize cost_warning schema to .optional().nullable().
Polish (f25b799) — conditionalize the prompt's "file budget" addendum so it doesn't lie when no cap is set, skip the used counter when cap === Infinity, sync docstring recommendation with README.

Test plan

npm run build — clean
npm test — 60/60 tests pass (includes new Korean UTF-8 regression, per-tool model resolution, cost-warning threshold cases)
Measure a real DUUL session end-to-end once merged and compare against the v1.0 baseline (duul-tokens --since <merge>)
Spot-check cost_warning surfaces correctly around iteration 3/5 on a long plan ping-pong
Confirm DUUL_MAX_REVIEWER_BYTES=200000 still enforces the cap when set explicitly

Follow-ups (not in this PR)

Budget-exhausted post-LLM gate (gates_tripped += "budget_exhausted", force requires_human_review) — low priority while the default cap is unset. Noted in plans/v1.1-cost-reduction.md.
Task 1 (OpenAI/Anthropic prompt caching) still pending.

🤖 Generated with Claude Code

Caps the cumulative bytes returned by reviewer filesystem tools per review call. Once exceeded, further tool calls return a budget-exhausted message so the reviewer submits its verdict instead of continuing to request files.

- Add DUUL_MAX_REVIEWER_BYTES env var (default 200000).
- Thread a mutable ReviewerByteBudget through executeFilesystemTool.
- Instantiate one budget per review call in each provider's tool loop.
- Append file-budget guidance at end of plan/code review system prompts.
- Document env var in both READMEs.
- Add src/__tests__/filesystem-tools-budget.test.ts covering accumulation, exhaustion, no-budget backwards compat, and env resolution.

Target: reduce average code_review tokens by >=30% (from 117k baseline).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
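The budget mechanism described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the fields of `ReviewerByteBudget`, the `runWithBudget` wrapper, and the wording of the exhausted message are all assumptions made for the example. Only the idea (a mutable counter that short-circuits tool calls once the cap is exceeded) comes from the commit message.

```typescript
// Hypothetical shape; the real ReviewerByteBudget in the PR may differ.
interface ReviewerByteBudget {
  cap: number;  // maximum cumulative bytes allowed for one review call
  used: number; // bytes already returned by filesystem tools
}

// Hypothetical wording for the short-circuit reply.
const BUDGET_EXHAUSTED_MESSAGE =
  "Byte budget exhausted. Submit your verdict using the files already read.";

// Hypothetical wrapper: once the budget has been exceeded, skip the tool
// entirely; otherwise run it and charge its output against the budget.
function runWithBudget(
  budget: ReviewerByteBudget,
  tool: () => string,
): string {
  if (budget.used >= budget.cap) {
    return BUDGET_EXHAUSTED_MESSAGE;
  }
  const result = tool();
  budget.used += Buffer.byteLength(result, "utf8");
  return result;
}

// One budget per review call, shared by every tool invocation in that call.
const budget: ReviewerByteBudget = { cap: 10, used: 0 };
const first = runWithBudget(budget, () => "12345678"); // 8 bytes, under cap
const second = runWithBudget(budget, () => "abcdef");  // crosses the cap
const third = runWithBudget(budget, () => "more");     // short-circuited
```

In this sketch the call that crosses the cap still returns its result, and only later calls are refused, which matches the "once exceeded, further tool calls" wording above.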

reviewer_config.model now accepts either a single string (applied to all review tools, the existing behavior) or an object with per-tool overrides: { plan?, code?, partition? }. Unspecified tools fall back to REVIEW_MODEL / provider default.

- Extend ReviewerConfigSchema.model to a union of string | per-tool object.
- callReview / getProvider accept a toolName and resolve the concrete model for the call. Provider cache key includes the resolved model so per-tool models don't collide.
- Each tool (plan-review, code-review, execution-partition) passes its own toolName through.
- Tests in src/__tests__/per-tool-model.test.ts cover string form, per-tool form, partial overrides, undefined, and schema validation.
- README (en/ko) documents the override with upgrade-only guidance: plan defects compound, so plan must stay on a strong model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
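A rough sketch of how the string-or-object model setting could be resolved per tool is shown below. The function names `resolveModel` and `providerCacheKey`, the `fallback` parameter (standing in for REVIEW_MODEL / the provider default), and the model names are illustrative assumptions, not the PR's actual API.

```typescript
type ToolName = "plan" | "code" | "partition";

// Mirrors the documented shapes: a single string, or { plan?, code?, partition? }.
type ModelSetting = string | Partial<Record<ToolName, string>> | undefined;

// Hypothetical resolver: a string applies to every tool; an object applies
// per tool; anything unspecified falls back to the provided default.
function resolveModel(
  setting: ModelSetting,
  tool: ToolName,
  fallback: string,
): string {
  if (typeof setting === "string") {
    return setting;
  }
  return setting?.[tool] ?? fallback;
}

// Hypothetical cache key: including the resolved model keeps provider
// instances for different per-tool models from colliding.
function providerCacheKey(provider: string, resolvedModel: string): string {
  return `${provider}:${resolvedModel}`;
}

// Placeholder model names, used only for illustration.
const perTool: ModelSetting = { code: "cheap-model" };
const planModel = resolveModel(perTool, "plan", "strong-model");
const codeModel = resolveModel(perTool, "code", "strong-model");
const planKey = providerCacheKey("some-provider", planModel);
const codeKey = providerCacheKey("some-provider", codeModel);
```

Here `plan` stays on the fallback model while `code` is overridden, and the two calls produce distinct cache keys.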

Adds an optional `cost_warning` field to the iteration meta output. Once `iteration_count` crosses ~60% of `iteration_limit` (Math.ceil), the server emits a short advisory message including the current round's estimated cost (or "unknown amount" when pricing is unavailable). Null below the threshold and in the iteration-limit short-circuit path (requires_human_review already handles that case).

- Add cost_warning to IterationMetaOutputSchema.
- computeCostWarning helper in review-limits.ts.
- All three tools (plan/code/partition) compute and include it.
- Tests in src/__tests__/cost-warning.test.ts cover the 60% trigger, null-cost fallback, zero cost handling, and iteration 0 guard.
- CLAUDE.md documents that the orchestrator should surface the warning to the user before deciding to continue iterating.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
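The threshold rule can be illustrated with a small sketch. The signature and message text below are assumptions for the example; only the `ceil(iteration_limit * 0.6)` trigger, the "unknown amount" fallback, and the iteration 0 guard are taken from the description above.

```typescript
// Hypothetical signature; the helper in review-limits.ts may differ.
function computeCostWarning(
  iterationCount: number,
  iterationLimit: number,
  roundCostUsd: number | null,
): string | null {
  // Guard: never warn before the first iteration has run.
  if (iterationCount <= 0) {
    return null;
  }
  const threshold = Math.ceil(iterationLimit * 0.6);
  if (iterationCount < threshold) {
    return null;
  }
  // A cost of 0 is a real value and must not be treated as "unknown".
  const cost =
    roundCostUsd === null ? "unknown amount" : `$${roundCostUsd.toFixed(2)}`;
  return (
    `Iteration ${iterationCount} of ${iterationLimit}: ` +
    `this round cost an estimated ${cost}. ` +
    `Consider accepting minor issues or escalating to a human.`
  );
}

const belowThreshold = computeCostWarning(2, 5, 0.12); // below ceil(5 * 0.6)
const atThreshold = computeCostWarning(3, 5, 0.12);    // at ceil(5 * 0.6)
const unknownCost = computeCostWarning(4, 5, null);
const zeroCost = computeCostWarning(3, 5, 0);
```

With a limit of 5 the threshold is 3, which lines up with the "around iteration 3/5" spot-check in the test plan.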

Three post-review tweaks landing together:

1. Count UTF-8 bytes, not UTF-16 code units, in the byte budget. `executeFilesystemTool` was using `result.length`, undercounting non-ASCII output (Korean ~3x) and letting reviews blow past the cap. Now uses `Buffer.byteLength(result, 'utf8')`. Adds a Korean-text regression test.
2. Make DUUL_MAX_REVIEWER_BYTES opt-in (default Infinity). Early measurements showed the 200KB default tripped ~1/3 of code reviews into spurious REVISEs, which cost more rounds than the cap saved. Infra stays in place so cost-conscious users can set the env var explicitly. README docs updated with guidance.
3. Standardize cost_warning schema: `.optional().nullable()`. Internal callers always populate it, but external MCP consumers that omit the field would fail validation. Harmless safety tweak.

Follow-up noted in plans/v1.1-cost-reduction.md: budget-exhausted post-LLM gate (low priority while default cap is unset).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
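The undercount in item 1 is easy to reproduce. The snippet below is a standalone illustration (the sample string is chosen for the example, not taken from the PR's regression test): each Hangul syllable is one UTF-16 code unit but three UTF-8 bytes, which is where the roughly 3x gap comes from.

```typescript
const korean = "안녕하세요"; // five Hangul syllables

// String.prototype.length counts UTF-16 code units.
const utf16Units = korean.length;

// Buffer.byteLength counts encoded bytes for the given encoding.
const utf8Bytes = Buffer.byteLength(korean, "utf8");

// For ASCII-only text the two measures agree, which is why the bug
// only shows up on non-ASCII output.
const ascii = "hello";
const asciiMatches = ascii.length === Buffer.byteLength(ascii, "utf8");

console.log({ utf16Units, utf8Bytes, asciiMatches });
```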

Three small cleanups after the opt-in default switch:

1. Prompt addendum no longer claims "limited byte budget" unconditionally. Reviewer is now told: "if the host enforces a byte budget, you'll get a budget-exhausted message; otherwise read as needed." Avoids making the reviewer artificially conservative when no cap is configured.
2. Skip `used` accumulation when `cap === Infinity`. Harmless micro-waste to increment a counter nothing reads, but the guard also clarifies intent: tracking only matters when a cap exists.
3. Sync the `getMaxReviewerBytes` docstring recommendation with the README (200000–500000 range rather than a single 200000 example).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
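A minimal sketch of the opt-in cap and the `cap === Infinity` guard might look like the following. The function bodies are assumptions: in particular, passing the environment in as a parameter, the handling of invalid values (treated here as "no cap"), and the `charge` helper are hypothetical and may not match `getMaxReviewerBytes` in the PR.

```typescript
// Hypothetical resolver: returns Infinity (no cap) unless the variable is
// set to a positive number. Treating invalid input as "no cap" is an
// assumption made for this sketch.
function getMaxReviewerBytes(
  env: Record<string, string | undefined>,
): number {
  const raw = env["DUUL_MAX_REVIEWER_BYTES"];
  if (raw === undefined || raw.trim() === "") {
    return Infinity;
  }
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : Infinity;
}

// Hypothetical helper: tracking only matters when a cap exists, so the
// counter is left untouched for an uncapped budget.
function charge(budget: { cap: number; used: number }, bytes: number): void {
  if (budget.cap === Infinity) {
    return;
  }
  budget.used += bytes;
}

const uncapped = { cap: getMaxReviewerBytes({}), used: 0 };
charge(uncapped, 1234); // no-op: nothing reads the counter

const capped = {
  cap: getMaxReviewerBytes({ DUUL_MAX_REVIEWER_BYTES: "200000" }),
  used: 0,
};
charge(capped, 1234); // counted against the 200000-byte cap
```

In a real process the argument would typically be `process.env`; it is a parameter here only to keep the sketch easy to test.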

devplanningo and others added 5 commits April 21, 2026 14:22

devplanningo merged commit 9871dab into master Apr 21, 2026
1 check passed

devplanningo deleted the devplanningo/dev-plan-impl branch April 21, 2026 08:47

devplanningo merged 5 commits into master from devplanningo/dev-plan-impl
