
DUUL v1.1 cost-reduction: byte budget, per-tool model, cost warning #11

Merged
devplanningo merged 5 commits into master from devplanningo/dev-plan-impl on Apr 21, 2026

Summary

Implements Tasks 2–4 from plans/v1.1-cost-reduction.md plus two rounds of review polish. Each feat commit is independently shippable; the two trailing commits are review fixups.

  • Task 2 — Reviewer byte budget (c5bb4e4) — adds a DUUL_MAX_REVIEWER_BYTES opt-in cap threaded through executeFilesystemTool via a mutable ReviewerByteBudget counter. Short-circuits further file reads once exceeded, with a prompt addendum telling the reviewer to submit its verdict.
  • Task 3 — Per-tool model override (8091056): reviewer_config.model now accepts either a string (unchanged) or { plan?, code?, partition? } so callers can downgrade code / partition without touching plan. The resolved model is included in the provider cache key to prevent collisions.
  • Task 4 — Iteration cost warning (2b6b159) — server populates cost_warning on the review response once iteration_count >= ceil(iteration_limit * 0.6), with the per-round cost estimate so the orchestrator can decide whether to keep iterating, accept a REVISE-with-minor-issues, or escalate to human.
  • Fix: byte-budget safety + default opt-in (1a821a1) — count UTF-8 bytes (not UTF-16 code units), make the default cap Infinity (early measurements showed 200KB tripped ~1/3 of code reviews into spurious REVISEs), standardize cost_warning schema to .optional().nullable().
  • Polish (f25b799) — conditionalize the prompt's "file budget" addendum so it doesn't lie when no cap is set, skip the used counter when cap === Infinity, sync docstring recommendation with README.

Test plan

  • npm run build — clean
  • npm test — 60/60 tests pass (includes new Korean UTF-8 regression, per-tool model resolution, cost-warning threshold cases)
  • Measure a real DUUL session end-to-end once merged and compare against the v1.0 baseline (duul-tokens --since <merge>)
  • Spot-check cost_warning surfaces correctly around iteration 3/5 on a long plan ping-pong
  • Confirm DUUL_MAX_REVIEWER_BYTES=200000 still enforces the cap when set explicitly

Follow-ups (not in this PR)

  • Budget-exhausted post-LLM gate (gates_tripped += "budget_exhausted", force requires_human_review) — low priority while the default cap is unset. Noted in plans/v1.1-cost-reduction.md.
  • Task 1 (OpenAI/Anthropic prompt caching) still pending.

🤖 Generated with Claude Code

devplanningo and others added 5 commits April 21, 2026 14:22
Caps the cumulative bytes returned by reviewer filesystem tools per
review call. Once exceeded, further tool calls return a budget-exhausted
message so the reviewer submits its verdict instead of continuing to
request files.

- Add DUUL_MAX_REVIEWER_BYTES env var (default 200000).
- Thread a mutable ReviewerByteBudget through executeFilesystemTool.
- Instantiate one budget per review call in each provider's tool loop.
- Append file-budget guidance at end of plan/code review system prompts.
- Document env var in both READMEs.
- Add src/__tests__/filesystem-tools-budget.test.ts covering accumulation,
  exhaustion, no-budget backwards compat, and env resolution.

Target: reduce average code_review tokens by >=30% (from 117k baseline).
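A minimal sketch of the per-call counter described above. It assumes a `{ cap, used }` shape for `ReviewerByteBudget` and uses a hypothetical `executeWithBudget` wrapper in place of the real `executeFilesystemTool` signature, which this PR does not show. The exhausted-message text is also invented. The sketch counts UTF-8 bytes as the later fix does, returns the read that crosses the cap, and blocks every read after that:

```typescript
// Assumed shape of the mutable per-call counter; the real
// ReviewerByteBudget in the DUUL source may differ.
interface ReviewerByteBudget {
  cap: number;  // maximum cumulative bytes for one review call
  used: number; // bytes returned to the reviewer so far
}

// Hypothetical message text returned once the budget is spent.
const BUDGET_EXHAUSTED_MESSAGE =
  "Reviewer byte budget exhausted; submit your verdict with the files already read.";

// Hypothetical wrapper: `readFile` stands in for whichever filesystem
// tool the reviewer invoked.
function executeWithBudget(
  readFile: () => string,
  budget?: ReviewerByteBudget,
): string {
  // No budget passed: backwards-compatible, unlimited behavior.
  if (!budget) return readFile();
  // Short-circuit without touching the filesystem once the cap is reached.
  if (budget.used >= budget.cap) return BUDGET_EXHAUSTED_MESSAGE;
  const result = readFile();
  // Count UTF-8 bytes, not UTF-16 code units (see the later fix commit).
  budget.used += Buffer.byteLength(result, "utf8");
  return result;
}
```

One budget object would be created per review call, so concurrent or successive reviews never share a counter.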

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
reviewer_config.model now accepts either a single string (applied to
all review tools — existing behavior) or an object with per-tool
overrides: { plan?, code?, partition? }. Unspecified tools fall back
to REVIEW_MODEL / provider default.

- Extend ReviewerConfigSchema.model to a union of string | per-tool object.
- callReview / getProvider accept a toolName and resolve the concrete
  model for the call. Provider cache key includes the resolved model so
  per-tool models don't collide.
- Each tool (plan-review, code-review, execution-partition) passes its
  own toolName through.
- Tests in src/__tests__/per-tool-model.test.ts cover string form,
  per-tool form, partial overrides, undefined, and schema validation.
- README (en/ko) documents the override with upgrade-only guidance —
  plan defects compound so plan must stay on a strong model.
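The resolution rule can be sketched as follows. `resolveModel`, `providerCacheKey`, and the `fallback` parameter (standing in for `REVIEW_MODEL`) are hypothetical names, because the real `callReview` / `getProvider` signatures are not shown in this PR:

```typescript
// Tool names mirroring the three review tools in this PR.
type ReviewTool = "plan" | "code" | "partition";

// Either one model for every tool, or optional per-tool overrides.
type ReviewerModelConfig = string | Partial<Record<ReviewTool, string>>;

// Hypothetical resolver: a string applies to all tools; an object picks
// the tool's entry; anything unspecified falls back to `fallback`
// (e.g. REVIEW_MODEL), and undefined means "use the provider default".
function resolveModel(
  config: ReviewerModelConfig | undefined,
  tool: ReviewTool,
  fallback?: string,
): string | undefined {
  if (typeof config === "string") return config;
  return config?.[tool] ?? fallback;
}

// Hypothetical cache key: embedding the resolved model keeps a provider
// built for one tool's model from being reused for another tool's model.
function providerCacheKey(provider: string, model: string | undefined): string {
  return `${provider}:${model ?? "<provider-default>"}`;
}
```

Because the key embeds the resolved model, a cheaper `code` model and a stronger `plan` model on the same provider end up with separate cached clients.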

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds an optional `cost_warning` field to the iteration meta output. Once
`iteration_count` reaches `Math.ceil(iteration_limit * 0.6)` (about 60% of
the limit), the server emits a short advisory message including the current
round's estimated cost (or "unknown amount" when pricing is unavailable).

The field is null below the threshold and in the iteration-limit
short-circuit path (requires_human_review already handles that case).

- Add cost_warning to IterationMetaOutputSchema.
- computeCostWarning helper in review-limits.ts.
- All three tools (plan/code/partition) compute and include it.
- Tests in src/__tests__/cost-warning.test.ts cover the 60% trigger,
  null-cost fallback, zero cost handling, and iteration 0 guard.
- CLAUDE.md documents that the orchestrator should surface the warning
  to the user before deciding to continue iterating.
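The threshold logic can be sketched as a pure function. This is a hypothetical re-creation of `computeCostWarning`: its signature, the message wording, and the `COST_WARNING_RATIO` name are assumptions. The iteration-limit short-circuit path lives in the caller and is not modeled here:

```typescript
// Assumed name for the 60% trigger ratio.
const COST_WARNING_RATIO = 0.6;

function computeCostWarning(
  iterationCount: number,
  iterationLimit: number,
  roundCostUsd: number | null,
): string | null {
  // Guard: no warning at iteration 0 or with a non-positive limit.
  if (iterationCount <= 0 || iterationLimit <= 0) return null;
  const threshold = Math.ceil(iterationLimit * COST_WARNING_RATIO);
  if (iterationCount < threshold) return null;
  // Zero is a real estimate; only null means pricing is unavailable.
  const cost =
    roundCostUsd === null ? "unknown amount" : `$${roundCostUsd.toFixed(4)}`;
  return (
    `Iteration ${iterationCount}/${iterationLimit}: this round cost ${cost}. ` +
    "Consider accepting minor issues or escalating to a human."
  );
}
```

With `iteration_limit = 5` the threshold is `Math.ceil(3) = 3`, so the warning first appears on iteration 3. This matches the "around iteration 3/5" spot-check in the test plan.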

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three post-review tweaks landing together:

1. Count UTF-8 bytes, not UTF-16 code units, in the byte budget.
   `executeFilesystemTool` was using `result.length`, undercounting
   non-ASCII output (Korean by ~3x: each Hangul syllable is one UTF-16
   code unit but three UTF-8 bytes) and letting reviews blow past the cap.
   Now uses `Buffer.byteLength(result, 'utf8')`. Adds a Korean-text
   regression test.

2. Make DUUL_MAX_REVIEWER_BYTES opt-in (default Infinity).
   Early measurements showed the 200KB default tripped ~1/3 of
   code reviews into spurious REVISEs, which cost more rounds than
   the cap saved. Infra stays in place so cost-conscious users can
   set the env var explicitly. README docs updated with guidance.

3. Standardize cost_warning schema: `.optional().nullable()`.
   Internal callers always populate it, but external MCP consumers
   that omit the field would fail validation. Harmless safety tweak.
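The undercount in item 1 is easy to reproduce in Node.js. This sketch compares `String.prototype.length` (UTF-16 code units) with `Buffer.byteLength(text, 'utf8')`, which is the call the fix switched to. The helper names are illustrative only:

```typescript
// What `result.length` reports: UTF-16 code units.
function utf16Units(text: string): number {
  return text.length;
}

// What the fixed budget counts: encoded UTF-8 bytes.
function utf8Bytes(text: string): number {
  return Buffer.byteLength(text, "utf8");
}

const korean = "안녕하세요"; // 5 precomposed Hangul syllables
console.log(utf16Units(korean)); // 5
console.log(utf8Bytes(korean));  // 15
```

For ASCII the two counts agree, so the old code was only wrong on non-ASCII content, which is exactly what the Korean regression test targets.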

Follow-up noted in plans/v1.1-cost-reduction.md: budget-exhausted
post-LLM gate (low priority while default cap is unset).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three small cleanups after the opt-in default switch:

1. Prompt addendum no longer claims "limited byte budget" unconditionally.
   Reviewer is now told: "if the host enforces a byte budget, you'll get
   a budget-exhausted message; otherwise read as needed." Avoids making
   the reviewer artificially conservative when no cap is configured.

2. Skip `used` accumulation when `cap === Infinity`. Incrementing a counter
   nothing reads is harmless micro-waste, but the guard also clarifies
   intent: tracking only matters when a cap exists.

3. Sync the `getMaxReviewerBytes` docstring recommendation with the README
   (200000–500000 range rather than a single 200000 example).
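A sketch of the opt-in default and the `Infinity` guard. It assumes `getMaxReviewerBytes` takes the environment as a parameter (the real helper's signature is not shown) and that unset, empty, or invalid values all mean "no cap". `recordBytes` is a hypothetical helper:

```typescript
// Assumed shape, matching the budget counter described earlier.
interface ReviewerByteBudget {
  cap: number;
  used: number;
}

// Hypothetical re-creation: the env var is opt-in, so anything other than
// a positive finite number resolves to Infinity (no cap). The handling of
// invalid values is an assumption.
function getMaxReviewerBytes(env: Record<string, string | undefined>): number {
  const raw = env.DUUL_MAX_REVIEWER_BYTES;
  if (raw === undefined || raw.trim() === "") return Infinity;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : Infinity;
}

// Hypothetical helper: only track usage when a cap actually exists.
function recordBytes(budget: ReviewerByteBudget, result: string): void {
  if (budget.cap === Infinity) return;
  budget.used += Buffer.byteLength(result, "utf8");
}
```

A caller would pass `process.env`. Setting `DUUL_MAX_REVIEWER_BYTES=200000` then restores the original enforced cap, which is the case the test plan asks to confirm.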

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
devplanningo merged commit 9871dab into master on Apr 21, 2026
1 check passed
devplanningo deleted the devplanningo/dev-plan-impl branch on April 21, 2026 08:47