DUUL v1.1 cost-reduction: byte budget, per-tool model, cost warning #11
Merged
Caps the cumulative bytes returned by reviewer filesystem tools per review call. Once the cap is exceeded, further tool calls return a budget-exhausted message so the reviewer submits its verdict instead of continuing to request files.
- Add DUUL_MAX_REVIEWER_BYTES env var (default 200000).
- Thread a mutable ReviewerByteBudget through executeFilesystemTool.
- Instantiate one budget per review call in each provider's tool loop.
- Append file-budget guidance at end of plan/code review system prompts.
- Document env var in both READMEs.
- Add src/__tests__/filesystem-tools-budget.test.ts covering accumulation, exhaustion, no-budget backwards compat, and env resolution.
Target: reduce average code_review tokens by >=30% (from 117k baseline).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
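For readers skimming the commit, the budget mechanics described above can be sketched roughly as follows. This is not the code in the PR: `ReviewerByteBudget` is named in the commit, but its `cap`/`used` fields, the `runWithBudget` wrapper, the message text, and the choice of `used >= cap` as the exhaustion test are all assumptions made for illustration.

```typescript
// Hypothetical shape: the PR names ReviewerByteBudget, but these fields are assumed.
interface ReviewerByteBudget {
  cap: number;  // maximum cumulative bytes per review call
  used: number; // bytes returned so far, mutated by each tool call
}

// Placeholder wording; the real budget-exhausted message is not shown in the PR text.
const BUDGET_EXHAUSTED =
  "File budget exhausted. Submit your verdict with the information you have.";

// UTF-8 byte length of a string (TextEncoder always encodes as UTF-8).
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Hypothetical stand-in for the budget check inside executeFilesystemTool:
// refuse once the cap has been reached, otherwise run the tool and charge
// its output against the budget. With no budget, behave as before.
function runWithBudget(
  runTool: () => string,
  budget?: ReviewerByteBudget,
): string {
  if (budget && budget.used >= budget.cap) {
    return BUDGET_EXHAUSTED;
  }
  const result = runTool();
  if (budget) {
    budget.used += utf8Bytes(result);
  }
  return result;
}

// One budget per review call, shared by every tool call in that loop.
const budget: ReviewerByteBudget = { cap: 10, used: 0 };
const first = runWithBudget(() => "12345678", budget); // 8 bytes, allowed
const second = runWithBudget(() => "abcdef", budget);  // allowed: 8 < 10 before the call
const third = runWithBudget(() => "more", budget);     // refused: 14 >= 10
```

Note that in this sketch the call that crosses the cap still returns its full output; only later calls are refused. Whether the real implementation truncates or refuses the crossing call is not stated in the commit.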
reviewer_config.model now accepts either a single string (applied to
all review tools — existing behavior) or an object with per-tool
overrides: { plan?, code?, partition? }. Unspecified tools fall back
to REVIEW_MODEL / provider default.
- Extend ReviewerConfigSchema.model to a union of string | per-tool object.
- callReview / getProvider accept a toolName and resolve the concrete
model for the call. Provider cache key includes the resolved model so
per-tool models don't collide.
- Each tool (plan-review, code-review, execution-partition) passes its
own toolName through.
- Tests in src/__tests__/per-tool-model.test.ts cover string form,
per-tool form, partial overrides, undefined, and schema validation.
- README (en/ko) documents the override with upgrade-only guidance —
plan defects compound so plan must stay on a strong model.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
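A rough sketch of the resolution rule this commit describes, for illustration only. The function names `resolveModel` and `providerCacheKey`, the cache-key format, and the model names in the usage lines are hypothetical; the PR only states that a string applies to all tools, that an object may override `plan`, `code`, or `partition`, and that the resolved model is part of the cache key.

```typescript
type ToolName = "plan" | "code" | "partition";

// reviewer_config.model: a single string, a per-tool object, or unset.
type ModelConfig = string | Partial<Record<ToolName, string>> | undefined;

// Resolve the concrete model for one tool call.
// `fallback` stands in for REVIEW_MODEL / the provider default.
function resolveModel(
  model: ModelConfig,
  toolName: ToolName,
  fallback: string,
): string {
  if (typeof model === "string") return model; // string form applies to every tool
  return model?.[toolName] ?? fallback;        // per-tool override, else fallback
}

// Assumed key format: including the model keeps per-tool providers from colliding.
function providerCacheKey(provider: string, resolvedModel: string): string {
  return `${provider}:${resolvedModel}`;
}

// Usage with made-up model names.
const perTool: ModelConfig = { code: "model-small" };
const codeModel = resolveModel(perTool, "code", "model-default"); // "model-small"
const planModel = resolveModel(perTool, "plan", "model-default"); // "model-default"
const allTools = resolveModel("model-strong", "partition", "model-default"); // "model-strong"
const unset = resolveModel(undefined, "plan", "model-default"); // "model-default"
```

Because `codeModel` and `planModel` differ here, `providerCacheKey` yields two distinct keys for the same provider, which is the collision the commit guards against.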
Adds an optional `cost_warning` field to the iteration meta output. Once `iteration_count` crosses ~60% of `iteration_limit` (Math.ceil), the server emits a short advisory message including the current round's estimated cost (or "unknown amount" when pricing is unavailable). Null below the threshold and in the iteration-limit short-circuit path (requires_human_review already handles that case).
- Add cost_warning to IterationMetaOutputSchema.
- computeCostWarning helper in review-limits.ts.
- All three tools (plan/code/partition) compute and include it.
- Tests in src/__tests__/cost-warning.test.ts cover the 60% trigger, null-cost fallback, zero cost handling, and iteration 0 guard.
- CLAUDE.md documents that the orchestrator should surface the warning to the user before deciding to continue iterating.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
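The threshold logic above might look something like the sketch below. The helper name `computeCostWarning` comes from the commit, but its signature, the dollar formatting, and the message wording are assumptions; only the `Math.ceil` 60% trigger, the "unknown amount" fallback, and the null-below-threshold behavior are taken from the description.

```typescript
// Assumed signature; the real helper lives in review-limits.ts.
function computeCostWarning(
  iterationCount: number,
  iterationLimit: number,
  roundCostUsd: number | null, // null when pricing is unavailable
): string | null {
  // Iteration 0 guard: nothing to warn about before the first round.
  if (iterationCount <= 0 || iterationLimit <= 0) return null;

  // Warn once the count reaches ceil(60% of the limit).
  const threshold = Math.ceil(iterationLimit * 0.6);
  if (iterationCount < threshold) return null;

  // A cost of 0 is a real value and must not fall into the "unknown" branch.
  const cost =
    roundCostUsd === null ? "unknown amount" : `$${roundCostUsd.toFixed(4)}`;

  // Illustrative wording only.
  return `Iteration ${iterationCount} of ${iterationLimit}: this round cost ${cost}. Consider whether to keep iterating.`;
}

const below = computeCostWarning(2, 5, 0.1234);  // null: 2 < ceil(3) = 3
const atLimit = computeCostWarning(3, 5, 0.1234); // warning that includes "$0.1234"
const unknown = computeCostWarning(4, 5, null);   // warning that includes "unknown amount"
const free = computeCostWarning(3, 5, 0);         // warning that includes "$0.0000"
const zeroth = computeCostWarning(0, 5, 0.5);     // null: iteration 0 guard
```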
Three post-review tweaks landing together:
1. Count UTF-8 bytes, not UTF-16 code units, in the byte budget. `executeFilesystemTool` was using `result.length`, undercounting non-ASCII output (Korean ~3x) and letting reviews blow past the cap. Now uses `Buffer.byteLength(result, 'utf8')`. Adds a Korean-text regression test.
2. Make DUUL_MAX_REVIEWER_BYTES opt-in (default Infinity). Early measurements showed the 200KB default tripped ~1/3 of code reviews into spurious REVISEs, which cost more rounds than the cap saved. Infra stays in place so cost-conscious users can set the env var explicitly. README docs updated with guidance.
3. Standardize cost_warning schema: `.optional().nullable()`. Internal callers always populate it, but external MCP consumers that omit the field would fail validation. Harmless safety tweak.
Follow-up noted in plans/v1.1-cost-reduction.md: budget-exhausted post-LLM gate (low priority while default cap is unset).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
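To make the undercount in item 1 concrete, here is a small standalone comparison. It uses the standard `TextEncoder` so that it runs outside Node as well; in Node, `Buffer.byteLength(text, 'utf8')` (the call the commit switches to) returns the same count. The sample strings are illustrative and are not the regression test's fixture.

```typescript
// Five Hangul syllables: one UTF-16 code unit each, three UTF-8 bytes each.
const korean = "안녕하세요";
const ascii = "hello";

// String.prototype.length counts UTF-16 code units, not bytes.
const koreanUnits = korean.length; // 5
const asciiUnits = ascii.length;   // 5

// TextEncoder always produces UTF-8, so this is the true byte count.
const koreanBytes = new TextEncoder().encode(korean).length; // 15
const asciiBytes = new TextEncoder().encode(ascii).length;   // 5

// For this text, a budget charged by .length would undercount by 3x.
const undercountFactor = koreanBytes / koreanUnits; // 3
```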
Three small cleanups after the opt-in default switch:
1. Prompt addendum no longer claims "limited byte budget" unconditionally. Reviewer is now told: "if the host enforces a byte budget, you'll get a budget-exhausted message; otherwise read as needed." Avoids making the reviewer artificially conservative when no cap is configured.
2. Skip `used` accumulation when `cap === Infinity`. Harmless micro-waste to increment a counter nothing reads, but the guard also clarifies intent: tracking only matters when a cap exists.
3. Sync the `getMaxReviewerBytes` docstring recommendation with the README (200000–500000 range rather than a single 200000 example).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Implements Tasks 2–4 from `plans/v1.1-cost-reduction.md` plus two rounds of review polish. Each `feat` commit is independently shippable; the two trailing commits are review fixups.
- `c5bb4e4`: adds a `DUUL_MAX_REVIEWER_BYTES` opt-in cap threaded through `executeFilesystemTool` via a mutable `ReviewerByteBudget` counter. Short-circuits further file reads once exceeded, with a prompt addendum telling the reviewer to submit its verdict.
- `8091056`: `reviewer_config.model` now accepts either a string (unchanged) or `{ plan?, code?, partition? }` so callers can downgrade `code`/`partition` without touching `plan`. The resolved model is included in the provider cache key to prevent collisions.
- `2b6b159`: server populates `cost_warning` on the review response once `iteration_count >= ceil(iteration_limit * 0.6)`, with the per-round cost estimate so the orchestrator can decide whether to keep iterating, accept a REVISE-with-minor-issues, or escalate to a human.
- `1a821a1`: count UTF-8 bytes (not UTF-16 code units), make the default cap `Infinity` (early measurements showed 200KB tripped ~1/3 of code reviews into spurious REVISEs), standardize the `cost_warning` schema to `.optional().nullable()`.
- `f25b799`: conditionalize the prompt's "file budget" addendum so it doesn't lie when no cap is set, skip the `used` counter when `cap === Infinity`, sync the docstring recommendation with the README.

Test plan
- `npm run build`: clean
- `npm test`: 60/60 tests pass (includes new Korean UTF-8 regression, per-tool model resolution, cost-warning threshold cases)
- … (`duul-tokens --since <merge>`)
- `cost_warning` surfaces correctly around iteration 3/5 on a long plan ping-pong
- `DUUL_MAX_REVIEWER_BYTES=200000` still enforces the cap when set explicitly

Follow-ups (not in this PR)
- Budget-exhausted post-LLM gate (`gates_tripped += "budget_exhausted"`, force `requires_human_review`): low priority while the default cap is unset. Noted in `plans/v1.1-cost-reduction.md`.

🤖 Generated with Claude Code