jaemk · jaemk · Jun 21, 2026 · Jun 14, 2026 · Jun 14, 2026 · Jun 15, 2026
diff --git a/.agents/skills/pr-cycle/SKILL.md b/.agents/skills/pr-cycle/SKILL.md
@@ -64,8 +64,8 @@ judgment core and push everything else to cheaper models or to no model at all.
 | Tier | What | Steps | Model |
 |------|------|-------|-------|
 | 0 — mechanical | All GitHub API ops, `make ci`, README regen, push preamble | 1, 5, 6, 8(preamble), 9, 10 | script (`pr.py`) |
-| 1 — cheap delegation | Local review via `pr-review` (read-only sub-agents); mechanical/repetitive fix application | 2, 4b, 11 | Sonnet (pinned in agent def; review sub-agents overridable per-run, e.g. to opus — see [Input](#input)) |
-| 2 — judgment core | Classify findings; write explicit fix specs + test assertions; sync audit | 3, 4a, 7 | Opus (session model) |
+| 1 — cheap delegation | Local review via `pr-review` (read-only sub-agents); fix application, fanned out across disjoint sub-agents | 2, 4b, 11 | Sonnet by default; per-group `model` override to Opus for harder groups (see 4b); review sub-agents overridable per-run (see [Input](#input)) |
+| 2 — judgment core | Classify findings; write explicit fix specs + test assertions; partition the fan-out; sync audit | 3, 4a, 7 | Opus (session model) |
 
 A Sonnet session can drive the whole cycle; only Tier-2 actually needs strong
 reasoning, so consider switching the session to a cheaper model once the judgment
@@ -107,8 +107,9 @@ review-agent model override from the Input if one was given. `pr-review` will:
 
 - acquire the diff (`.agents/skills/pr-cycle/pr.py PR_NUMBER diff`, equivalent to
   `git diff origin/master`),
-- spawn `pr-code-reviewer` and `pr-consumer-reviewer` in parallel (read-only, each
-  carrying its own rubric; Sonnet by default, or the overridden model on **both**
+- shard the changed material into appropriately sized, randomized chunks and spawn one
+  `pr-code-reviewer` and one `pr-consumer-reviewer` per shard in parallel (read-only,
+  each carrying its own rubric; Sonnet by default, or the overridden model on **all**
   spawns), and
 - return a consolidated findings report with severity and a per-finding verdict.
 
@@ -157,19 +158,46 @@ Common fix types:
 - Trybuild golden file regeneration: `TRYBUILD=overwrite cargo test --no-default-features --features "proc_macro,time_stores" compile_fail_macro_arg_validation`
 - Macro code changes: `cached_proc_macro/src/`
 
-#### 4b. Apply fixes (route per fix)
-
-For each spec, choose the routing:
-
-- **Delegate to `pr-fix-implementer` (Sonnet)** when the fix is **mechanical or
-  repetitive across multiple files** (e.g. the same change in all six sharded stores,
-  a doc-string pattern replicated across store modules). State "delegating because:
-  <reason>". Spawn the `pr-fix-implementer` agent with the fix spec as the prompt.
-- **Apply inline** when the fix is **a one-off, subtle, or logic/macro change**.
-  For small one-off edits the spec-writing + verification round-trip costs more than
-  editing directly. State "applying inline because: <reason>".
-
-After all fixes are applied, verify with:
+#### 4b. Partition the fixes and fan out across disjoint sub-agents
+
+Once every valid finding has a spec, apply them by **fanning out across as many
+parallel sub-agents as the specs allow**, rather than applying them serially in the
+orchestrator. Two rules govern the fan-out:
+
+**Disjoint partitioning (correctness).** Parallel agents share one working tree, so two
+agents must never write the same file — concurrent edits to one file race and corrupt
+each other. Partition the specs into groups whose **written-file sets do not overlap**:
+
+- For each spec, compute the full set of files it writes — the Target file(s) *and* the
+  test file its Test clause adds to (often `tests/cached.rs`).
+- Any two specs that share a written file MUST land in the same group. A common sink
+  like `tests/cached.rs` therefore pulls every test-adding spec into one group — that is
+  expected; keep that group together rather than risking a race.
+- Otherwise split into as many groups as possible — ideally one spec per group — to
+  maximize parallelism. More disjoint groups means more concurrency.
+
+**Appropriate model per group (cost).** Each group is handled by a `pr-fix-implementer`
+agent spawned with the Agent tool's `model` parameter set to the tier the group's
+*hardest* fix needs:
+
+- `model: sonnet` (the agent's default) — mechanical or repetitive groups: doc/comment
+  updates, a pattern replicated across the sharded stores, golden-file regen, simple
+  test additions.
+- `model: opus` — groups containing a subtle logic change, a macro change in
+  `cached_proc_macro/src/`, or any fix whose application still needs real reasoning. The
+  spec from 4a is already precise enough to hand off (it must be, to be valid); raising
+  the implementer's model buys more careful application, not more decision latitude.
+
+Spawn all groups **in a single message** (multiple Agent calls) so they run concurrently.
+Each agent's prompt is the verbatim fix spec(s) for its group. Before spawning, state the
+partition: list each group, the files it owns, its model, and why that model.
+
+**Inline fallback.** Skip the fan-out and edit directly only in the degenerate case where
+it cannot pay off: a single spec, or a few tiny one-off edits that all touch one
+overlapping region (so they cannot be partitioned anyway). State "applying inline because:
+<reason>".
+
+After all agents report back, verify with:
 
 ```bash
 .agents/skills/pr-cycle/pr.py PR_NUMBER ci

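Reviewer note on the 4b disjoint-partitioning rule above: the "any two specs that share a written file MUST land in the same group" requirement amounts to finding connected components over specs linked by shared files. A minimal sketch follows, assuming each spec can be reduced to a set of written paths. The spec ids, the `partition_specs` helper, and the `src/stores/mod.rs` path are hypothetical illustrations, not part of `pr.py` or the skill.

```python
from collections import defaultdict


def partition_specs(specs):
    """Group specs so that no two groups write the same file.

    `specs` maps a (hypothetical) spec id to the set of files it writes:
    its target file(s) plus the test file its Test clause adds to.
    Returns a sorted list of groups, each a sorted list of spec ids.
    """
    parent = {spec_id: spec_id for spec_id in specs}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # The first spec seen writing a file "owns" it; any later writer of the
    # same file is merged into the owner's group.
    first_writer = {}
    for spec_id, files in specs.items():
        for path in files:
            if path in first_writer:
                union(first_writer[path], spec_id)
            else:
                first_writer[path] = spec_id

    groups = defaultdict(list)
    for spec_id in specs:
        groups[find(spec_id)].append(spec_id)
    return sorted(sorted(group) for group in groups.values())


# Illustrative input only: ids and file sets are made up for the example.
specs = {
    "doc-fix": {"src/lib.rs"},
    "macro-fix": {"cached_proc_macro/src/lib.rs", "tests/cached.rs"},
    "store-fix": {"src/stores/mod.rs", "tests/cached.rs"},
}
print(partition_specs(specs))
```

Here the two specs that both add to `tests/cached.rs` collapse into one group while the doc fix stays on its own, which matches the "common sink" behavior the section describes. Choosing the model per group (the tier its hardest fix needs) would be a separate pass over the returned groups.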
diff --git a/.agents/skills/pr-review/SKILL.md b/.agents/skills/pr-review/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: pr-review
-description: Targeted, read-only review of a PR or checked-out branch. Acquires the diff (a PR number, or the current branch vs origin/master), spawns an independent code-review sub-agent and a library-consumer sub-agent in parallel, then aggregates their findings into a single report with severity and a valid / already-fixed / invalid verdict for each. Read-only — it does not edit files, commit, push, or touch the GitHub PR conversation. The review sub-agents default to Sonnet but can be overridden per run (e.g. to opus). Use when asked to "review this PR", "review the branch", "what's wrong with this diff", "do a code review", or "review with opus". For the full review → fix → push → resolve loop, use `pr-cycle` (which delegates its review step here).
+description: Targeted, read-only review of a PR or checked-out branch. Acquires the diff (a PR number, or the current branch vs origin/master), shards the changed material into appropriately sized, randomized chunks, and spawns multiple read-only code-review and library-consumer sub-agents in parallel (one per shard), then aggregates and de-duplicates their findings into a single report with severity and a valid / already-fixed / invalid verdict for each. Read-only — it does not edit files, commit, push, or touch the GitHub PR conversation. The review sub-agents default to Sonnet but can be overridden per run (e.g. to opus). Use when asked to "review this PR", "review the branch", "what's wrong with this diff", "do a code review", or "review with opus". For the full review → fix → push → resolve loop, use `pr-cycle` (which delegates its review step here).
 allowed-tools: Bash, Read, Agent
 ---
 
@@ -13,8 +13,9 @@ findings, then goes on to address, push, and resolve them.
 
 ## Scope — what this does and does not do
 
-**Does:** acquire the diff, spawn the two read-only review sub-agents, evaluate
-their findings, and report them with severity and a verdict.
+**Does:** acquire the diff, shard it into appropriately sized chunks, spawn the
+read-only review sub-agents (one per shard, multiple of each type), evaluate and
+de-duplicate their findings, and report them with severity and a verdict.
 
 **Does NOT:** edit files, run `make ci`, regenerate the README, commit, or push; and
 it does **not** interact with the GitHub PR conversation — it does not read existing
@@ -29,8 +30,8 @@ This skill is purely advisory: its output is a findings report for a human (or f
 
 | Tier | What | Step | Model |
 |------|------|------|-------|
-| 1 — cheap delegation | Read-only review sub-agents | 2 | Sonnet (pinned in agent def; overridable per-run, e.g. to opus — see [Input](#input)) |
-| 2 — judgment core | Classify findings into valid / already-fixed / invalid | 3, 4 | session model (use Opus for the verdict pass) |
+| 1 — cheap delegation | Read-only review sub-agents, one per shard | 3 | Sonnet (pinned in agent def; overridable per-run, e.g. to opus — see [Input](#input)) |
+| 2 — judgment core | Shard the material; de-duplicate and classify findings into valid / already-fixed / invalid | 2, 4, 5 | session model (use Opus) |
 
 ## Input
 
@@ -41,63 +42,116 @@ A target and an optional review-agent model override, in any order.
   `gh pr view --json number` (run with the sandbox disabled — see below), but a PR is
   **not required**: a plain checked-out branch is reviewed by diffing against
   `origin/master`.
-- **Review-agent model**: the model used by the two sub-agents (`pr-code-reviewer`,
+- **Review-agent model**: the model used by the two reviewer types (`pr-code-reviewer`,
   `pr-consumer-reviewer`) **defaults to `sonnet`**, but can be overridden. If the input
   names a model (e.g. "review with opus", "opus reviewers", "model=opus"), pass that
-  model to the Agent tool's `model` parameter when spawning **both** sub-agents in
-  step 2. With no override, omit `model` so each agent uses its pinned Sonnet default.
+  model to the Agent tool's `model` parameter when spawning **all** shard sub-agents in
+  step 3. With no override, omit `model` so each agent uses its pinned Sonnet default.
+- **Shard sizing (optional)**: by default the orchestrator sizes shards automatically
+  from the review-agent model — smaller shards for cheaper models, larger for stronger
+  ones (see step 2). Override with an explicit target in the input if you want finer or
+  coarser splitting, e.g. "shards of ~4 files", "one file per shard", or "single shard"
+  (the last restores the old whole-diff-per-reviewer behavior).
 
 Announce the resolved target and review-agent model at the start — e.g. "Reviewing
 the current branch with **opus** reviewers" or "Reviewing PR #264 with Sonnet
-reviewers" — before spawning anything.
+reviewers" — before spawning anything. After sharding (step 2), announce the shard
+counts (e.g. "3 code shards, 2 consumer shards") before spawning the reviewers.
 
 ## Steps
 
-### 1. Acquire the diff
+### 1. Acquire the diff and build the review inventory
 
 The diff is `git diff origin/master`, which works for any checked-out branch whether
 or not it has a PR:
 
 ```bash
 git diff origin/master
+git diff origin/master --stat
 ```
 
 If you are targeting a specific PR, the `pr-cycle` helper prints the identical diff
-and is equivalent:
+and is equivalent (`.agents/skills/pr-cycle/pr.py PR_NUMBER diff`).
 
-```bash
-.agents/skills/pr-cycle/pr.py PR_NUMBER diff
-```
+From the changed-file list, build an inventory of **review units**. A unit is normally
+one changed file, with one exception: keep **atomic couplings** together as a single
+unit — a trybuild `tests/ui/<case>.rs` and its matching `<case>.stderr` (and any paired
+source) must travel together, since reviewing one without the other is meaningless.
+
+Tag each unit with the reviewer type(s) it needs:
+- **Code-review set** — all code: `cached_proc_macro/src/`, `src/`, `tests/`, examples.
+  Essentially every changed `.rs` file and golden file.
+- **Consumer-review set** — public-facing surface only: `src/lib.rs`, the public APIs in
+  `src/stores/`, `cached_proc_macro/src/lib.rs` (the macro attribute surface),
+  `README.md`, `CHANGELOG.md`, `docs/migrations/`, and `examples/`. Internal macro
+  plumbing and internal test helpers are not consumer-relevant.
+
+A unit may belong to both sets (e.g. `src/lib.rs`).
+
+### 2. Shard each set into appropriately sized, randomized chunks
 
-Capture the full diff text — it is fed verbatim to both sub-agents.
+The code set and the consumer set are sharded **independently**. Sharding has two jobs:
+keep each shard small enough that the review model attends to every line, and vary the
+grouping between rounds so repeated reviews surface different findings.
 
-### 2. Spawn two independent sub-agents in parallel
+**a. Pick the target shard size from the review-agent model.** Cheaper models get
+smaller shards; stronger models absorb more per shard without losing attention:
 
-**Agent A — code reviewer**: Spawn with the `pr-code-reviewer` agent type. Prompt must
-include:
-- The PR number (or branch name, if there is no PR)
-- The full diff (from step 1)
+| Review model | Target per shard |
+|--------------|------------------|
+| sonnet (default) | ~600-900 changed diff lines, or ~4-6 units |
+| opus | ~1500-2500 changed diff lines, or ~10-15 units |
 
-**Agent B — library consumer**: Spawn with the `pr-consumer-reviewer` agent type. Prompt
-must include:
-- The PR number (or branch name)
-- The full diff
-- The current `src/lib.rs` doc comments and `README.md` (or relevant excerpts covering
-  the changed APIs)
+An explicit shard-size override from the Input wins over this table. Use the
+`--stat` line counts from step 1 for packing.
+
+**b. Randomize the grouping, then pack.** Produce a fresh random ordering of the units
+each run — `shuf` reseeds from the OS on every invocation, so each round yields a
+different permutation:
+
+```bash
+git diff origin/master --name-only | shuf
+```
 
-Both agents are read-only (no Edit/Write tools) and carry their full rubrics in their
+Pack the shuffled unit list greedily: add units to the current shard until adding the
+next would exceed the target size, then start a new shard. Because the order is
+reshuffled every round, a given file lands with different neighbors each time — reviewers
+see different cross-file context and surface different cross-cutting findings. Do **not**
+re-sort the shuffled list into a tidy order; the randomness is the point. (Atomic
+couplings from step 1 stay intact as one unit through the shuffle.)
+
+This yields some number of code shards and consumer shards (each typically a handful).
+Announce the counts before spawning.
+
+### 3. Spawn one sub-agent per shard, in parallel
+
+For each **code shard**, spawn a `pr-code-reviewer`. For each **consumer shard**, spawn a
+`pr-consumer-reviewer`. Every agent's prompt must include:
+- The target (PR number, or branch name if there is no PR)
+- The explicit list of files in its shard
+- An instruction to **scope its review to those files**: acquire its slice with
+  `git diff origin/master -- <files...>` and Read those files in full for context, but
+  report findings only on the assigned files.
+- (consumer shards only) a pointer to the current `src/lib.rs` doc comments and
+  `README.md` for the APIs its files touch.
+
+Both agent types are read-only (no Edit/Write) and carry their full rubrics in their
 agent definitions — do not re-specify the rubric in the prompt.
 
 **Model override:** if the input requested a review-agent model (see [Input](#input)),
-pass it to the Agent tool's `model` parameter on **both** spawns (e.g. `model: "opus"`).
+pass it to the Agent tool's `model` parameter on **every** spawn (e.g. `model: "opus"`).
 With no override, omit `model` so each agent uses its pinned Sonnet default.
 
-Launch both agents in parallel. Wait for both to complete before proceeding.
+Spawn **all** shard agents in a single message so they run concurrently, and wait for all
+to complete before proceeding. (Harness concurrency is capped; excess agents queue and
+still complete.)
 
-### 3. Evaluate all findings
+### 4. Evaluate all findings (de-duplicate across shards)
 
-Collect both sub-agent reports. For each finding, assign a verdict and explain your
-reasoning:
+Collect every shard's report. Shards are disjoint, so most findings are unique, but a
+cross-cutting issue can be reported by more than one shard (or by both a code and a
+consumer reviewer) — **merge duplicates into one finding** before judging. For each
+finding, assign a verdict and explain your reasoning:
 
 - **Valid** — the concern is real and the code should change.
 - **Already fixed** — the concern was valid in principle but the current code already
@@ -110,13 +164,15 @@ This verdict pass is the judgment core; run it on the session model (use Opus).
 soften or pad — an invalid finding called valid sends `pr-cycle` (or a human) chasing a
 non-issue.
 
-### 4. Report
+### 5. Report
 
 Present a single consolidated report:
 
 - The target reviewed (PR number or branch name) and the review-agent model used.
-- **Code-reviewer findings**: total count, broken down by severity (high / medium / low),
-  and by verdict (valid / already-fixed / invalid).
+- **Sharding**: how many code shards and consumer shards ran, and the target shard size
+  used.
+- **Code-reviewer findings**: total count (after de-dup), broken down by severity
+  (high / medium / low), and by verdict (valid / already-fixed / invalid).
 - **Consumer-reviewer findings**: the same breakdown.
 - For each **valid** finding: a one-line summary, the `file:line` (or area), and why it
   matters — enough that `pr-cycle` or a human can act on it without re-reading the agent

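Reviewer note on pr-review step 2b: the shuffle-then-pack procedure can be sketched as below, assuming each review unit (a file, or an atomic coupling treated as one unit) has already been reduced to a name and a changed-line count. The `pack_shards` helper, the unit names, and the line counts are hypothetical illustrations; the skill itself leaves the packing to the orchestrator. The sketch shuffles units rather than raw file names, so couplings stay intact.

```python
import random


def pack_shards(units, target_lines, rng=None):
    """Shuffle review units, then pack them greedily into shards.

    `units` is a list of (name, changed_lines) pairs. A unit larger than
    `target_lines` is not split; it simply gets a shard to itself.
    """
    rng = rng or random.Random()  # unseeded: a fresh ordering on every run
    shuffled = list(units)
    rng.shuffle(shuffled)

    shards, current, size = [], [], 0
    for name, lines in shuffled:
        # Close the current shard if adding this unit would exceed the target.
        if current and size + lines > target_lines:
            shards.append(current)
            current, size = [], 0
        current.append(name)
        size += lines
    if current:
        shards.append(current)
    return shards


# Illustrative input only: the line counts are invented for the example.
units = [
    ("src/lib.rs", 320),
    ("README.md", 150),
    ("CHANGELOG.md", 40),
    ("tests/cached.rs", 410),
    ("trybuild case (.rs + .stderr)", 60),  # atomic coupling kept as one unit
]
for shard in pack_shards(units, target_lines=700):
    print(shard)
```

Because the ordering is not re-sorted after the shuffle, a given unit lands with different neighbors from run to run, which is the property the step relies on. Passing a seeded `random.Random` is only useful for reproducing a particular partition while debugging.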
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v6
 
     - uses: dtolnay/rust-toolchain@1.96.0
       with:

diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -10,12 +10,24 @@ jobs:
     environment: release  # Optional: for enhanced security
     permissions:
       id-token: write     # Required for OIDC token exchange
+      contents: write     # Required for pushing tags and creating GitHub releases
     steps:
     - uses: actions/checkout@v6
+      with:
+        fetch-depth: 0    # Full history needed so --generate-notes can compute the diff range from the previous tag
     - uses: rust-lang/crates-io-auth-action@v1
       id: auth
     - name: Publish to crates.io
+      id: publish
       run: bash bin/publish.sh
       env:
         CARGO_REGISTRY_TOKEN: ${{ steps.auth.outputs.token }}
+    - name: Tag and create GitHub releases
+      # Tags every publishable workspace crate that does not yet have a tag or
+      # release, backfilling crates published in earlier runs; idempotent
+      # (skips tags and releases that already exist on the remote).
+      if: steps.publish.outcome == 'success'
+      env:
+        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      run: bash bin/tag-release.sh
 
diff --git a/.gitignore b/.gitignore
@@ -7,3 +7,4 @@ _tmp_readme.md
 local/
 !local/.gitkeep
 .antigravitycli/
+.claude/worktrees/