Make review comment and refactor marker conflicts convergeable #402
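Add a completion definition for PR review comment fixes to the GitHub State Contract: a fix driven by a review comment must reply to and resolve the original thread, or be explicitly escalated, so that a code commit alone does not leave the GitHub state loop unclosed. ⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Trigger: aevatarAI/aevatar#1624, where review-gate was stuck repeatedly rejecting the self-doc marker identity format
Behavior type: tightens codex-refactor-loop prompt routing, keeps strict self-doc-comment provenance, and lists issue-only marker normalization as in scope for fix-codex
Equivalent semantics: conflicts over the same kind of non-canonical Refactor marker identity are no longer escalated to a human; retry-fix keeps converging with an exact normalization instruction
Future reuse: downstream hosts with HOST_REFACTOR_COMMENT_POLICY=self-doc-comment can maintain iter/cluster provenance while avoiding deadlock on deterministic text-format conflicts
Failure trace ownership: test_refactor_comment_policy_prompt_contract.py covers the implement/review-fix/quality/reflector contracts
⟦AI:AUTO-LOOP⟧ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>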
4d5aad1 to
aaaf4be
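🤖 Tests review: missing regression test for the review-thread closure contract
TL;DR
Details
The changes to non-canonical refactor marker normalization in this prompt already have test coverage: The blocker is elsewhere:
📎 Full codex raw output (archived for reference)
---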
pr: 402
role: tests
verdict: reject
---
## Verdict
Reject: the marker-normalization prompt changes have source-regression coverage, but the new GitHub State Contract rule for PR review comment fixes has no matching test.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new completion contract: PR review comment fixes are incomplete until the original review thread is replied to and resolved, or explicitly escalated. `rg "PR review comment fix|review-thread closure|original thread|replied to and resolved"` finds this only in `SKILL.md`, with no matching source-regression or behavior test under `skills/codex-refactor-loop/scripts/`.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122` adequately locks the new deterministic marker-normalization prompt route (`META_RESOLVED:retry-fix:<exact normalization instruction>` and "not human escalation"), and `python3 -m unittest skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passes. The gap is specifically the separate review-thread closure contract added to `SKILL.md`.
## What would change your verdict
Add a focused source-regression test that reads `SKILL.md` and asserts the GitHub State Contract contains the `PR review comment fix` row with the required reply-and-resolve-or-explicitly-escalate completion semantics. If there is existing controller/comment-monitor logic that enforces review-thread closure, add or point to a behavior test for that path as well.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:reject⟦AI:AUTO-LOOP⟧ |
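The source-regression test requested above could look roughly like the following. This is a minimal sketch, not the test that later landed in the PR: the table header and the `find_contract_row` helper are hypothetical, and the excerpt is embedded as a string so the example runs on its own (a real test would read `skills/codex-refactor-loop/SKILL.md` from disk).

```python
import unittest
from typing import Optional

# Hypothetical excerpt standing in for the GitHub State Contract table in SKILL.md.
# Only the "PR review comment fix" row wording is taken from the review comments;
# the header row is an assumption.
SKILL_MD_EXCERPT = """\
| Event | Contract |
| --- | --- |
| PR review comment fix | Completion includes review-thread closure: fixes driven by PR review comments are incomplete until the original thread is replied to and resolved, or explicitly escalated. |
"""


def find_contract_row(text: str, row_name: str) -> Optional[str]:
    """Return the first markdown table row whose first cell equals row_name."""
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if cells and cells[0] == row_name:
            return line
    return None


class GitHubStateContractTest(unittest.TestCase):
    def test_review_comment_fix_requires_closure_or_escalation(self):
        row = find_contract_row(SKILL_MD_EXCERPT, "PR review comment fix")
        self.assertIsNotNone(row)
        # Literal assertions: the test fails if the completion wording is dropped.
        self.assertIn("replied to and resolved", row)
        self.assertIn("explicitly escalated", row)
```

Run with `python3 -m unittest` against the module that contains the class. The literal assertions are deliberately brittle, since the point of a source-regression test is to fail when the documented rule changes.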
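🤖 Quality review passed: PR 402 has no readability or simplicity blockers
TL;DR
Details
This set of changes folds "non-canonical refactor marker identity" into the existing flow: the implement prompt spells out the canonical marker, the review-fix prompt classifies deterministic marker normalization under the existing (A) fixable in-scope path, meta-reflector retries the fix only when stalled, and reviewer-quality is required to give the exact expected marker. No new helper, interface, script entry point, or second set of parser rules is added. On the test side, the change only extends
📎 Full codex raw output (archived for reference)
---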
pr: 402
role: quality
verdict: approve
---
## Verdict
Approve: the PR keeps the change focused on the review-comment/refactor-marker convergence rules, with readable prompt edits and no new complexity, dead code, or unrelated cleanup.
## Evidence
- skills/codex-refactor-loop/prompts/implement.md:34: The canonical marker identity rule is stated directly at the existing self-doc policy point, so it avoids a second flow or helper vocabulary.
- skills/codex-refactor-loop/prompts/review-fix.md:47: The fix-codex rule classifies marker normalization as the existing (A) fixable in-scope path, which is simple and avoids an unnecessary human-decision branch.
- skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: The stalled-route addition is a single deterministic retry rule scoped to non-canonical marker identity, not a broad new escalation mechanism.
- skills/codex-refactor-loop/prompts/reviewer-quality.md:29: The reviewer instruction asks for the exact expected canonical marker, which keeps feedback concrete and reviewable.
- skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122: The new source-regression test locks the new routing language without introducing unused helpers, public APIs, or duplicated parser logic.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve⟦AI:AUTO-LOOP⟧ |
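🤖 Architecture review blocked: the new GitHub closure contract lacks mechanical verification
TL;DR
Details
The newly added tests are currently concentrated on
Suggested minimal fix: add one that reads
📎 Full codex raw output (archived for reference)
---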
pr: 402
role: architect
verdict: reject
---
## Verdict
reject — PR adds a new GitHub completion contract but does not add the required source-regression/behavior coverage for that SKILL.md contract change.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required visible state contract: `PR review comment fix | Completion includes review-thread closure...`; `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128` only locks refactor-comment marker normalization prompt text and has no assertion for `PR review comment fix`, `review-thread closure`, or the GitHub State Contract table. This regresses CLAUDE.md / AGENTS.md: `A behavior change must ship with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorization-source paths, etc.), to prevent "docs changed but implementation did not" or the reverse.`
- Same gap also violates the governance coupling rule in CLAUDE.md / AGENTS.md: `Governance first: architectural / process rules and their corresponding mechanical verification enter the repository together; if either is missing, wording alone does not make up for it.`
## What would change your verdict
Add mechanical coverage for the new `GitHub State Contract` row. Minimum acceptable fix: a source-regression test that reads `skills/codex-refactor-loop/SKILL.md` and asserts the `PR review comment fix` row plus the closure/escalation wording. If this is intended to drive actual GitHub thread reply/resolve behavior, add the matching behavior test or wire it into the existing controller/comment handling tests so the new completion definition is executable, not prose-only.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject⟦AI:AUTO-LOOP⟧ |
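… rule (closing the review-gate gap)
Responds to the PR#402 review-gate (architect + tests reject: the new GitHub State Contract completion rule lacks mechanical coverage): test_refactor_comment_policy_prompt_contract.py adds source-regression assertions that lock the new SKILL.md rule text plus the prompt changes it requires (a review-comment fix must reply to/close the thread or escalate, marker normalization, and format conflicts routed to retry-fix). Adds locking tests only; louis4li's product logic is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 tests review r2: approve
TL;DR
Details
This diff mainly changes  and the contract text of several prompts, and extends  to match. The new tests cover three key behaviors: a PR review comment fix must close the loop on the original review thread; under  the marker identity must keep the  form; and deterministic marker normalization must go through retry-fix rather than human escalation. I checked the diff for test-quality risks: no new sleep/delay pacing, no  or manual classification, and no weakened existing assertions. The target test was run: , result , OK.
📎 Full codex raw output (archived for reference)
---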
pr: 402
role: tests
verdict: approve
---
## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.
## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve⟦AI:AUTO-LOOP⟧ |
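The marker-normalization route that these tests lock can be illustrated with a small sketch. It assumes the canonical identity renders as `Refactor (iter<ITERATION>/<CLUSTER_ID>)`, following the template quoted in the evidence, and that any other `Refactor (...)` marker counts as non-canonical. The function name, the `Refactor (#123)` input, and the wording of the normalization instruction are hypothetical; only the `META_RESOLVED:retry-fix:` prefix comes from the review text.

```python
import re
from typing import Optional

# Matches any "Refactor (...)" marker, canonical or not (assumed shape).
ANY_REFACTOR_MARKER = re.compile(r"Refactor \([^)]*\)")


def route_marker_conflict(comment: str, iteration: int, cluster_id: str) -> Optional[str]:
    """Return a retry-fix route for the first non-canonical marker, else None.

    A deterministic text-format mismatch is routed back to the fixer with an
    exact normalization instruction instead of being escalated to a human.
    """
    expected = f"Refactor (iter{iteration}/{cluster_id})"
    for match in ANY_REFACTOR_MARKER.finditer(comment):
        found = match.group(0)
        if found != expected:
            return f"META_RESOLVED:retry-fix:replace '{found}' with '{expected}'"
    return None
```

For example, a hypothetical issue-only marker such as `Refactor (#123)` yields a retry-fix instruction naming the exact expected marker, while a comment that already carries the canonical marker yields `None`.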
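🤖 tests review r2: approve (corrected summary)
TL;DR
Details
The raw artifact in the previous comment is complete, but the inline code in its body was eaten by shell expansion; this is a clean version. This diff mainly changes  I checked the diff for test-quality risks: no new
📎 Full codex raw output (archived for reference)
---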
pr: 402
role: tests
verdict: approve
---
## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.
## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve⟦AI:AUTO-LOOP⟧ |
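🤖 Architect review: blocked, the new completion contract lacks a behavior guard
TL;DR
Details
The rule added here is a behavior contract: a fix driven by a review comment cannot rely only on a local  For this kind of behavior change, CLAUDE.md requires both a behavior test and a source-regression test. The source-regression test is in place; what is missing is a behavior test that fails when "the original thread has not been replied to/resolved and there is no escalation evidence", along with the corresponding execution path.
📎 Full codex raw output (archived for reference)
---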
pr: 402
role: architect
verdict: reject
---
## Verdict
Reject: the PR adds a new review-thread completion contract, but only pins the wording with source-regression assertions and does not add behavior coverage for the new required GitHub-side completion step.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required state transition: "PR review comment fix" completion is incomplete until the original thread is replied to and resolved, or explicitly escalated. This is a behavior contract, not just vocabulary. CLAUDE.md requires: "A behavior change must ship with a **behavior test** (asserting the behavior itself) + a **source-regression test** (literal assertions on SKILL.md section headings, narrow allowlist literals, authorization-source paths, etc.), to prevent \"docs changed but implementation did not\" or the reverse."
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` only adds source-regression checks that the `SKILL.md` sentence exists. I found no behavior test or implementation path that verifies a PR-review-comment-driven fix cannot be considered complete until the original GitHub review thread is replied to and resolved, or explicitly escalated. CLAUDE.md also requires: "**Changes must be verifiable**: behavior constraints must land in a mechanical verification mechanism (behavior test / source-regression test / section lint); a constraint carried only by \"the agent should remember\" is treated as not landed."
## What would change your verdict
Add the missing behavior enforcement and behavior test for the new completion contract. Concretely: make the controller/fix-completion path record or check the original PR review thread reply+resolution or explicit escalation before treating a PR review comment fix as complete, and add a behavior test that fails when a fix completes without that GitHub thread closure/escalation evidence. Keep the existing source-regression assertions as the documentation guard.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject
⟦AI:AUTO-LOOP⟧ |
📊 Current status: review complete, waiting for the author to update (no controller intervention needed)
Added 🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The PR head has been updated per the latest architect review feedback: This update fills in
Please have auto-loop re-include the architect verdict in the next wakeup/re-review round. ⟦AI:AUTO-LOOP⟧ |
📊 Current status: Consensus-rnd Phase review-gate (❌ No)
Next automatic steps: 3 reviewer outputs → truth table reject=0 & approve≥1 → merge into dev. 🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
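The truth table named in the banner can be sketched as a small pure function. The `reject=0 & approve≥1` merge condition comes from the banner, and the `MERGE_WITH_COMMENTS`, `FIX`, and `WAIT_OR_REDISPATCH` labels appear elsewhere in this conversation. How those labels map onto rejects and missing approvals is an assumption here, as is the function name.

```python
from collections import Counter
from typing import Mapping


def review_gate_decision(verdicts: Mapping[str, str]) -> str:
    """Map reviewer verdicts (role -> approve/reject/comment) to a gate decision."""
    counts = Counter(verdicts.values())
    if counts["reject"] > 0:
        # Assumption: any reject sends the PR back for a fix round.
        return "FIX"
    if counts["approve"] >= 1:
        # reject=0 and approve>=1 is the merge condition stated in the banner.
        return "MERGE_WITH_COMMENTS" if counts["comment"] > 0 else "MERGE"
    # Assumption: no approvals and no rejects means the evidence is incomplete.
    return "WAIT_OR_REDISPATCH"
```

With verdicts approve/approve/comment this returns `MERGE_WITH_COMMENTS`, which matches the #473 case described in a later commit message.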
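🤖 Quality review passed
TL;DR
Details
Using a three-dot diff, I read the PR file list and the full diff, and opened the full contents of the touched files. The new prompt and SKILL.md changes are focused on review-thread completion and deterministic marker normalization; I saw no unrelated cleanup, duplicated abstraction, comment padding, or increase in complexity.
📎 Full codex raw output (archived for reference)
---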
pr: 402
role: quality
verdict: approve
---
## Verdict
Approve: the PR is focused, readable, and does not introduce naming, dead-code, abstraction, complexity, comment, or drive-by quality issues under `HOST_REFACTOR_COMMENT_POLICY=none`.
## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54`: `ReviewThreadCompletionEvidence` names the business evidence directly, stays as a small value object, and avoids a generic manager/helper abstraction.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69`: `validate_review_thread_completion` is 13 lines with simple fail-closed branches; no method-size or cyclomatic-complexity smell.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574`: the controller wrapper is a direct call-through to the validation helper, so the new completion surface is reachable and not dead code.
- `skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95`: tests cover non-thread fixes, replied/resolved completion, explicit escalation, missing thread evidence, and unresolved/replied-only failures without duplicated setup large enough to require extraction.
- `skills/codex-refactor-loop/SKILL.md:532`: the new state-contract row is scoped to PR review comment completion and does not add unrelated cleanup or extra process surface.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve⟦AI:AUTO-LOOP⟧ |
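To make the reviewed shape concrete, here is a minimal sketch of a fail-closed completion validator. It reuses the names `ReviewThreadCompletionEvidence` and `validate_review_thread_completion` from the evidence list, but the fields, the reason strings, and the return convention are assumptions, not the code in `review_fix_dispatch.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewThreadCompletionEvidence:
    # All field names are hypothetical; the PR's actual fields are not shown here.
    driven_by_review_thread: bool
    original_thread_id: Optional[str] = None
    replied: bool = False
    resolved: bool = False
    escalation_reference: Optional[str] = None


def validate_review_thread_completion(
    evidence: ReviewThreadCompletionEvidence,
) -> Optional[str]:
    """Return None when the fix may complete, else a blocking reason."""
    if not evidence.driven_by_review_thread:
        return None  # fixes not driven by a review thread are never blocked
    if evidence.escalation_reference:
        return None  # explicit escalation is the alternative to closure
    if not evidence.original_thread_id:
        return "missing_original_thread_id"
    if not evidence.replied:
        return "original_thread_not_replied"
    if not evidence.resolved:
        return "original_thread_not_resolved"
    return None
```

Returning a reason string rather than a boolean keeps the blocked state explainable in a status banner; that is a design choice of this sketch.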
🤖 Test coverage review passed
TL;DR
Details
I checked three points: the new  Local focused verification passed:
📎 Full codex raw output (archived for reference)
pr: 402
|
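🤖 Architect review: approve from an architecture standpoint
TL;DR
Details
I reviewed  using a three-dot diff. In addition, deterministic refactor marker normalization stays on the fix/retry path rather than going to human escalation. This matches the boundary that "human intervention must be honest": escalate to a maintainer only when a human genuinely needs to make a product, strategy, governance, or permission decision.
📎 Full codex raw output (archived for reference)
pr: 402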
|
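📊 Current status: Consensus-rnd Phase review-gate consensus reached (❌ No)
Merging. 🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
🤖 Quality re-review passed: no readability blockers
TL;DR
Details
Using a three-dot diff and the full context of the touched files, I checked the new review-thread completion validator, the controller action forwarding point, the prompt contract updates, and the corresponding source-regression/behavior tests. The new names fit the business semantics, the method bodies are short and linear, and the prompt changes are focused on review thread closure and deterministic marker normalization. I found no unrelated cleanup and no refactor self-doc policy violation.
📎 Full codex raw output (archived for reference)
---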
pr: 402
role: quality
verdict: approve
---
## Verdict
Approve: the PR is focused, the new names describe review-thread completion evidence directly, and I found no dead code, over-engineering, avoidable duplication, or refactor self-doc policy issue under `none`.
## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54`: `ReviewThreadCompletionEvidence` is a small domain data carrier with fields that map directly to the completion contract; no generic manager/helper naming or unnecessary interface layer.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69`: `validate_review_thread_completion` is short and linear, with three explicit fail-closed checks and no complexity smell.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574`: `validate_review_fix_completion` is reachable through controller actions and is covered by `test_controller_completion_path_fails_closed_for_open_review_thread`; the underlying validator is also directly covered for accept/block cases.
- `skills/codex-refactor-loop/prompts/review-fix.md:47`, `skills/codex-refactor-loop/prompts/reviewer-quality.md:28`, and `skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36`: prompt edits are narrow to deterministic marker normalization and do not add unrelated workflow cleanup.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve⟦AI:AUTO-LOOP⟧ |
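🤖 Tests review: approve
TL;DR
Details
The key paths of the new logic are all covered: fixes not driven by a review thread are not blocked by mistake, and fixes driven by a review thread must have the original thread id, be replied to, and be resolved, or carry explicit escalation evidence; I did not see any new
📎 Full codex raw output (archived for reference)
pr: 402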
|
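🤖 Architect review r2: the completion gate is not yet wired into the main route
TL;DR
Details
The business rule this PR adds is: a fix driven by a PR review comment counts as complete only once the original thread has been replied to and resolved, or the fix has been explicitly escalated. The rule itself is reasonable, but the current diff only adds  The problem is that the main path does not use it.  It needs to change so that in
📎 Full codex raw output (archived for reference)
---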
pr: 402
role: architect
verdict: reject
---
## Verdict
reject - the PR declares a new PR-review-thread completion gate, but the actual `FIX_DONE` route can still proceed to review-gate without validating or producing original-thread reply/resolution evidence.
## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds the new contract: "fixes driven by PR review comments are incomplete until the original thread is replied to and resolved, or explicitly escalated." But `skills/codex-refactor-loop/scripts/codex_refactor_loop/wakeup_plan.py:479-485` still maps every `FIX_DONE` directly to `review-gate`, and `:517-523` still assigns `FIX_DONE` to `reviewer-codex`; there is no call to the new completion validator on that route. This violates CLAUDE.md: "**Changes must be verifiable**: behavior constraints must land in a mechanical verification mechanism (behavior test / source-regression test / section lint); a constraint carried only by \"the agent should remember\" is treated as not landed."
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54-81` defines `ReviewThreadCompletionEvidence` and `validate_review_thread_completion`, and `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574-575` exposes a wrapper, but `git grep` on the PR head shows those are only referenced by the wrapper and tests. No controller helper, wakeup action, `FIX_DONE` consumer, or GitHub GraphQL thread resolver builds this evidence from live PR review threads or blocks completion when it is absent. This worsens the architecture by adding a parallel advisory check instead of making the new business rule part of the single review-fix mainline, contrary to CLAUDE.md: "**Single trunk, plugin extension**: each skill has one authoritative main path; new capabilities attach as submodules/scripts, and a parallel second system is forbidden."
- `skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95-163` only unit-tests direct calls to the validator/wrapper. It does not assert that a review-thread-driven `FIX_DONE` without replied+resolved evidence is blocked before reviewer re-dispatch or merge. For a behavior rule added to the GitHub state contract, helper-only coverage is insufficient under CLAUDE.md: "**Governance first**: architectural / process rules and their corresponding mechanical verification enter the repository together; if either is missing, wording alone does not make up for it."
## What would change your verdict
Wire the completion gate into the actual review-fix completion path: before `FIX_DONE` routes back to reviewers or any merge-eligible review gate, derive durable evidence from the original PR review thread or live GitHub state, require replied+resolved or explicit escalation, and fail closed otherwise. Add a behavior test for the route itself, not only the standalone validator, showing an unresolved original review thread prevents reviewer re-dispatch/merge and produces the expected blocked/escalation action.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject⟦AI:AUTO-LOOP⟧ |
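The wiring this review asks for, gating the `FIX_DONE` route itself rather than exposing a standalone helper, might be sketched as below. The `route_fix_done` function, the dictionary keys, and the `blocked:review_thread_incomplete` label are all hypothetical; the real routing lives in `wakeup_plan.py` and is not reproduced here.

```python
from typing import Mapping


def route_fix_done(fix: Mapping[str, bool]) -> str:
    """Decide where a FIX_DONE event goes next (illustrative keys only)."""
    if not fix.get("driven_by_review_thread", False):
        # Ordinary fix rounds keep the existing FIX_DONE -> review-gate route.
        return "review-gate"
    closed = fix.get("replied", False) and fix.get("resolved", False)
    if closed or fix.get("escalated", False):
        return "review-gate"
    # Fail closed: no reviewer redispatch until the thread evidence exists.
    return "blocked:review_thread_incomplete"
```

A behavior test for the route would then assert that an unresolved original thread produces the blocked action, not merely that the validator returns an error when called directly.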
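📊 Current status: Consensus-rnd Phase review-gate FIX (❌ No, waiting on the fork author)
Note: this PR comes from a fork branch; the controller has no write access to the fork and cannot push a fix automatically. The fork author needs to add the "original thread reply/resolution evidence check" per the architect r2 reject and then re-trigger review-gate. See the r2 reviewer comment for details. 🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |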
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
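🤖 Reply to architect r2: the completion gate is now wired into the main route
Addressing the architect r2 blockers:
Verification command:
python3 -m unittest \
  skills/codex-refactor-loop/scripts/test_wakeup_plan.py \
  skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py \
  skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py
Result:
Conclusion: the "completion gate is not yet wired into the main route" issue raised by architect r2 has been fixed, and follow-up commits narrowed the gate from "any unresolved thread on the PR" to "completion evidence for the original review thread", so that ordinary fix rounds are not blocked by mistake. ⟦AI:AUTO-LOOP⟧ |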
Seed review-thread completion evidence during fix dispatch and require either original-thread closure or sourced meta escalation before FIX_DONE can continue, so the gate no longer depends on manually-created artifacts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep the seeded review-thread completion contract in the prompt while reducing duplicate wording and preserving the source-regression anchors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
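Root cause: the reviewer artifact frontmatter carries no head_sha, so _review_evidence_from_artifact cannot obtain the reviewed head SHA → every reviewing PR (#480/#473/#402) is always judged WAIT_OR_REDISPATCH:invalid_reviewer_evidence:missing_reviewed_head_sha, and wakeup-plan does not redispatch reviewers automatically → the headless review→merge path is stuck permanently (blocking the release).
Fix (implemented by the codex hotfix worker, committed after controller verification):
- controller_actions fetches headRefOid when dispatching reviewers, fails closed if it is missing, and injects the authoritative HEAD_SHA into the rendered prompt.
- The reviewer-{architect,tests,quality}.md frontmatter writes head_sha: ${HEAD_SHA}.
- wakeup_runner._review_head_sha_for: when the artifact lacks head_sha, recover it from the controller-rendered prompt/log (without relying on worker self-reporting, to avoid repeating the markerless failure); the live-head staleness check and the merge truth-table semantics are unchanged.
- wakeup_plan projects an executable dispatch_reviewers self-healing action for OPEN reviewing PRs whose reviewer head is missing or stale.
Verification: the controller independently re-ran test_wakeup_runner+review_gate+wakeup_plan+controller_actions, 293 passed; the marker/prompt contract tests, 30 passed; the review-gate safety semantics (a stale head still does not merge) are locked by 13 review-gate tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>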
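The head-binding check described in this commit can be approximated as follows. This is a sketch under stated assumptions: the frontmatter parser is a simplified stand-in (not `_review_evidence_from_artifact`), the `HEAD_BOUND` and stale labels are hypothetical, and only the `missing_reviewed_head_sha` reason string is taken from the commit message.

```python
from typing import Dict


def parse_frontmatter(artifact: str) -> Dict[str, str]:
    """Parse simple 'key: value' pairs between the leading '---' fences."""
    lines = artifact.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # unterminated frontmatter is treated as missing evidence


def classify_review_evidence(artifact: str, live_head_sha: str) -> str:
    """Bind a reviewer verdict to the head SHA it actually reviewed."""
    reviewed = parse_frontmatter(artifact).get("head_sha")
    if not reviewed:
        return "WAIT_OR_REDISPATCH:invalid_reviewer_evidence:missing_reviewed_head_sha"
    if reviewed != live_head_sha:
        # Hypothetical label: a verdict on an older head must not gate a merge.
        return "WAIT_OR_REDISPATCH:stale_reviewed_head_sha"
    return "HEAD_BOUND"
```

The fallback that recovers the SHA from the controller-rendered prompt is omitted; it would slot in before the missing-evidence return.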
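… merge loop closure
Follows r7: r7 bound reviewer evidence to head_sha, but the review-evidence-redispatch self-healing action projected by wakeup_plan was forced to status_only by the closed-action-projection finalizer (its kind was not in EXECUTABLE_ACTION_KINDS), so the runner never applied it → review evidence lacking a head for #480/#473/#402 could not be redispatched automatically → review_gate stayed at WAIT_OR_REDISPATCH → concurrency drained to 0.
Fix (implemented by the codex hotfix worker, committed after controller verification):
- wakeup_plan adds review-evidence-redispatch to EXECUTABLE_ACTION_KINDS, and the finalizer preserves runner authority instead of forcing status_only.
- controller_actions.dispatch_reviewers reads stale_review_roles and redispatches only the stale roles, and _pending_review_spawn_exists skips any in-flight dispatch-reviewers:<pr>:<role>:r1 intent (idempotent, no double dispatch); it still goes through r7's HEAD_SHA injection so that the redispatched reviewers' new evidence is head-bound.
- The merge-gate truth table is unchanged; this only makes the existing self-healing action executable and targets it at the stale roles.
Verification: the controller independently re-ran test_wakeup_plan+wakeup_runner+review_gate+controller_actions, 295 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>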
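The idempotent, stale-roles-only redispatch can be pictured with a few lines of Python. The intent key format `dispatch-reviewers:<pr>:<role>:r1` is quoted from the commit message; the function name and the idea of passing pending intents as a set are assumptions made for this sketch.

```python
from typing import Iterable, List, Set


def roles_to_redispatch(pr: int, stale_roles: Iterable[str], pending_intents: Set[str]) -> List[str]:
    """Return the stale reviewer roles that have no in-flight dispatch intent."""
    return [
        role
        for role in stale_roles
        # Skip roles whose dispatch is already pending, so a second tick
        # does not launch a duplicate reviewer.
        if f"dispatch-reviewers:{pr}:{role}:r1" not in pending_intents
    ]
```

Calling it twice with the same pending set gives the same answer, which is the property that keeps a repeated wakeup tick from double-dispatching.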
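…R headless reaches the review_gate decision
Root cause: for review_gate's CI read, PrChecksProjection.check_pr failed closed immediately (ci_unavailable:pull_api_failure) on a single non-zero return from gh api pulls/<n> or commits/<sha>/check-runs, with no retry; the rate limit was healthy, so these were transient blips → #473/#402 (MERGEABLE + CI green) were blocked by a transient failure on every tick and never reached MERGE/FIX.
Fix (minimal, touching only PrChecksProjection; #322 ReleaseRequiredChecksProjection is untouched):
- _run_api_read: up to 3 deterministic retries (no sleep) for both api reads, stopping on the first success; persistent failure still fails closed (safe). No behavior change on the happy path.
- Starvation conclusion (verified by the worker): review_gate was already being applied, and blocked ci_unavailable rows can be re-evaluated (they are not terminal duplicate-suppressed), so run_once needs no change.
- e2e: test_review_gate_e2e adds a case "still merges when the first pulls read fails transiently"; test_pr_checks adds retry coverage.
Verification: the controller independently re-ran test_pr_checks+wakeup_runner+review_gate+review_gate_e2e+wakeup_plan+controller_actions, 309 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>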
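A bounded, sleep-free retry of this kind might look like the sketch below. It assumes the runner is an injectable callable returning a `(return code, stdout)` pair and reads "up to 3" as three attempts in total; both are assumptions, and this is not the body of `_run_api_read`.

```python
from typing import Callable, Sequence, Tuple

ApiResult = Tuple[int, str]  # (return code, stdout); assumed runner contract


def run_api_read(
    runner: Callable[[Sequence[str]], ApiResult],
    argv: Sequence[str],
    max_attempts: int = 3,
) -> ApiResult:
    """Retry a read-only API call a fixed number of times without sleeping."""
    result: ApiResult = (1, "")
    for _ in range(max_attempts):
        result = runner(argv)
        if result[0] == 0:
            break  # stop on the first success
    # After persistent failure the last non-zero result is returned, so the
    # caller still fails closed.
    return result
```

Because the loop has no sleep and a fixed bound, a stubbed runner makes the behavior fully deterministic in tests.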
…ady PR headless merge, so the backlog can drain
Root cause (verified): on every tick wakeup_plan ordered SPAWN actions (implement intent + design-consensus #495/#496/#490/#491/#418, index 0-5) ahead of the lifecycle review_gate actions (#480/#473/#402, index 6-8); the old run_once logic `if applied_spawns>0 and not is_spawn_action: break` meant that once any spawn had been dispatched, the first lifecycle action broke out of the whole tick. design-consensus almost always has spawn work → review_gate/merge starved forever → #473 (approve/approve/comment → MERGE_WITH_COMMENTS, MERGEABLE + CI green) never merged, and the active count never went down.
Fix (minimal, core dispatch loop):
- Introduce consumes_spawn_budget = is_spawn_action or _uses_spawn_budget(action); when over budget, `continue` (not break) so that later drain-only lifecycle actions are still reached.
- _uses_spawn_budget: dispatch_reviewers=True; review_gate=True only when the decision is FIX (which dispatches a fix worker); otherwise drain-only actions such as review_gate (MERGE/MERGE_WITH_COMMENTS) and close=False → they do not consume spawn budget and are still evaluated and executed after the floor is filled.
- Keep r6's spawn launch-failure break/retry and the dispatch_design_consensus no-intents skip-and-continue; the truth table/head-binding/CI-green/mergeable safety semantics are unchanged.
Result: each tick both fills the floor with spawns and evaluates drain-only lifecycle actions such as merge/close, so the backlog can drain.
Verification: the controller independently re-ran test_wakeup_runner+review_gate+review_gate_e2e+wakeup_plan+controller_actions+pr_checks, 309 passed; added test_wakeup_runner_lifecycle_review_gate_not_starved_after_spawn_batch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
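The difference between `break` and `continue` in the dispatch loop is easier to see in a toy version. The action dictionaries, the `spawn` kind, and this simplified `run_once` signature are hypothetical; only the budget rule (spawns, `dispatch_reviewers`, and `review_gate` with a FIX decision consume budget, while drain-only actions do not) follows the commit message.

```python
from typing import Dict, List, Sequence


def uses_spawn_budget(action: Dict[str, str]) -> bool:
    """True when applying the action would launch a worker."""
    kind = action["kind"]
    if kind in ("spawn", "dispatch_reviewers"):
        return True
    if kind == "review_gate":
        return action.get("decision") == "FIX"  # FIX dispatches a fix worker
    return False  # merge/close style actions are drain-only


def run_once(actions: Sequence[Dict[str, str]], spawn_budget: int) -> List[str]:
    """Apply actions in order; skip over-budget spawns instead of stopping."""
    applied: List[str] = []
    spawns_used = 0
    for action in actions:
        if uses_spawn_budget(action):
            if spawns_used >= spawn_budget:
                continue  # a `break` here would starve later drain-only actions
            spawns_used += 1
        applied.append(action["id"])
    return applied
```

With a budget of one, a tick holding two spawns followed by a MERGE-bound `review_gate` still applies the merge action, which is the starvation case the commit describes.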
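… malformed: review→merge root cause
Root cause (confirmed by measurement): _run_command inserted --repo at full[1:1] (before the subcommand) for every gh command. gh does not accept --repo as a pre-subcommand global flag, and gh api does not take --repo at all (the repo is in the URL): `gh --repo O/R api repos/O/R/pulls/473` → rc=1 "unknown flag: --repo". As a result every gh call from wakeup-runner was malformed: the CI read (the gh api pulls/check-runs calls of PrChecksProjection(runner=_run_command)) → ci_unavailable:pull_api_failure; mergeability (gh pr view) → mergeability_unavailable; live-state and others likewise. → The review gate could never read CI/mergeable → #473 (approve/approve/comment=MERGE_WITH_COMMENTS, green), #402, and #480 never landed headlessly. The r11 retry merely repeated the same malformed command. Direct PrChecksProjection() and direct gh calls use the plain default runner (no injection) and therefore worked, which masked the bug.
Minimal fix: _run_command injects --repo only for `gh pr`/`gh issue`, placed at a valid position after the subcommand (and its verb); `gh api` is left completely untouched; never inject before the subcommand; do not duplicate --repo if it is already present.
Verification: the controller independently re-ran test_wakeup_runner+review_gate+review_gate_e2e+pr_checks+wakeup_plan+controller_actions, 312 passed; added an argv-shape test plus an e2e test (stubbing subprocess.run to reject the malformed `gh --repo ... api`, with the ready PR still merging).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🤖 Review from the testing perspective: passed
TL;DR
Details
I looked only at test coverage and test quality.  is the evidence container required to complete a review-thread-driven fix; the related tests cover the non-review-thread path, the passing path with reply+resolved, the passing path with explicit escalation, the fail-closed paths for a missing , no reply, and not resolved, and the controller-facing wrapper. Source-regression also covers the new rules: the GitHub State Contract in  must state that a PR review comment fix requires the original thread to be replied to and resolved, or explicitly escalated; deterministic marker normalization in  must go through retry-fix, not human escalation. There is no new  test pacing, , manual bypass, weakened assertion, or mock-only pseudo-coverage. The verification commands were run on the PR head archive:  9 tests OK;  7 tests OK.
📎 Full codex raw output (archived for reference)
pr: 402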
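The argv rewrite described in the minimal fix can be sketched as a pure function. The `with_repo` name and the exact insertion index (right after the verb) are assumptions; the rules themselves (only `gh pr`/`gh issue`, never before the subcommand, never for `gh api`, no duplication) are taken from the commit message.

```python
from typing import List, Sequence

REPO_SCOPED_SUBCOMMANDS = {"pr", "issue"}


def with_repo(argv: Sequence[str], repo: str) -> List[str]:
    """Insert --repo after the subcommand and verb of `gh pr` / `gh issue`."""
    args = list(argv)
    if len(args) < 3 or args[0] != "gh" or args[1] not in REPO_SCOPED_SUBCOMMANDS:
        return args  # `gh api` and anything else is left untouched
    already_set = any(a in ("--repo", "-R") or a.startswith("--repo=") for a in args)
    if already_set:
        return args  # never duplicate an explicit repo flag
    # args[:3] is ["gh", <subcommand>, <verb>]; the flag goes right after it.
    return args[:3] + ["--repo", repo] + args[3:]
```

Keeping the rewrite free of side effects is what makes an argv-shape test like the one mentioned in the verification line straightforward to write.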
|
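🤖 Quality review approve
TL;DR
Details
Using a three-dot diff, I  The executable code changes are concentrated in  Verification: I used the PR head's
📎 Full codex raw output (archived for reference)
pr: 402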
|
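🤖 Architecture review: reject, the review-thread completion contract is not yet wired into the actual route
TL;DR
Details
The problem this time is not the direction of the rule but where it lands.  The same holds at the code level:  The required fix is specific: wire this evidence gate into the actual route for review-fix completion. The behavior test must cover: a review-thread-driven
📎 Full codex raw output (archived for reference)
---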
pr: 402
role: architect
head_sha: 6fdbbba603fc38507ab3efcbf53a032f353f3d0f
verdict: reject
---
## Verdict
reject: the PR adds a review-thread completion contract, but the existing FIX_DONE route can still proceed to reviewer redispatch without that evidence.
## Evidence
- skills/codex-refactor-loop/SKILL.md:532 adds a new GitHub State Contract row: "PR review comment fix | Completion includes review-thread closure: fixes driven by PR review comments are incomplete until the original thread is replied to and resolved, or explicitly escalated." But the phase routing table still says skills/codex-refactor-loop/SKILL.md:504 "`FIX_DONE` | Dispatch reviewers again." with no required review-thread completion gate. This regresses the CLAUDE.md clause: "**Changes must be verifiable**: behavior constraints must land in a mechanical verification mechanism (behavior test / source-regression test / section lint); a constraint carried only by \"the agent should remember\" is treated as not landed."
- skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69 defines `validate_review_thread_completion`, and skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574 exposes `validate_review_fix_completion`, but PR diff evidence shows no route/dispatch/merge-path caller that enforces it before `FIX_DONE` advances the review-gate. That violates the same CLAUDE.md clause above and the governance clause: "**Governance first**: architectural / process rules and their corresponding mechanical verification enter the repository together; if either is missing, wording alone does not make up for it."
- No scope, external repo, protocol/schema, host production SSOT, actor-split, or compatibility-shim concerns found in the changed files.
## What would change your verdict
Wire the new completion evidence into the existing review-fix completion path before `FIX_DONE` can trigger reviewer redispatch, or narrow the SKILL contract so it only documents an explicit helper API rather than declaring completion incomplete. Add a behavior test that exercises the actual route: a review-thread-driven `FIX_DONE` with missing reply/resolution evidence must not dispatch the next reviewer round, while replied+resolved or explicit escalation evidence may proceed.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject⟦AI:AUTO-LOOP⟧ |
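🤖 Review from the testing perspective: passed
TL;DR
Details
I looked only at test coverage and test quality.  Source-regression also covers the new rules:  The verification commands were run on the PR head archive:
📎 Full codex raw output (archived for reference)
pr: 402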
|
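🤖 fix-codex blocked: the current worker is not on the PR head worktree
TL;DR
Details
The blocking requirement comes from architect:  I did not apply any code changes, because runtime evidence shows that the head of PR 402 is
📎 Full codex raw output (archived for reference)
⟦AI:AUTO-LOOP⟧ |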
Summary
self-doc-comment strict provenance, requiring issue-only refactor markers to be normalized into iter/cluster markers.
retry-fix, avoiding a deadlock from escalating to a human.
Test plan
python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py
⟦AI:AUTO-LOOP⟧