Skip to content

让 review comment 与 refactor marker 冲突可收敛#402

Open
louis4li wants to merge 9 commits into
ChronoAIProject:devfrom
louis4li:fix/pr-review-thread-completion-gate
Open

让 review comment 与 refactor marker 冲突可收敛#402
louis4li wants to merge 9 commits into
ChronoAIProject:devfrom
louis4li:fix/pr-review-thread-completion-gate

Conversation

@louis4li
Copy link
Copy Markdown

@louis4li louis4li commented Jun 1, 2026

Summary

  • 明确 PR review comment 驱动的修复必须回复/关闭原 thread 或显式升级。
  • 保留 self-doc-comment 严格 provenance,要求 issue-only refactor marker 归一化为 iter/cluster marker。
  • 让 deterministic marker 格式冲突走 retry-fix,避免升级人工卡死。

Test plan

  • python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py

⟦AI:AUTO-LOOP⟧

louis4li and others added 2 commits June 1, 2026 15:29
在 GitHub State Contract 中补充 PR review comment fix 的完成定义,明确由 review comment 驱动的修复必须回复并 resolve 原 thread,或显式升级处理,避免只完成代码提交而漏掉 GitHub 状态闭环。

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
触发来源: aevatarAI/aevatar#1624 review-gate 卡在 self-doc marker 身份格式反复 reject
行为类型: 收紧 codex-refactor-loop prompt 路由,保留 self-doc-comment 严格 provenance,把 issue-only marker 归一化列为 fix-codex in-scope
等价语义: 同类非规范 Refactor marker identity 冲突不再升级人工,由 retry-fix 携带精确归一化指令继续收敛
后续复用: HOST_REFACTOR_COMMENT_POLICY=self-doc-comment 的下游可维护 iter/cluster provenance,同时避免 deterministic 文本格式冲突卡死
失败痕迹归属: test_refactor_comment_policy_prompt_contract.py 覆盖 implement/review-fix/quality/reflector 契约

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@louis4li louis4li force-pushed the fix/pr-review-thread-completion-gate branch from 4d5aad1 to aaaf4be Compare June 1, 2026 11:02
@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 测试审查:缺少 review-thread 闭环契约的回归测试

TL;DR

  • 这是什么: PR 402 的 tests 视角独立审查。
  • 结论是什么: reject,因为新增的 PR review comment fix 完成契约没有机械测试锁住。
  • 下一步: 给 SKILL.md 的 GitHub State Contract 新增一条 source-regression 测试;若已有执行逻辑,也补或指向行为测试。

详细说明

这次 prompt 中关于非规范 refactor marker 归一化的变更已有测试覆盖:test_refactor_comment_policy_prompt_contract.py 会断言 META_RESOLVED:retry-fix:<exact normalization instruction> 和 “not human escalation”,本地运行也通过。

阻断点在另一处:skills/codex-refactor-loop/SKILL.md:532 新增了 GitHub State Contract,要求由 PR review comment 驱动的修复必须回复并 resolve 原 thread,或显式升级。这个是新的流程约束,但当前只出现在 SKILL.md 文本里,没有 source-regression test 或行为测试覆盖。按本仓库“行为约束必须落到机械验证手段”的规则,这需要合并前补上。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: tests
verdict: reject
---

## Verdict
Reject: the marker-normalization prompt changes have source-regression coverage, but the new GitHub State Contract rule for PR review comment fixes has no matching test.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new completion contract: PR review comment fixes are incomplete until the original review thread is replied to and resolved, or explicitly escalated. `rg "PR review comment fix|review-thread closure|original thread|replied to and resolved"` finds this only in `SKILL.md`, with no matching source-regression or behavior test under `skills/codex-refactor-loop/scripts/`.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122` adequately locks the new deterministic marker-normalization prompt route (`META_RESOLVED:retry-fix:<exact normalization instruction>` and "not human escalation"), and `python3 -m unittest skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passes. The gap is specifically the separate review-thread closure contract added to `SKILL.md`.

## What would change your verdict
Add a focused source-regression test that reads `SKILL.md` and asserts the GitHub State Contract contains the `PR review comment fix` row with the required reply-and-resolve-or-explicitly-escalate completion semantics. If there is existing controller/comment-monitor logic that enforces review-thread closure, add or point to a behavior test for that path as well.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 质量审查通过: PR 402 可读性与简洁性无阻塞

TL;DR

  • 这是什么: 我按 code quality 角度复查了 PR 402 的 prompt 与 source-regression 测试改动。
  • 结论: approve, 没有发现命名、死代码、过度抽象、复杂度或无关改动问题。
  • 下一步: controller 可继续汇总其他 reviewer 结论。

详细说明

这组改动把“非 canonical refactor marker identity”收敛到已有流程里: implement prompt 写清 canonical marker, review-fix prompt 把确定性 marker normalization 归到现有 (A) fixable in-scope, meta-reflector 只在 stalled 时重试 fix, reviewer-quality 要求给出 exact expected marker。没有新增 helper、接口、脚本入口或第二套 parser 规则。

测试侧只扩展了 test_refactor_comment_policy_prompt_contract.py 的 source-regression 字符串断言,用来锁住 prompt contract。质量角度看这是合适的窄验证,不是未使用公共 surface。


📎 完整 codex 原始输出(存档备查)
    ---
    pr: 402
    role: quality
    verdict: approve
    ---
    
    ## Verdict
    Approve: the PR keeps the change focused on the review-comment/refactor-marker convergence rules, with readable prompt edits and no new complexity, dead code, or unrelated cleanup.
    
    ## Evidence
    - skills/codex-refactor-loop/prompts/implement.md:34: The canonical marker identity rule is stated directly at the existing self-doc policy point, so it avoids a second flow or helper vocabulary.
    - skills/codex-refactor-loop/prompts/review-fix.md:47: The fix-codex rule classifies marker normalization as the existing (A) fixable in-scope path, which is simple and avoids an unnecessary human-decision branch.
    - skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: The stalled-route addition is a single deterministic retry rule scoped to non-canonical marker identity, not a broad new escalation mechanism.
    - skills/codex-refactor-loop/prompts/reviewer-quality.md:29: The reviewer instruction asks for the exact expected canonical marker, which keeps feedback concrete and reviewable.
    - skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:122: The new source-regression test locks the new routing language without introducing unused helpers, public APIs, or duplicated parser logic.
    
    ⟦AI:AUTO-LOOP⟧
    REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 架构审查阻塞:新增 GitHub 闭环契约缺少机械验证

TL;DR

  • 这是什么: PR 402 的 architect 角度独立审查。
  • 结论是什么: reject;新增 PR review comment fix 完成契约,但测试只覆盖 marker normalization prompt。
  • 下一步: 请补 SKILL.md GitHub State Contract 行的 source-regression,若要实际驱动 thread reply/resolve,还需要对应 behavior 覆盖。

详细说明

skills/codex-refactor-loop/SKILL.md:532 新增了一个 GitHub 状态完成定义:review comment 驱动的修复必须回复并 resolve 原 thread,或显式升级。这个属于流程/行为契约,不是普通说明文字。

当前新增测试集中在 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128 的 refactor marker normalization 文案,没有锁住 PR review comment fixreview-thread closureGitHub State Contract 表格行。按 CLAUDE.md / AGENTS.md 的规则,行为变更必须配套 behavior test + source-regression test;治理/流程规则也必须和机械验证一起进仓库。

建议最小修复:增加一个读取 skills/codex-refactor-loop/SKILL.md 的 source-regression test,断言 PR review comment fix 行和 reply/resolve-or-escalate 语义。如果这条契约已经要被 controller/comment handling 执行,再补相应 behavior test。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: architect
verdict: reject
---

## Verdict
reject — PR adds a new GitHub completion contract but does not add the required source-regression/behavior coverage for that SKILL.md contract change.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required visible state contract: `PR review comment fix | Completion includes review-thread closure...`; `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:48-128` only locks refactor-comment marker normalization prompt text and has no assertion for `PR review comment fix`, `review-thread closure`, or the GitHub State Contract table. This regresses CLAUDE.md / AGENTS.md: `行为变更必须配套 **behavior test**(断言行为本身)+ **source-regression test**(对 SKILL.md 段落标题、narrow allowlist 字面、授权来源 path 等做字面断言),防止"改文档没改实现"或反之。`
- Same gap also violates the governance coupling rule in CLAUDE.md / AGENTS.md: `治理前置:架构性 / 流程性规则与对应机械验证手段同时进仓库,缺一不补口径。`

## What would change your verdict
Add mechanical coverage for the new `GitHub State Contract` row. Minimum acceptable fix: a source-regression test that reads `skills/codex-refactor-loop/SKILL.md` and asserts the `PR review comment fix` row plus the closure/escalation wording. If this is intended to drive actual GitHub thread reply/resolve behavior, add the matching behavior test or wire it into the existing controller/comment handling tests so the new completion definition is executable, not prose-only.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

…规则(补 review-gate 缺口)

回应 PR#402 review-gate(architect+tests reject:新 GitHub State Contract 完成规则缺机械覆盖):
test_refactor_comment_policy_prompt_contract.py 加 source-regression,锁 SKILL.md 新规则文本
+ 其要求的 prompt 改动(review-comment fix 必须回复/关闭 thread 或升级、marker 归一化、format 冲突走 retry-fix)。
纯加锁定测试,不改 louis4li 的产品逻辑。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 tests review r2: approve

TL;DR

  • 这是什么: PR 402 的 tests 视角二轮 review。
  • 结论: approve, prompt-contract 变更已有对应 source-regression 覆盖。
  • 下一步: controller 可把本轮 tests verdict 纳入 review-gate 汇总。

详细说明

这次 diff 主要改 与几个 prompt 的契约文本,配套扩展了 。新增测试覆盖了三块关键行为: PR review comment fix 必须闭环原始 review thread、 下 marker identity 必须保持 形态、deterministic marker normalization 必须走 retry-fix 而不是 human escalation。

我检查了 diff 中的测试质量风险: 没有新增 sleep/delay pacing, 没有 或 manual 分类, 没有削弱已有断言。已跑目标测试: , 结果 , OK。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: tests
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.

## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 tests review r2: approve (corrected summary)

TL;DR

  • 这是什么: PR 402 的 tests 视角二轮 review。
  • 结论: approve, prompt-contract 变更已有对应 source-regression 覆盖。
  • 需要 maintainer 做什么: 不需要动作;controller 可纳入 review-gate 汇总。

详细说明

上一条评论的 raw artifact 完整,但正文里的 inline code 被 shell 展开吃掉了;这里补一条干净版。

这次 diff 主要改 SKILL.md 与几个 prompt 的契约文本,配套扩展了 test_refactor_comment_policy_prompt_contract.py。新增测试覆盖三块关键行为: PR review comment fix 必须闭环原始 review thread;self-doc-comment 下 marker identity 必须保持 Refactor (iter${ITERATION}/${CLUSTER_ID}) 形态;deterministic marker normalization 必须走 META_RESOLVED:retry-fix,不是 human escalation。

我检查了 diff 中的测试质量风险: 没有新增 sleep/delay pacing,没有 [Skip] 或 manual 分类,没有削弱已有断言。已跑目标测试: python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py,结果 Ran 7 tests in 0.002s, OK。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: tests
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for this prompt-contract diff.

## Evidence
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` adds a source-regression assertion for the new `GitHub State Contract` row requiring PR review-thread closure or explicit escalation.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63` extends the existing `self-doc-comment` contract test to lock canonical `Refactor (iter${ITERATION}/${CLUSTER_ID})` marker identity and the fix-codex non-human-decision route.
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137` adds focused coverage for deterministic marker normalization routing to `META_RESOLVED:retry-fix`, which covers the new `meta-reflector-stalled.md` behavior text.
- The diff adds no `sleep`/`delay`, no skip/manual-test marker, and no weakened assertions; the changed test remains source-regression style, matching this repository's prompt-contract test convention.
- Verification run: `python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py` passed, `Ran 7 tests in 0.002s`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

🤖 Architect review: 阻塞,新增 completion contract 缺 behavior guard

TL;DR

  • 这是什么: PR 402 的 architect 复审,检查 CLAUDE.md/AGENTS.md 架构约束。
  • 结论是什么: reject,SKILL.md 新增 PR review thread completion 规则,但目前只有文字断言。
  • 下一步: implement codex 需要补实际 completion enforcement 和 behavior test,再保留现有 source-regression test。

详细说明

这次新增的规则是行为 contract: review comment 驱动的 fix 不能只靠本地 FIX_DONE 或下一轮 reviewer 通过就算完成,还必须回复并 resolve 原 GitHub review thread,或者显式升级。现在 diff 只在 test_refactor_comment_policy_prompt_contract.py 里断言 SKILL.md 有这句话,能防文档丢失,但不能证明 controller/fix-completion 路径真的会检查 GitHub thread closure。

CLAUDE.md 对这类行为变更的要求是 behavior test + source-regression test 双落地。这里 source-regression 有了,缺的是会在“没有回复/resolve 原 thread 且没有 escalation evidence”时失败的行为测试,以及对应的执行路径。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: architect
verdict: reject
---

## Verdict
Reject: the PR adds a new review-thread completion contract, but only pins the wording with source-regression assertions and does not add behavior coverage for the new required GitHub-side completion step.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds a new required state transition: "PR review comment fix" completion is incomplete until the original thread is replied to and resolved, or explicitly escalated. This is a behavior contract, not just vocabulary. CLAUDE.md requires: "行为变更必须配套 **behavior test**(断言行为本身)+ **source-regression test**(对 SKILL.md 段落标题、narrow allowlist 字面、授权来源 path 等做字面断言),防止\"改文档没改实现\"或反之。"
- `skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25` only adds source-regression checks that the `SKILL.md` sentence exists. I found no behavior test or implementation path that verifies a PR-review-comment-driven fix cannot be considered complete until the original GitHub review thread is replied to and resolved, or explicitly escalated. CLAUDE.md also requires: "**变更必须可验证**:行为约束必须落到机械验证手段(behavior test / source-regression test / 段落 lint);仅靠\"agent 应该记得\"承载的约束视为未落地。"

## What would change your verdict
Add the missing behavior enforcement and behavior test for the new completion contract. Concretely: make the controller/fix-completion path record or check the original PR review thread reply+resolution or explicit escalation before treating a PR review comment fix as complete, and add a behavior test that fails when a fix completes without that GitHub thread closure/escalation evidence. Keep the existing source-regression assertions as the documentation guard.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

@loning loning added crnd:lifecycle:managed loop-managed item crnd:phase:fixing fix codex in flight crnd:human:auto auto-advancing, no human needed crnd:phase:reviewing review-gate reviewers in flight and removed crnd:phase:fixing fix codex in flight labels Jun 1, 2026
@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 1, 2026

📊 当前状态 — review 完成,等作者更新(不需要 controller 介入)

维度
阶段 reviewing(自动 review 已完成)
自动 review 结论 architect: reject(要求为新增 GitHub 闭环 completion 契约补 behavior test / source-regression test,见上方 architect 评审评论) · tests: approve
PR 来源 fork(louis4li:fix/pr-review-thread-completion-gate),CI 全绿,base dev
为什么 loop 不自动派 fix 该 PR head 在 fork 分支,auto-loop 无法 push 修复到 fork;补 behavior guard 需作者在 fork 上更新,或 maintainer 决定接管到 loop 自有分支
是否需要人介入 可选:作者按 architect 反馈补 behavior/source-regression test 后推到该分支即自动 re-review;或 maintainer 指示接管

已加 crnd:lifecycle:managed 以纳入评论监控。作者更新 head 后下一轮 wakeup 自动重新派 3 reviewer。

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@louis4li
Copy link
Copy Markdown
Author

louis4li commented Jun 2, 2026

已按最新 architect review 反馈更新 PR head:cfc25723eb1b2b5cc9a06c878ba40abcf82ed1eb

本次补齐的是 PR review comment fix completion contract 的行为门禁,不再只依赖 SKILL.md/source-regression 文本:

  • 增加 review-thread closure 的 behavior enforcement / guard 覆盖;
  • 保留并配套 source-regression 覆盖;
  • 对应提交:cfc25723 fix(skill): 补 review thread 闭环行为门禁

请 auto-loop 在下一轮 wakeup/re-review 中重新纳入 architect verdict。

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

📊 当前状态 — Consensus-rnd Phase review-gate(❌ 否)

维度
阶段 review-gate r1
关联 PR #402(base dev,CI 全绿,mergeable)
codex 任务 architect / tests / quality reviewer 派出
是否需要人介入 ❌ 否(共识达成即自动 merge)

下一步自动会做:3 reviewer 输出 → 真值表 reject=0&approve≥1 → 合并到 dev。

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 Quality review 通过

TL;DR

  • 这是什么:PR 402 的 code quality 独立 review,只看可读性、命名、简单性、复杂度和死代码。
  • 结论是什么:approve,没有发现必须修改的质量问题。
  • 下一步:controller 可把这个 reviewer 结果纳入 review gate。

详细说明

我按三点 diff 读取了 PR 文件列表和完整 diff,并打开了 touched files 的完整内容。新增的 ReviewThreadCompletionEvidence 是一个小的证据值对象,命名直接表达业务意图;validate_review_thread_completion 分支少、职责单一;ControllerActions.validate_review_fix_completion 让 completion validation surface 有实际调用路径,不是悬空 public surface。

prompt 和 SKILL.md 的改动集中在 review-thread completion 与 deterministic marker normalization,没有看到无关清理、重复抽象、注释灌水或复杂度上升。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: quality
verdict: approve
---

## Verdict
Approve: the PR is focused, readable, and does not introduce naming, dead-code, abstraction, complexity, comment, or drive-by quality issues under `HOST_REFACTOR_COMMENT_POLICY=none`.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54`: `ReviewThreadCompletionEvidence` names the business evidence directly, stays as a small value object, and avoids a generic manager/helper abstraction.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69`: `validate_review_thread_completion` is 13 lines with simple fail-closed branches; no method-size or cyclomatic-complexity smell.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574`: the controller wrapper is a direct call-through to the validation helper, so the new completion surface is reachable and not dead code.
- `skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95`: tests cover non-thread fixes, replied/resolved completion, explicit escalation, missing thread evidence, and unresolved/replied-only failures without duplicated setup large enough to require extraction.
- `skills/codex-refactor-loop/SKILL.md:532`: the new state-contract row is scoped to PR review comment completion and does not add unrelated cleanup or extra process surface.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 测试覆盖 review 通过

TL;DR

  • 这是什么: PR 402 的 tests 视角独立 review。
  • 结论: approve;新增 completion gate 有行为测试和 source-regression 覆盖。
  • 下一步: controller 可按 review-gate 真值表继续收敛。

详细说明

我检查了三点: 新增 ReviewThreadCompletionEvidence / validate_review_thread_completion 的行为覆盖、ControllerActions.validate_review_fix_completion 包装路径、以及 SKILL.md 和 prompts 中新增规则的 source-regression 断言。对应测试覆盖了非 review-thread 场景、原 thread 已 reply+resolve、显式 escalation、缺少 thread id、未 reply、未 resolve,以及 controller wrapper fail-closed。

本地 focused 验证通过: python3 -m pytest -q skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py -> 8 passed, 24 subtests passed。没有发现新增 sleep/delay 节奏、skip/manual 标记、断言弱化或 mock-only pseudo-coverage。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: tests
verdict: approve

Verdict

Approve: the PR adds focused behavior coverage for the new review-thread completion gate and source-regression coverage for the new SKILL/prompt contract.

Evidence

  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95 covers the non-review-thread path, so ordinary review-fix completion is not accidentally blocked by the new gate.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:104 and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:114 cover the two intended completion exits: original thread replied+resolved, or explicit escalation evidence.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:125, skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:135, and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:155 cover fail-closed behavior for missing thread id, unreplied thread, unresolved thread, and the ControllerActions.validate_review_fix_completion wrapper.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25 locks the new SKILL.md GitHub state contract text requiring PR review-thread closure.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63 and skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137 add source-regression assertions for canonical refactor marker identity and deterministic retry-fix routing, matching the prompt changes.
  • Focused verification run: python3 -m pytest -q skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py passed with 8 passed, 24 subtests passed.
  • I found no added sleep/delay pacing, no skip/manual test markers, no assertion loosening, and no mock-only pseudo-coverage in the PR diff.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 Architect review: 架构角度 approve

TL;DR

  • 这是什么: PR 402 的 architect reviewer 独立审查结果。
  • 结论: approve, 未发现 CLAUDE.md / AGENTS.md 架构合规阻塞项。
  • 下一步: controller 可把该角色结果纳入 review truth table。

详细说明

我按三点 diff 审了 skills/codex-refactor-loop/ 下的实际改动。新增的是 review-thread completion 的窄验证边界: 只有 review comment 驱动的修复必须证明原 thread 已回复并 resolved, 或显式升级;没有新增 git/gh/lifecycle 权限,也没有把 host production SSOT 放进 .refactor-loop/host.env

同时, deterministic refactor marker normalization 被保留在 fix/retry 路径,不是 human escalation。这符合“人工介入要诚实”的边界: 只有真正需要人做产品、战略、治理或权限决策时才升级 maintainer。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: architect
verdict: approve

Verdict

approve - no architecture compliance concerns; the PR stays within the codex-refactor-loop skill contract and adds a narrow review-thread completion gate without new lifecycle authority.

Evidence

  • skills/codex-refactor-loop/SKILL.md:532 maps the changed review-comment behavior into the existing GitHub State Contract. This follows CLAUDE.md: "边界清晰,职责分层:本文件承载跨 skill 边界仓库级宪法约束;单个 skill 的工作流细则、术语定义、当前状态归该 skill 自维护,不复制回本文件。"
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54 defines ReviewThreadCompletionEvidence, and :69-:81 only fail-close validates replied/resolved/escalated evidence. It does not add git, gh, label, merge, close, tag, release, or generic lifecycle authority, preserving CLAUDE.md's daemon/controller carveout boundaries.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574 exposes the completion check through an existing controller helper surface rather than a new actor/store split. No new *WriteActor, *ReadActor, or *Store split appears in the diff.
  • skills/codex-refactor-loop/prompts/review-fix.md:47 and skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36 keep deterministic marker normalization on the fix/retry path, not human escalation. This aligns with CLAUDE.md: "人工介入要诚实:只有确实需要人做产品、战略、治理或权限决策时,才把状态升级给 maintainer;不把人当作 reviewer 反复 reject 的兜底出口。"
  • The changed files are all under skills/codex-refactor-loop/ and its tests/prompts. I found no external repo references, host production SSOT movement into .refactor-loop/host.env, schema/protocol surface expansion, dead wrapper, or parallel pathway.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

📊 当前状态 — Consensus-rnd Phase review-gate 共识达成(❌ 否)

维度
PR #402 让 review comment 与 refactor marker 冲突可收敛
真值表 reject=0 approve=3 comment=0 → MERGE
reviewer architect ✅ / tests ✅ / quality ✅
CI contract-tests / manifest-version-sync / skill-degradation / lint-advisory 全绿

合并中。

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 质量复审通过:无可读性阻塞

TL;DR

  • 这是什么:PR 402 的 code quality 复审结果。
  • 结论:approve;未发现命名、死代码、过度抽象、重复或复杂度阻塞。
  • 下一步:controller 可继续按 review-gate 真值表汇总本轮结果。

详细说明

我按三点 diff 和完整 touched file context 检查了新增的 review-thread completion validator、controller action 转发点、prompt contract 更新和对应 source-regression/behavior tests。新增命名贴合业务语义,方法体短且线性,prompt 改动集中在 review thread closure 与 deterministic marker normalization,没有发现 unrelated cleanup 或 refactor self-doc policy 违规。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: quality
verdict: approve
---

## Verdict
Approve: the PR is focused, the new names describe review-thread completion evidence directly, and I found no dead code, over-engineering, avoidable duplication, or refactor self-doc policy issue under `none`.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54`: `ReviewThreadCompletionEvidence` is a small domain data carrier with fields that map directly to the completion contract; no generic manager/helper naming or unnecessary interface layer.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69`: `validate_review_thread_completion` is short and linear, with three explicit fail-closed checks and no complexity smell.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574`: `validate_review_fix_completion` is reachable through controller actions and is covered by `test_controller_completion_path_fails_closed_for_open_review_thread`; the underlying validator is also directly covered for accept/block cases.
- `skills/codex-refactor-loop/prompts/review-fix.md:47`, `skills/codex-refactor-loop/prompts/reviewer-quality.md:28`, and `skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36`: prompt edits are narrow to deterministic marker normalization and do not add unrelated workflow cleanup.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 Tests review: approve

TL;DR

  • 这是什么: PR 402 的 tests 角度独立审查。
  • 结论: approve,新增 review-thread completion gate 和 marker normalization prompt contract 都有对应测试。
  • 下一步: controller 可把本角色视为通过;无 tests 角度 must-fix。

详细说明

这次新增逻辑的关键路径都有覆盖:非 review-thread 驱动的修复不会被误拦,review-thread 驱动的修复必须有原 thread id、已回复、已 resolved,或者明确升级证据;ControllerActions.validate_review_fix_completion 这条 controller 包装路径也有失败用例。文档和 prompt 侧的新不变量也用 source-regression 锁住,包括 SKILL.md 的 PR review-thread closure contract,以及 deterministic marker normalization 必须回到 fix 而不是升级给人。

我没有看到新增 sleep/delay 测试节奏、[Skip] / manual-category 逃逸、mock-only pseudo-coverage,或把强断言改成弱断言。已验证相关测试命令:python3 -m unittest skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py,结果 8 tests passed。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: tests
verdict: approve

Verdict

Test coverage and quality are adequate for the review-thread completion gate and deterministic marker-normalization prompt contract changes.

Evidence

  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95 covers the non-thread-driven no-op path, so the new gate does not accidentally block ordinary review-fix completions.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:104 and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:114 cover both valid completion exits: original thread replied+resolved, or explicit escalation evidence.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:125, skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:135, and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:155 cover fail-closed behavior for missing thread identity, unreplied/unresolved threads, and the ControllerActions.validate_review_fix_completion wrapper.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25 source-regression locks the new SKILL.md GitHub state contract requiring PR review-thread closure.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:63 and skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137 source-regression lock canonical refactor marker identity and route deterministic marker normalization back to fix rather than human escalation.
  • No PR diff adds sleep/delay pacing, [Skip], manual-category escapes, mock-only pseudo-coverage, or weakened assertions. Related tests pass with python3 -m unittest skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 Architect review r2: completion gate 还没接到主路由

TL;DR

  • 这是什么: architect 角度复审 PR 让 review comment 与 refactor marker 冲突可收敛 #402 的 review-thread completion gate。
  • 结论: reject,因为规则写进 SKILL.md 了,但 FIX_DONE 主路由仍可直接回 review-gate。
  • 下一步: fix-codex 需要把 unresolved 原 review thread 的检查接到真实完成路径,并补 route-level behavior test。

详细说明

PR 新增的业务规则是: 由 PR review comment 驱动的修复,必须回复并 resolve 原 thread,或者显式升级,才算完成。这个规则本身合理,但当前 diff 只新增了 ReviewThreadCompletionEvidence 和 validator,并通过 ControllerActions.validate_review_fix_completion() 暴露出来。

问题是主链路没有用它。wakeup_plan.py 仍把 FIX_DONE 直接归到 review-gate / reviewer-codex,没有从 live PR review thread 或 durable artifact 构造 evidence,也没有在缺少 replied+resolved 时 fail closed。这样会形成一个旁路 helper: 测试能直接调用 validator 通过,但真实 FIX_DONE 后仍可能继续派 reviewer 或进入可 merge gate。

需要改成: 在 FIX_DONE 回 reviewer 或任何 merge-eligible review gate 前,强制读取原 review thread 状态或明确 escalation evidence;未回复/未 resolve 时阻断,并用 route-level behavior test 覆盖这个路径。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: architect
verdict: reject
---

## Verdict
reject - the PR declares a new PR-review-thread completion gate, but the actual `FIX_DONE` route can still proceed to review-gate without validating or producing original-thread reply/resolution evidence.

## Evidence
- `skills/codex-refactor-loop/SKILL.md:532` adds the new contract: "fixes driven by PR review comments are incomplete until the original thread is replied to and resolved, or explicitly escalated." But `skills/codex-refactor-loop/scripts/codex_refactor_loop/wakeup_plan.py:479-485` still maps every `FIX_DONE` directly to `review-gate`, and `:517-523` still assigns `FIX_DONE` to `reviewer-codex`; there is no call to the new completion validator on that route. This violates CLAUDE.md: "**变更必须可验证**:行为约束必须落到机械验证手段(behavior test / source-regression test / 段落 lint);仅靠\"agent 应该记得\"承载的约束视为未落地。"
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54-81` defines `ReviewThreadCompletionEvidence` and `validate_review_thread_completion`, and `skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574-575` exposes a wrapper, but `git grep` on the PR head shows those are only referenced by the wrapper and tests. No controller helper, wakeup action, `FIX_DONE` consumer, or GitHub GraphQL thread resolver builds this evidence from live PR review threads or blocks completion when it is absent. This worsens the architecture by adding a parallel advisory check instead of making the new business rule part of the single review-fix mainline, contrary to CLAUDE.md: "**单一主干,插件扩展**:每个 skill 一条权威主链路;新能力以子模块/脚本挂载,禁止平行第二系统。"
- `skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95-163` only unit-tests direct calls to the validator/wrapper. It does not assert that a review-thread-driven `FIX_DONE` without replied+resolved evidence is blocked before reviewer re-dispatch or merge. For a behavior rule added to the GitHub state contract, helper-only coverage is insufficient under CLAUDE.md: "**治理前置**:架构性 / 流程性规则与对应机械验证手段同时进仓库,缺一不补口径。"

## What would change your verdict
Wire the completion gate into the actual review-fix completion path: before `FIX_DONE` routes back to reviewers or any merge-eligible review gate, derive durable evidence from the original PR review thread or live GitHub state, require replied+resolved or explicit escalation, and fail closed otherwise. Add a behavior test for the route itself, not only the standalone validator, showing an unresolved original review thread prevents reviewer re-dispatch/merge and produces the expected blocked/escalation action.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

📊 当前状态 — Consensus-rnd Phase review-gate FIX(❌ 否,待 fork 作者)

维度
PR #402fork PR,head louis4li/...,cross-repo)
r2 真值表 reject=1 approve=2 → FIX(已追上 dev 后 re-review)
architect reject: 声明了 PR-review-thread 完成闸,但 FIX_DONE 路径仍可不产出原 thread 回复/解决证据即进 review-gate
tests/quality approve

说明:本 PR 来自 fork 分支,controller 无 fork 写权,无法自动 push fix。需 fork 作者据 architect r2 reject 补「原 thread reply/resolution 证据校验」后重触发 review-gate。详见 r2 reviewer 评论。

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@louis4li
Copy link
Copy Markdown
Author

louis4li commented Jun 3, 2026

🤖 回复 architect r2:completion gate 已接入主路由

针对 architect r2 的阻断点:FIX_DONE 主路由之前仍可直接回到 review-gate / dispatch_reviewers,只新增 validator 但没有接到真实完成路径。这个问题已经通过最近两个提交修正:

  1. 2e17f88 Gate FIX_DONE on review thread completion.

    • wakeup_plan.pycompleted_marker_actions() 主投射路径里接入 FIX_DONE completion gate。
    • 对 review-thread-driven fix,如果原 thread 没有 reply + resolve,且没有 explicit escalation,就把 action 降级为 status_only,移除 controller_action / runner_authority,避免继续派 reviewer 或进入后续 review-gate。
    • 增加 route-level behavior tests,覆盖 unresolved thread 阻断、completion evidence 放行、unknown live state fail-closed、pagination 等路径。
  2. d6948b4 Narrow review thread completion gate.

    • 复查后发现第一版 gate 过宽:可能因为无关 unresolved thread 或 GitHub transient failure 阻断普通 FIX_DONE
    • 已收窄为只对 durable artifact 明确标记 review_thread_driven=true 的原 review thread 做 live gate。
    • 普通 FIX_DONE 不会被无关 thread 阻断;原 thread live 未 resolve / live state unknown 仍 fail closed;explicit escalation 仍可放行。

验证命令:

python3 -m unittest \
  skills/codex-refactor-loop/scripts/test_wakeup_plan.py \
  skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py \
  skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py

结果:Ran 129 tests ... OK

结论:architect r2 指出的“completion gate 还没接到主路由”已修正,并且后续提交把 gate 从“全 PR unresolved thread”收窄到“原 review thread completion evidence”,避免普通 fix round 被误挡。

⟦AI:AUTO-LOOP⟧

louis4li and others added 2 commits June 3, 2026 17:29
Seed review-thread completion evidence during fix dispatch and require either original-thread closure or sourced meta escalation before FIX_DONE can continue, so the gate no longer depends on manually-created artifacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep the seeded review-thread completion contract in the prompt while reducing duplicate wording and preserving the source-regression anchors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
根因:reviewer artifact frontmatter 不带 head_sha,_review_evidence_from_artifact 取不到所审 head SHA → 所有 reviewing PR(#480/#473/#402)恒判 WAIT_OR_REDISPATCH:invalid_reviewer_evidence:missing_reviewed_head_sha,且 wakeup-plan 不自动重派 reviewer → review→merge headless 永久卡死(挡住发版)。

修法(codex hotfix worker 实现,controller 验证后提交):
- controller_actions 派 reviewer 时取 headRefOid,缺则 fail-closed,注入权威 HEAD_SHA 进 rendered prompt。
- reviewer-{architect,tests,quality}.md frontmatter 写 head_sha: ${HEAD_SHA}。
- wakeup_runner._review_head_sha_for:artifact 缺 head_sha 时从 controller-rendered prompt/log 回收(不依赖 worker 自报,避免 markerless 覆辙);live-head 陈旧判定与 merge 真值表语义不变。
- wakeup_plan 为 missing/stale reviewer head 的 OPEN reviewing PR 投射可执行 dispatch_reviewers 自愈动作。

验证:controller 独立重跑 test_wakeup_runner+review_gate+wakeup_plan+controller_actions 293 通过、marker/prompt 合同 30 通过;review-gate 安全语义(stale head 仍不 merge)由 13 项 review-gate 测试锁住。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
…merge 闭环

承接 r7:r7 让 reviewer 证据绑定 head_sha,但 wakeup_plan 投射的 review-evidence-redispatch 自愈动作被 closed-action-projection finalizer 强制 status_only(其 kind 不在 EXECUTABLE_ACTION_KINDS),runner 永不应用 → #480/#473/#402 缺 head 的 review 证据无法自动重派 → review_gate 恒 WAIT_OR_REDISPATCH → 并发耗干到 0。

修法(codex hotfix worker 实现,controller 验证后提交):
- wakeup_plan EXECUTABLE_ACTION_KINDS 加 review-evidence-redispatch,finalizer 保留 runner authority 不再强制 status_only。
- controller_actions.dispatch_reviewers 读 stale_review_roles 只重派陈旧角色,且 _pending_review_spawn_exists 跳过已有在飞 dispatch-reviewers:<pr>:<role>:r1 intent(幂等不双派);仍走 r7 的 HEAD_SHA 注入,使重派 reviewer 的新证据 head-bound。
- merge-gate 真值表不变,仅让既有自愈动作可执行 + 定向到陈旧角色。

验证:controller 独立重跑 test_wakeup_plan+wakeup_runner+review_gate+controller_actions 295 通过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
…R headless 走到 review_gate 决策

根因:review_gate 的 CI 读 PrChecksProjection.check_pr 对 gh api pulls/<n> 与 commits/<sha>/check-runs 的单次非零返回立即 fail-closed(ci_unavailable:pull_api_failure),无重试;rate limit 健康,实为瞬时 blip → 让 #473/#402(MERGEABLE+CI 绿)每 tick 被瞬时失败挡住、never reach MERGE/FIX。

修法(最小,只动 PrChecksProjection;不碰 #322 ReleaseRequiredChecksProjection):
- _run_api_read:对两处 api read 最多 3 次确定性重试(无 sleep),成功即停,持续失败仍 fail-closed(安全)。happy path 无行为变化。
- starvation 结论(worker 核实):review_gate 本就被 apply,blocked ci_unavailable 行可重评估(非 terminal duplicate-suppressed),无需改 run_once。
- e2e:test_review_gate_e2e 加「首次 pulls 读瞬时失败仍 merge」用例;test_pr_checks 加 retry 覆盖。

验证:controller 独立重跑 test_pr_checks+wakeup_runner+review_gate+review_gate_e2e+wakeup_plan+controller_actions 309 通过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
…ady PR headless merge,backlog 能 drain

根因(已核实):wakeup_plan 每 tick 把 SPAWN actions(implement intent + design-consensus #495/#496/#490/#491/#418,index 0-5)排在 lifecycle review_gate(#480/#473/#402,index 6-8)之前;run_once 旧逻辑 `if applied_spawns>0 and not is_spawn_action: break` → 只要派了任一 spawn,首个 lifecycle 即 break 整 tick。design-consensus 几乎总有 spawn 活 → review_gate/merge 永远 starve → #473(approve/approve/comment → MERGE_WITH_COMMENTS,MERGEABLE+CI 绿)永不 merge,active 数永不下降。

修法(最小,核心 dispatch loop):
- 引入 consumes_spawn_budget = is_spawn_action or _uses_spawn_budget(action);超 budget 时 `continue`(不 break)以便仍能抵达后续 drain-only lifecycle。
- _uses_spawn_budget:dispatch_reviewers=True、review_gate 仅当决策为 FIX(会派 fix worker)=True;否则 review_gate(MERGE/MERGE_WITH_COMMENTS)、close 等 drain-only=False → 不占 spawn budget、floor 填满后仍被评估执行。
- 保留 r6 的 spawn launch-failure break/retry 与 dispatch_design_consensus no-intents skip-and-continue;真值表/head-binding/CI-green/mergeable 安全语义不变。

结果:每 tick 既用 spawn 填满 floor,又评估 merge/close 等 drain-only lifecycle,backlog 能 drain。

验证:controller 独立重跑 test_wakeup_runner+review_gate+review_gate_e2e+wakeup_plan+controller_actions+pr_checks 309 通过;新增 test_wakeup_runner_lifecycle_review_gate_not_starved_after_spawn_batch。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
… malformed — review→merge 根因

根因(实测确认):_run_command 对所有 gh 命令在 full[1:1] 插 --repo(子命令之前)。gh 不接受 --repo 作 pre-subcommand 全局 flag,且 gh api 根本不吃 --repo(repo 在 URL):`gh --repo O/R api repos/O/R/pulls/473` → rc=1 "unknown flag: --repo"。于是 wakeup-runner 的每个 gh 调用都 malformed:CI 读(PrChecksProjection(runner=_run_command) 的 gh api pulls/check-runs)→ ci_unavailable:pull_api_failure;mergeability(gh pr view)→ mergeability_unavailable;live-state 等同理。→ review gate 永远读不到 CI/mergeable → #473(approve/approve/comment=MERGE_WITH_COMMENTS,绿)、#402#480 永不 headless 落地。r11 的 retry 只是重复同一 malformed 命令。直连 PrChecksProjection()/直连 gh 用的是 plain default runner(无注入)所以正常,掩盖了 bug。

最小修法:_run_command 仅对 `gh pr`/`gh issue` 注入 --repo,且放在子命令(及其 verb)之后的合法位置;`gh api` 完全不动;绝不 pre-subcommand 注入;已存在 --repo 则不重复。

验证:controller 独立重跑 test_wakeup_runner+review_gate+review_gate_e2e+pr_checks+wakeup_plan+controller_actions 312 通过;新增 argv-shape 测试 + e2e(stub subprocess.run 拒绝 malformed `gh --repo ... api`,ready PR 仍 merge)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 测试视角 Review 通过

TL;DR

  • 这是什么: PR 让 review comment 与 refactor marker 冲突可收敛 #402 的 tests reviewer 独立结论。
  • 现在到哪一步 / 结论是什么: verdict 是 approve,新增 completion gate 和 prompt contract 都有对应测试覆盖。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续按 review-gate 真值表收敛。

详细说明

我只看测试覆盖和测试质量。 是 review-thread-driven fix 完成所需的证据容器;相关测试覆盖了非 review-thread 路径、已 reply+resolved 的通过路径、显式 escalation 的通过路径、缺少 、未 reply、未 resolved 的 fail-closed 路径,以及 controller-facing wrapper。

source-regression 也覆盖了新增规则: 的 GitHub State Contract 必须写明 PR review comment fix 需要原线程 reply+resolved 或显式 escalate; 的 deterministic marker normalization 必须走 retry-fix,不走 human escalation。没有新增 测试节奏、、manual bypass、弱化断言或 mock-only pseudo-coverage。

验证命令已在 PR head archive 上跑过: 9 tests OK; 7 tests OK。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: tests
head_sha: 6fdbbba
verdict: approve

Verdict

Test coverage and test quality are adequate for the PR diff.

Evidence

  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95 covers the non-review-thread path so ordinary fixes are not blocked by the new completion gate.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:104 and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:135 cover the core business outcome: review-thread-driven fixes pass only when the original thread is replied to and resolved, and fail closed when either condition is missing.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:155 covers the controller-facing wrapper, not only the standalone helper.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25 adds source-regression coverage for the new GitHub State Contract wording around PR review thread closure.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137 adds source-regression coverage for deterministic marker normalization routing to retry-fix rather than human escalation.
  • No new sleep/delay test pacing, [Skip]/manual-test bypass, assertion weakening, or mock-only pseudo-coverage was introduced in the PR diff.
  • Verification run on PR head archive passed: python3 skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py ran 9 tests OK; python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py ran 7 tests OK.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 Quality review approve

TL;DR

  • 这是什么: PR 402 的 readability / naming / simplicity / dead code 角度复审。
  • 结论是什么: approve, 没有发现质量角度必须修改的问题。
  • 下一步: controller 可以把这条 review 作为 quality=approve 输入继续 review gate。

详细说明

我按三点 diff origin/dev...origin/fix/pr-review-thread-completion-gate 看了 PR 触及文件,并打开了实际文件上下文。新增的 ReviewThreadCompletionEvidence 是 review-thread 修复完成证据的值对象,validate_review_thread_completion 是对应的线性校验函数,命名贴合业务意图,没有 Manager / Helper 这类泛名。

可执行代码变化集中在 skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574。方法体很短,分支数量低;prompt 改动也集中在 deterministic marker normalization 和 review-thread closure,没有看到 unrelated cleanup 或重复片段需要抽取。

验证: 我用 PR head 的 git archive 临时目录跑了两条窄测试,test_review_fix_dispatch.py 9 个测试通过,test_refactor_comment_policy_prompt_contract.py 7 个测试通过。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: quality
head_sha: 6fdbbba
verdict: approve

Verdict

Approve: the PR keeps the review-thread completion change small and readable, with domain-specific naming and no quality-level dead code, over-engineering, or unrelated cleanup.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:54: ReviewThreadCompletionEvidence is a concrete value object for the new completion gate, not a generic abstraction.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69: validate_review_thread_completion is short, linear, and names the three acceptable completion states clearly: non-thread-driven, explicit escalation, or replied plus resolved original thread.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574: the controller-facing wrapper is reachable through the new behavior test and does not add branching or lifecycle behavior.
  • skills/codex-refactor-loop/prompts/review-fix.md:47, skills/codex-refactor-loop/prompts/reviewer-quality.md:29, and skills/codex-refactor-loop/prompts/meta-reflector-stalled.md:36: prompt edits stay focused on deterministic marker normalization and do not introduce unrelated wording churn.
  • Verification on the PR head archive: python3 skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py ran 9 tests and passed; python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py ran 7 tests and passed.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 架构评审:reject,review-thread 完成契约还没接到实际路由

TL;DR

  • 这是什么: PR 402 的 architect 角度独立评审。
  • 现在到哪一步 / 结论是什么: reject,因为新增的 PR review comment 修复完成契约还没有阻止 FIX_DONE 直接进入下一轮 reviewer。
  • 需要 maintainer 做什么 OR controller 下一步: implement codex 需要把 completion evidence 接到现有 review-fix 完成路径,或收窄 SKILL 文案不声明 completion gate。

详细说明

这次问题不在规则方向,而在落点。FIX_DONE 是 fix worker 表示“修复轮次完成”的 marker,controller 看到它会继续 review-gate,也就是 PR reviewer 共识闸流程。PR 在 skills/codex-refactor-loop/SKILL.md:532 新增了“PR review comment fix 必须回复并 resolve 原 thread,或显式升级”的完成契约,但同一份 SKILL 的路由表在 skills/codex-refactor-loop/SKILL.md:504 仍然写着 FIX_DONE 后直接 “Dispatch reviewers again”,没有要求检查这个新证据。

代码层也一样:skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69 新增了 validate_review_thread_completion,这是检查原 review thread 是否已回复/resolve 的 helper;skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574 只是把它暴露成 validate_review_fix_completion。但 PR diff 里没有看到它被现有 FIX_DONE 路由、dispatch 或 merge 前路径调用。按 CLAUDE.md 的“变更必须可验证”和“治理前置”条款,架构规则必须同时落到机械验证和实际路径上,不能只靠 agent 记得调用 helper。

需要的修复很具体:把这个 evidence gate 接到 review-fix completion 的实际路由上。行为测试要覆盖:review-thread 驱动的 FIX_DONE 缺少 reply/resolve 证据时不能派下一轮 reviewer;reply+resolved 或显式 escalation evidence 时才允许继续。


📎 完整 codex 原始输出(存档备查)
---
pr: 402
role: architect
head_sha: 6fdbbba603fc38507ab3efcbf53a032f353f3d0f
verdict: reject
---

## Verdict
reject: the PR adds a review-thread completion contract, but the existing FIX_DONE route can still proceed to reviewer redispatch without that evidence.

## Evidence
- skills/codex-refactor-loop/SKILL.md:532 adds a new GitHub State Contract row: "PR review comment fix | Completion includes review-thread closure: fixes driven by PR review comments are incomplete until the original thread is replied to and resolved, or explicitly escalated." But the phase routing table still says skills/codex-refactor-loop/SKILL.md:504 "`FIX_DONE` | Dispatch reviewers again." with no required review-thread completion gate. This regresses the CLAUDE.md clause: "**变更必须可验证**:行为约束必须落到机械验证手段(behavior test / source-regression test / 段落 lint);仅靠\"agent 应该记得\"承载的约束视为未落地。"
- skills/codex-refactor-loop/scripts/codex_refactor_loop/review_fix_dispatch.py:69 defines `validate_review_thread_completion`, and skills/codex-refactor-loop/scripts/codex_refactor_loop/controller_actions.py:574 exposes `validate_review_fix_completion`, but PR diff evidence shows no route/dispatch/merge-path caller that enforces it before `FIX_DONE` advances the review-gate. That violates the same CLAUDE.md clause above and the governance clause: "**治理前置**:架构性 / 流程性规则与对应机械验证手段同时进仓库,缺一不补口径。"
- No scope, external repo, protocol/schema, host production SSOT, actor-split, or compatibility-shim concerns found in the changed files.

## What would change your verdict
Wire the new completion evidence into the existing review-fix completion path before `FIX_DONE` can trigger reviewer redispatch, or narrow the SKILL contract so it only documents an explicit helper API rather than declaring completion incomplete. Add a behavior test that exercises the actual route: a review-thread-driven `FIX_DONE` with missing reply/resolution evidence must not dispatch the next reviewer round, while replied+resolved or explicit escalation evidence may proceed.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:architect:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 测试视角 Review 通过

TL;DR

  • 这是什么: PR 让 review comment 与 refactor marker 冲突可收敛 #402 的 tests reviewer 独立结论。
  • 现在到哪一步 / 结论是什么: verdict 是 approve,新增 completion gate 和 prompt contract 都有对应测试覆盖。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续按 review-gate 真值表收敛;上一条 tests 评论因 shell 转义损坏,请以后续这条为准。

详细说明

我只看测试覆盖和测试质量。ReviewThreadCompletionEvidence 是 review-thread-driven fix 完成所需的证据容器;相关测试覆盖了非 review-thread 路径、已 reply+resolved 的通过路径、显式 escalation 的通过路径、缺少 thread_id、未 reply、未 resolved 的 fail-closed 路径,以及 controller-facing wrapper。

source-regression 也覆盖了新增规则:SKILL.md 的 GitHub State Contract 必须写明 PR review comment fix 需要原线程 reply+resolved 或显式 escalate;meta-reflector-stalled.md 的 deterministic marker normalization 必须走 retry-fix,不走 human escalation。没有新增 sleep/delay 测试节奏、[Skip]、manual bypass、弱化断言或 mock-only pseudo-coverage。

验证命令已在 PR head archive 上跑过:python3 skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py 9 tests OK;python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py 7 tests OK。


📎 完整 codex 原始输出(存档备查)

pr: 402
role: tests
head_sha: 6fdbbba
verdict: approve

Verdict

Test coverage and test quality are adequate for the PR diff.

Evidence

  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:95 covers the non-review-thread path so ordinary fixes are not blocked by the new completion gate.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:104 and skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:135 cover the core business outcome: review-thread-driven fixes pass only when the original thread is replied to and resolved, and fail closed when either condition is missing.
  • skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py:155 covers the controller-facing wrapper, not only the standalone helper.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:25 adds source-regression coverage for the new GitHub State Contract wording around PR review thread closure.
  • skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py:137 adds source-regression coverage for deterministic marker normalization routing to retry-fix rather than human escalation.
  • No new sleep/delay test pacing, [Skip]/manual-test bypass, assertion weakening, or mock-only pseudo-coverage was introduced in the PR diff.
  • Verification run on PR head archive passed: python3 skills/codex-refactor-loop/scripts/test_review_fix_dispatch.py ran 9 tests OK; python3 skills/codex-refactor-loop/scripts/test_refactor_comment_policy_prompt_contract.py ran 7 tests OK.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:402:tests:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor

loning commented Jun 3, 2026

🤖 fix-codex 阻塞:当前 worker 不在 PR head worktree

TL;DR

  • 这是什么:PR 402 round 1 的 reject-fix 尝试结果。
  • 现在到哪一步 / 结论是什么:architect 的 reject 是有效阻塞项,但当前目录是 auto-refact-dev,不是 PR head fix/pr-review-thread-completion-gate,不能安全改代码。
  • 需要 maintainer 做什么 OR controller 下一步:controller 需要把 fix-codex 重派到 PR head worktree 后再执行同一修复。

详细说明

阻塞需求来自 architect:skills/codex-refactor-loop/SKILL.md:504 仍写着 FIX_DONE 直接重派 reviewer,而 skills/codex-refactor-loop/SKILL.md:532 新增契约要求 review-thread-driven fix 必须回复并 resolve 原 review thread,或显式升级。这个需求应在实际 FIX_DONEdispatch_reviewers 路由前接入 validate_review_thread_completion,也就是“review-thread 完成证据校验”这个 helper,并补一条行为测试。

我没有应用代码改动,因为运行态证据显示 PR 402 的 head 是 fix/pr-review-thread-completion-gate,但当前 git branch --show-currentauto-refact-devgit worktree list --porcelain 里也没有该 PR head worktree。fix prompt 明确禁止 checkout,也禁止修改其他 cluster/PR,所以在这里修改源码会把修复落到错误分支。


📎 完整 codex 原始输出(存档备查)
# Fix report for PR 402 round 1

## Applied
- None. The current worktree is `auto-refact-dev`, while PR 402 head is `fix/pr-review-thread-completion-gate`; applying source edits here would modify the wrong branch.

## Rejected as false positive
- None.

## Blocked (cannot fix this round)
- reviewer:architect's demand: `skills/codex-refactor-loop/SKILL.md:504` still routes `FIX_DONE` directly to reviewer redispatch while the new completion contract at `skills/codex-refactor-loop/SKILL.md:532` requires review-thread reply+resolution or explicit escalation. This is a valid in-scope demand, but this fix-codex session is not on PR 402's head branch. `gh pr view 402 --json baseRefName,headRefName` reports `baseRefName=dev`, `headRefName=fix/pr-review-thread-completion-gate`; `git branch --show-current` reports `auto-refact-dev`; `git worktree list --porcelain` shows no worktree for `fix/pr-review-thread-completion-gate`. The prompt forbids checkout and says not to touch other cluster PRs, so applying the fix here would violate the branch/scope boundary.

## Build status
- build: n=skipped
- tests: n=skipped

## Recommendation for next round
- controller routes to a fix worker running in a worktree attached to PR head `fix/pr-review-thread-completion-gate`, then applies the valid architect demand: wire `validate_review_thread_completion` into the actual `FIX_DONE` to `dispatch_reviewers` path and add a behavior test proving missing review-thread reply/resolution blocks reviewer redispatch while replied+resolved or explicit escalation proceeds.
- `FIX_BLOCKED:402:round-1:other:wrong-worktree`

⟦AI:AUTO-LOOP⟧
FIX_BLOCKED:402:round-1:other:wrong-worktree

⟦AI:AUTO-LOOP⟧

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crnd:human:auto auto-advancing, no human needed crnd:lifecycle:managed loop-managed item crnd:phase:reviewing review-gate reviewers in flight

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants