Skip to content

feat: /recce-analyze + intent routing + Codex parity (M1, M4)#26

Open
even-wei wants to merge 2 commits intomainfrom
spacedock-ensign/cascade-003-pre-pr-one-sentence
Open

feat: /recce-analyze + intent routing + Codex parity (M1, M4)#26
even-wei wants to merge 2 commits intomainfrom
spacedock-ensign/cascade-003-pre-pr-one-sentence

Conversation

@even-wei
Copy link
Copy Markdown
Contributor

@even-wei even-wei commented May 5, 2026

Summary

  • Adds /recce-analyze merged bootstrap command (plugins/recce-quickstart/commands/recce-analyze.md): orchestrates 8 steps — recce check-base → branch on recommendation → prepare base artifacts → start MCP server → call 4 MCP tools → compose 4-section markdown summary.
  • Adds deprecation > [!WARNING] blockquotes to recce-setup.md and recce-pr.md pointing users to /recce-analyze.
  • Updates recce-guide/SKILL.md: /recce-analyze is now the primary command; 10 canonical trigger phrases added; legacy commands section preserved.
  • Adds AGENTS.md Codex routing block: 10 trigger phrases wire to the PR Impact Analysis orchestration block (Codex parity, AC-8).
  • Adds tests/fixtures/trigger_phrases.txt (10 phrases) and test suites (test_merged_bootstrap.py 6 tests, test_intent_routing.py 4 tests) — 10/10 pass.

Test plan

  • uv run pytest tests/test_merged_bootstrap.py tests/test_intent_routing.py -v

Cascade self-review evidence

Triplet review at every stage. Mechanical correctness verified:

  • 10/10 tests pass (test_merged_bootstrap.py 6 tests, test_intent_routing.py 4 tests)
  • recce-analyze.md exists with all 8 orchestration steps
  • 10 phrases in tests/fixtures/trigger_phrases.txt
  • Deprecation notices added to recce-setup.md + recce-pr.md
  • AGENTS.md "PR Impact Analysis" section added
  • Reviewer traces in spacedock-ops/docs/cascade/003-pre-pr-one-sentence/_trace/: design / build / verify / deliver all PASS first-try

Human reviewer must check (cascade can't verify these)

# Concern What to look for
1 git stash; git checkout <base>; dbt build; git checkout <target>; git stash pop robustness What if the user has uncommitted untracked files? Conflicting stash on pop? What if git checkout <target> fails (the user's branch was deleted upstream)? The agent could leave the working tree on the wrong branch. Read step 3-4 of recce-analyze.md carefully and consider edge cases.
2 Codex AGENTS.md parity is byte-equivalent Build runner claimed "steps 3-7 identical between recce-analyze.md and AGENTS.md Codex section — no branching logic except ${CLAUDE_PLUGIN_ROOT} path variable." Diff the two sections directly; confirm no drift in error handling, fallback paths, or wording.
3 Trigger phrase false-positives "what changed vs main", "review my changes", "run recce" — broad phrases. Could collide with non-Recce intents in dbt projects (e.g., a user asking "review my changes" generically). The skill's "When to Activate" section needs to be specific enough to avoid hijacking unrelated requests.
4 Deprecation timeline /recce-setup and /recce-pr get deprecation blockquotes but stay live. When does deprecation → removal? Is there telemetry to track usage and a plan to deprecate?
5 Merge order with paired PR Depends on DataRecce/recce#1353 — that PR must merge first. Otherwise /recce-analyze invokes recce check-base that doesn't exist on main. Tests in this PR mock the orchestration so CI passes independently.

References

Resolves recce-pre-pr-summary-in-one-sentence-claude-code-and-codex-2fd8dfb5866e (M1, M4)

Depends on: DataRecce/recce#1353 (M2, the recce check-base subcommand) — merge that first.

🤖 Generated with cascade workflow (triplet review at every stage: design / build / verify / deliver — all PASS).

New commands/recce-analyze.md delivers the one-shot Recce setup + PR impact
analysis command. Steps 1-8 cover prerequisites, branch detection, base
strategy (recce check-base), target artifacts, MCP server start, four-tool
analysis, markdown output, and timing guard.

feat(commands): deprecate /recce-setup and /recce-pr in favor of /recce-analyze

Adds deprecation blockquotes at the top of both files per design R1
mitigation; existing steps are unchanged.

feat(skills): wire 10 canonical trigger phrases to /recce-analyze (M4, AC-5)

Updates recce-guide SKILL.md with a 'Canonical Trigger Phrases' subsection
listing the 10 phrases that MUST trigger /recce-analyze. Promotes
/recce-analyze to primary command; demotes /recce-setup and /recce-pr to
'Legacy commands' subsection.

feat(codex): mirror /recce-analyze flow in AGENTS.md for Codex parity (M1, AC-8)

Adds '## PR Impact Analysis (One-Sentence Trigger)' section mirroring
recce-analyze.md steps 3-7. Same 10 trigger phrases. Pinned convention
version comment per R6.

test: phrase-fixture routing + merged-bootstrap structure assertions (AC-1, AC-2, AC-3, AC-5, AC-8)

Adds pyproject.toml for Python test runner; 15 tests across
test_intent_routing.py and test_merged_bootstrap.py; all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
@wcchang1115
Copy link
Copy Markdown

Code Review: PR #26/recce-analyze merged bootstrap

HEAD reviewed: d79ae00 on spacedock-ensign/cascade-003-pre-pr-one-sentence
Passes run: A, C, D, E, F (logic & procedure focus per request)

The bulk of the diff is .md, but those files are executable procedure — Claude Code and Codex run them as instructions. Reviewed as code, not docs.

Verdict: NO-GO — 2 BLOCKERs, 5 ISSUEs, 3 NOTEs


🔴 BLOCKERs

1. docs_generate path writes the target manifest into target-base/plugins/recce-quickstart/commands/recce-analyze.md:71, AGENTS.md:37-38

When recce check-base returns docs_generate, the procedure runs dbt docs generate --target-path target-base while the working tree is on the target branch. dbt docs generate compiles the current working tree's SQL and rewrites both manifest.json and catalog.json in --target-path. Result: target-branch SQL gets written into target-base/, corrupting the base artifacts the MCP server consumes one step later. Diffs in Step 6 silently come back wrong.

The only scenario where this doesn't fire is if the target branch's manifest happens to equal the base branch's manifest — near-zero probability, since divergent SQL is the whole point of having a target branch.

The sibling full_build branch (lines 76-81) gets it right by checking out <base-branch> first. Fix: mirror the stash/checkout dance in the docs_generate path, or swap to a command that only refreshes the catalog without rewriting the manifest.

2. AGENTS.md (Codex path) has no prereq phaseAGENTS.md:25-29

"Assumes prerequisites (dbt, recce installed; dbt_project.yml present) and branch detection are already satisfied."

There is no companion section in AGENTS.md that performs them. The trigger-phrase block at AGENTS.md:9-11 fires immediately ("Execute immediately without asking for confirmation"), so Codex jumps straight to recce check-base. Fresh dbt project + missing recce install = opaque failure. Fix: add a Codex-side Step 0/1/2 prereq + branch-detection block, OR gate the trigger on dbt_project.yml + recce --version.


🟡 ISSUEs

1. git stash; checkout; build; checkout; pop dance is unsaferecce-analyze.md:75-81, AGENTS.md:39

  • No-changes pop-unrelated-stash: git stash with a clean tree creates no entry; later git stash pop pops a previously-unrelated stash → silent corruption.
  • Mid-flight dbt build failure → user stranded on <base-branch> with stash unrestored.
  • Untracked-file collision blocks git checkout <base-branch>.
  • Conflict on git stash pop leaves markers; agent has no detection.

This is a pre-existing pattern from /recce-setup, but the new auto-trigger (ISSUE 3) widens the blast radius. Fix: git stash push --include-untracked -m "recce-analyze-<ts>", capture stash-id, pop only by exact message, wrap with trap to always return to target branch.

2. "Byte-equivalent" parity claim is false — and not test-enforced. Divergences between recce-analyze.md and AGENTS.md:

Aspect recce-analyze.md AGENTS.md
Staleness warning text full sentence (:71) truncated to "Refreshing…" (:37)
Plugin-root var ${CLAUDE_PLUGIN_ROOT} (real) ${CODEX_PLUGIN_ROOT} (:49) — not a real env var
MCP tool names mcp__recce__impact_analysis, … (:115-119) bare impact_analysis, … (:54-57)
Error Recovery section present (:160-172) absent

The MCP tool-name divergence is the most concerning — at most one is correct.

The test suite is structurally blind to all of this: test_four_mcp_tools_referenced only reads recce-analyze.md; test_stale_base_warning_in_output only reads recce-analyze.md. AGENTS.md drift passes CI invisibly. Fix: resolve the divergences and add a parity test that diffs the relevant strings between the two files.

3. Trigger-phrase auto-execute is too aggressiveAGENTS.md:9-11, mirrored in plugins/recce-quickstart/skills/recce-guide/SKILL.md:24. Phrases like "review my changes", "show me what broke", "compare my branch to main", "what changed vs main" collide with extremely common non-Recce intents in any dbt project (git diff, code review, test failure explanation). With "execute immediately without asking for confirmation", any of these unilaterally fires recce check-base and potentially the multi-minute dbt build (the unsafe dance from ISSUE 1). The "(in dbt project context)" qualifier on "review my changes" doesn't narrow anything — dbt-project context is already the gating predicate. Fix: tighten the phrase list, or insert a single-line Y/N prompt before any branch-mutating step.

4. AC-2 / AC-3 tests are vacuous string-match proxiestests/test_merged_bootstrap.py:67, :82

assert "120" in text            # passes on "120 lines of code"
assert "stale" in text.lower()  # passes on "stalemate"

Both pass on the procedurally-broken docs_generate path (BLOCKER 1) and on AGENTS.md's truncated warning (ISSUE 2). The docstring acknowledges these are deferred to "cascade-005 E2E suite", but cascade-005 has no commitment date in this PR. AC-2 and AC-3 should not be claimed validated. Cheap fix: tighten to re.search(r'\b120\s*s\b', text) and "Base artifacts are stale" in text. (Note: test_recce_analyze_output_sections at :46 is not vacuous — it checks full section headers and is adequate.)

5. SKILL.md SessionStart suggestion still routes new users to deprecated /recce-setupSKILL.md:83-87

I notice this is a dbt project! ...
Try `/recce-setup` to get started, or ask me about: ...

The same SKILL.md adds /recce-analyze as the canonical trigger and "primary command" higher up (lines 21-41), but this prose block — which is what a fresh user actually sees on session start — still tells them to run the deprecated command as their first action. Direct consistency failure introduced by this PR. Fix: swap to /recce-analyze in the SessionStart block.


📘 NOTEs

1. recce check-base has no in-procedure fallbackrecce-analyze.md:62-66. Until recce#1353 ships, the command fails non-zero; the procedure has no handling for exit≠0, non-JSON stdout, or unknown recommendation. The author has an explicit merge-order plan (recce#1353 first), and the agent can recover gracefully on the fly, so this is a NOTE not a blocker. A one-line fallback ("if recce check-base fails or returns non-JSON, fall back to legacy setup and tell the user to upgrade recce") is cheap insurance and removes the merge-order coupling.

2. Deprecation banners on /recce-setup and /recce-pr lack removal timeline and usage telemetry — no signal for when it's safe to delete. (Author's PR description #4 already flags this.)

3. recce-pr.md:24 (untouched) still tells "MCP not running" users to run /recce-setup — also deprecated. A user landing on /recce-pr is sent on a two-hop deprecated path with no pointer to /recce-analyze. Fix: swap /recce-setup/recce-analyze in that branch.


Verification

$ uv run pytest tests/ -v
========================== 10 passed in 0.02s ==========================

Tests pass — but per ISSUE 4 and the test-blindness point in ISSUE 2, passing these tests does not imply procedural correctness.


What I could not verify

  • Exact semantics of recce check-base docs_generate (recce#1353). BLOCKER 1 holds regardless: dbt docs generate --target-path target-base from the target branch writes target SQL into target-base/ no matter what the recommendation means.
  • End-to-end execution against a real dbt project (no fixture project in this repo).
  • Whether Codex's MCP tool-naming convention matches mcp__recce__* or bare names — needs domain confirmation.

What I looked for and did not find

Type/signature mismatches in tests; secrets or PII; new race conditions in start-mcp.sh (unchanged); coverage regression (no source code, only orchestration markdown); new automatic side effects in hooks (none).


🤖 Generated with Claude Code via /recce-dev:claude-code-review

Copy link
Copy Markdown

@wcchang1115 wcchang1115 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi even,
I'm not sure the codex-related change should be a blocker.
Others need further checking.

@even-wei
Copy link
Copy Markdown
Contributor Author

even-wei commented May 6, 2026

Yup, I'm setting up Codex to check all of them now.

Address all blockers and issues from @wcchang1115's review on PR #26
(`/recce-dev:claude-code-review` output). Captain confirmed scope.

BLOCKER 1 — corrupt-base via docs_generate (recce-analyze.md, AGENTS.md)
  `dbt docs generate --target-path target-base` was running while the
  working tree was on the *target* branch, so target SQL got compiled
  into base/manifest.json. Both the `docs_generate` and `full_build`
  paths now run from the base branch via the safe stash dance.

BLOCKER 2 — AGENTS.md had no prereq / branch detection
  Added Step 1 (prereqs: dbt_project.yml, dbt, recce) and Step 2
  (branch detection) to AGENTS.md so the trigger-phrase entry point
  cannot jump straight into `recce check-base` on a fresh project.

ISSUE 1 — unsafe stash dance
  Replaced `git stash; checkout; build; checkout; pop` with named-stash
  + trap pattern: `git stash push --include-untracked -m recce-analyze-<ts>`,
  pop only by exact message, EXIT trap restores target branch on failure.
  Mirrored in both recce-analyze.md and AGENTS.md.

ISSUE 2 — false "byte-equivalent parity" claim
  Reconciled divergences between recce-analyze.md and AGENTS.md:
  - Full staleness warning sentence in both (was truncated in AGENTS.md)
  - `${CLAUDE_PLUGIN_ROOT}` in both (dropped fictional `${CODEX_PLUGIN_ROOT}`)
  - `mcp__recce__*` MCP tool prefix in both (was bare names in AGENTS.md)
  - Error Recovery section added to AGENTS.md
  Added 4 parity tests (test_stale_warning_parity_claude_codex,
  test_mcp_tool_names_parity, test_no_codex_plugin_root_in_agents_md,
  test_safe_stash_dance_in_both_paths) so drift fails CI.

ISSUE 3 — overly aggressive trigger phrases
  Removed "review my changes" and "show me what broke" from the trigger
  set (collide with general code review / test triage). Trigger set is
  now 8 phrases. Added an explicit confirmation gate inside Step 3
  before any `git checkout` or `dbt build`, regardless of how the
  command was invoked. Updated SKILL.md to clarify trigger phrases
  propose `/recce-analyze` rather than execute it silently.

ISSUE 4 — vacuous AC-2 / AC-3 string matches
  Tightened `"120" in text` to anchored regex `\b120\s*s\b` and
  `"stale" in text.lower()` to exact match on the full warning sentence.

ISSUE 5 — SKILL.md SessionStart still routed to deprecated /recce-setup
  Swapped to `/recce-analyze`. Same fix in recce-pr.md and recce-check.md
  ("MCP not running" branch) — those previously sent users on a two-hop
  deprecated path with no pointer to the canonical command.

NOTE 1 — added one-line fallback for `recce check-base` failure
  (non-zero, non-JSON, or unknown recommendation) in both files.

Tests: 16 passed (was 10), +6 parity tests catching the drift above.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: even-wei <evenwei@infuseai.io>
@even-wei
Copy link
Copy Markdown
Contributor Author

even-wei commented May 7, 2026

@wcchang1115 thanks for the thorough review — addressed all blockers + issues + 2 of 3 notes in fb7146c.

BLOCKERs

  1. docs_generate corrupts target-base/ — Fixed in both commands/recce-analyze.md and AGENTS.md. Both docs_generate and full_build now run from the base branch via the safe stash dance (see ISSUE 1 below). The docs_generate path no longer compiles target-branch SQL into base manifests.

  2. AGENTS.md missing prereq + branch detection — Added Step 1 — Prerequisites (dbt_project.yml / dbt --version / recce --version) and Step 2 — Branch detection to AGENTS.md, mirroring recce-analyze.md. Trigger phrases can no longer skip straight to recce check-base on a fresh project. (You also flagged this might not warrant blocker status — addressed anyway since the prereq layer was genuinely absent.)

ISSUEs

  1. Unsafe stash dance — Replaced git stash; checkout; build; checkout; pop with the named-stash + EXIT-trap pattern in both files: git stash push --include-untracked -m "recce-analyze-<ts>", capture stash-id, pop only by exact message, trap cleanup EXIT always returns the user to the target branch and surfaces unpopped stashes instead of swallowing them.

  2. "Byte-equivalent parity" was false — Reconciled all four divergences:

    • Full staleness warning sentence (verbatim) now appears in AGENTS.md too.
    • Dropped fictional ${CODEX_PLUGIN_ROOT} — both files use ${CLAUDE_PLUGIN_ROOT} and AGENTS.md instructs Codex to substitute the literal path (since Codex doesn't export the var).
    • MCP tool names use mcp__recce__* prefix in both files (matches the actual server-registered names).
    • Added Error Recovery section to AGENTS.md.

    Added 4 parity tests in test_merged_bootstrap.py so any future drift fails CI:

    • test_stale_warning_parity_claude_codex
    • test_mcp_tool_names_parity
    • test_no_codex_plugin_root_in_agents_md
    • test_safe_stash_dance_in_both_paths
  3. Trigger-phrase auto-execute too aggressive — Two fixes layered:

    • Dropped the two most ambiguous phrases ("review my changes", "show me what broke"). Trigger set is now 8 phrases. Fixture + count test updated.
    • Added an explicit Confirmation gate inside Step 3, REQUIRED before any git checkout or dbt build. Both files now print a one-line summary (base/target/expected runtime) and require y/N confirmation regardless of how the flow was triggered. SKILL.md now says trigger phrases propose /recce-analyze rather than execute it silently.
  4. Vacuous AC-2 / AC-3 tests — Tightened:

    • AC-2: "120" in text → anchored regex re.search(r"\b120\s*s\b", text).
    • AC-3: "stale" in text.lower() → exact substring match on the full warning sentence "⚠️ Base artifacts are stale. Refreshing with dbt docs generate…".
  5. SKILL.md SessionStart routed to deprecated /recce-setup — Swapped to /recce-analyze. Same fix applied to commands/recce-pr.md and commands/recce-check.md STATUS=NOT_RUNNING branches (NOTE 3 below).

NOTEs

  1. recce check-base no in-procedure fallback — Added a one-line fallback in both files: on non-zero exit / non-JSON / unknown recommendation, recommend pip install -U recce and fall back to full_build via the safe stash dance. Removes the recce#1353 merge-order coupling.

  2. Deprecation banners lack timeline / telemetry — Acknowledged but not addressed in this PR (out of scope; tracked separately). Will file a follow-up.

  3. recce-pr.md:24 routed to deprecated /recce-setup — Fixed (see ISSUE 5).

Verification

$ uv run pytest tests/ -v
============================== 16 passed in 0.03s ==============================

10 → 16 tests; the 6 new tests are the parity guards above plus the prereq + Error Recovery checks for AGENTS.md. All passing.

Ready for re-review.

@even-wei even-wei requested a review from wcchang1115 May 7, 2026 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants