DataRecce · kentwelcome · Mar 13, 2026 · Mar 12, 2026 · Mar 12, 2026 · Mar 12, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -18,10 +18,20 @@
         "email": "support@datarecce.io"
       }
     },
+    {
+      "name": "recce",
+      "source": "./plugins/recce",
+      "description": "Intelligent data review workflow for dbt developers",
+      "version": "0.2.0",
+      "author": {
+        "name": "DataRecce",
+        "email": "support@datarecce.io"
+      }
+    },
     {
       "name": "recce-dev",
       "source": "./plugins/recce-dev",
-      "description": "Intelligent data review workflow for dbt developers",
+      "description": "Internal development and testing tools for the Recce project",
       "version": "0.1.0",
       "author": {
         "name": "DataRecce",

diff --git a/plugins/recce-dev/.claude-plugin/plugin.json b/plugins/recce-dev/.claude-plugin/plugin.json
@@ -1,13 +1,13 @@
 {
   "name": "recce-dev",
   "version": "0.1.0",
-  "description": "Intelligent data review workflow for dbt developers — tracks model changes and triggers progressive Recce validation",
+  "description": "Internal development and testing tools for the Recce project — MCP E2E validation, benchmarking, and plugin QA",
   "author": {
     "name": "DataRecce",
     "url": "https://datarecce.io"
   },
   "homepage": "https://github.com/DataRecce/recce-claude-plugin",
   "repository": "https://github.com/DataRecce/recce-claude-plugin",
   "license": "MIT",
-  "keywords": ["recce", "dbt", "data-validation", "data-quality", "data-review"]
+  "keywords": ["recce", "testing", "e2e", "mcp-validation", "internal"]
 }
diff --git a/plugins/recce-dev/README.md b/plugins/recce-dev/README.md
@@ -1,33 +1,17 @@
 # recce-dev
 
-Intelligent data review workflow for dbt developers.
+Internal development and testing tools for the Recce project.
 
 ## What it does
 
-recce-dev automatically tracks dbt model file changes and triggers progressive data validation using Recce. When you modify a dbt model, the plugin records the change. After your dbt run or build, it dispatches an agent that runs lineage diff, row count diff, and schema diff in sequence — producing an actionable summary with risk level before changes leave your machine.
+This plugin provides tools for Recce developers to validate the `recce` plugin's MCP integration, benchmark agent performance, and run E2E validation flows. It is **not** intended for end users of Recce.
 
 ## Components
 
-- **Skill:** `/recce-review` — triggers the data review workflow; dispatches the recce-reviewer agent with tracked model context
-- **Agent:** `recce-reviewer` — runs progressive diff analysis (lineage, row count, schema) and produces a risk-assessed summary
-- **Hooks:**
-  - `SessionStart` — detects dbt project environment and starts the Recce MCP server if prerequisites are met
-  - `PostToolUse` — suggests `/recce-check` after dbt run/build commands
-  - `PreToolUse` — tracks modified dbt model files before Write/Edit operations
-- **MCP Servers:**
-  - `recce-dev` — Recce SSE server on `http://localhost:8081/sse` (local, project-scoped)
-  - `recce-docs` — Recce documentation stdio server (local path, for doc lookups)
+- **Skill:** `/mcp-e2e-validate` — runs a full E2E validation of the `recce` plugin's event chain (SessionStart → model tracking → dbt suggestion → /recce-review → cleanup) and produces a performance benchmark report
 
 ## Requirements
 
-- **Recce >= 1.39.0** installed in the project's virtual environment (`pip install "recce>=1.39.0"`) — SSE transport (`--sse` flag) requires this version
-- The virtual environment must be activated before starting a Claude Code session so `recce` is on PATH
-- dbt project with two environments configured (base + target) for comparison diffs
-- Base artifacts generated: `dbt docs generate --target-path target-base` on the comparison branch
-
-## Known Limitations
-
-- **Port hardcoded in `.mcp.json`**: The MCP server URL is `http://localhost:8081/sse`. If you override `mcp_port` in settings (e.g., `.claude/recce-dev/settings.json`), the actual server starts on the configured port but `.mcp.json` still points to 8081. Claude Code MCP config is static — dynamic port resolution requires a future Claude Code feature.
-- **Mid-session plugin install**: Installing the plugin mid-session does not activate hooks or MCP tools. Start a new Claude Code session after installation for full functionality.
-- **recce-docs MCP path**: Uses a local symlink path (`../../packages/recce-docs-mcp/dist/cli.js`) that breaks after marketplace install. Deferred to v2 (MKTD-02).
-- **HTTP-only MCP**: The `recce-dev` MCP server uses `http://localhost:8081/sse` (not HTTPS). This is expected for a local SSE server.
+- The `recce` plugin must be installed alongside this plugin
+- A dbt project with Recce configured (same requirements as the `recce` plugin)
+- Recce installed in the project's virtual environment
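
Review note: the first requirement above can be checked mechanically before the skill runs. The following is a minimal sketch, assuming the sibling layout and script names that SKILL.md in this PR describes (`scripts/start-mcp.sh`, `scripts/stop-mcp.sh`, `scripts/check-mcp.sh` under `${CLAUDE_PLUGIN_ROOT}/../recce`). The function name `check_recce_plugin` and the `MISSING=` / `RECCE_PLUGIN=` output keys are hypothetical and not part of this PR.

```bash
# Hypothetical helper (not part of this PR): verify that the sibling recce
# plugin ships the scripts this skill depends on.
# Prints KEY=VALUE lines; returns non-zero if any script is missing.
check_recce_plugin() {
  plugin_root="$1"
  missing=0
  for script in start-mcp.sh stop-mcp.sh check-mcp.sh; do
    if [ ! -f "${plugin_root}/scripts/${script}" ]; then
      # Assumed output key, chosen to match the KEY=VALUE style used elsewhere
      echo "MISSING=${script}"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "RECCE_PLUGIN=found"
  fi
  return "$missing"
}
```

It could be called as `check_recce_plugin "${CLAUDE_PLUGIN_ROOT}/../recce"` before Step 1, so that a missing sibling plugin is reported up front rather than as a failure in `start-mcp.sh`.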
diff --git a/plugins/recce-dev/skills/mcp-e2e-validate/SKILL.md b/plugins/recce-dev/skills/mcp-e2e-validate/SKILL.md
@@ -0,0 +1,173 @@
+---
+name: mcp-e2e-validate
+description: >
+  This skill should be used when the user asks to "validate MCP", "run E2E",
+  "benchmark MCP performance", "test the plugin flow", "compare MCP versions",
+  "驗證 MCP", "跑 E2E", or wants to verify the recce plugin's full event
+  chain (SessionStart → model tracking → dbt suggestion → /recce-review → cleanup)
+  works end-to-end and measure agent performance metrics.
+version: 0.1.0
+---
+
+# /mcp-e2e-validate — MCP Integration E2E Validation & Benchmark
+
+Validate the recce plugin's full event chain against a real dbt project and produce a performance benchmark report. Optionally compare against a baseline to quantify improvements across recce versions or PR changes.
+
+**Dependencies:** This skill relies on the sibling `recce` plugin's scripts (`start-mcp.sh`, `stop-mcp.sh`, `check-mcp.sh`) and hooks (`track-changes.sh`, `suggest-review.sh`). It also dispatches the `recce-reviewer` agent.
+
+**Cross-plugin path:** The `recce` plugin is a sibling under the same parent directory. Use `RECCE_PLUGIN_ROOT` (derived below) to reference its scripts:
+
+```bash
+RECCE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}/../recce"
+```
+
+---
+
+## Inputs
+
+Parse user input for optional parameters:
+
+- **`--baseline`**: Previous benchmark metrics for comparison (e.g., `"tool_uses=35 tokens=30311 duration_s=483"`)
+- **`--model`**: Model to edit for testing (default: first `.sql` file under `models/staging/`)
+- **`--marker`**: Comment marker to inject (default: `-- recce-e2e-validation`)
+- **`--skip-dbt`**: Skip the `dbt run` step if models were already built
+
+If no parameters are provided, use defaults and run the full flow.
+
+---
+
+## Step 1: Pre-flight
+
+Run the pre-flight check script:
+
+```bash
+bash "${CLAUDE_PLUGIN_ROOT}/skills/mcp-e2e-validate/scripts/preflight.sh"
+```
+
+Parse KEY=VALUE output. Abort if any `BLOCK=` line appears — show the message verbatim.
+
+Handle warnings:
+- `SSE_SUPPORT=false` → Inform user: editable install may need `rm -rf site-packages/recce/` then `pip install -e ".[mcp]"`. See memory for details.
+- `PORT_STATUS=occupied_by_other` → Suggest changing port in `.claude/recce/settings.json`
+- `STALE_FILES=found` → Auto-clean: `rm -f /tmp/recce-mcp-*.pid /tmp/recce-changed-*.txt`
+
+Record `RECCE_VERSION` and `PORT` for the report.
+
+---
+
+## Step 2: Start MCP Server
+
+Derive the recce plugin root and run:
+
+```bash
+RECCE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}/../recce"
+bash "${RECCE_PLUGIN_ROOT}/scripts/start-mcp.sh"
+```
+
+- If `STATUS=STARTED` or `STATUS=ALREADY_RUNNING` → record `PORT` and `PID`, proceed.
+- If `ERROR=` → abort with error details.
+
+Verify with health check:
+
+```bash
+bash "${RECCE_PLUGIN_ROOT}/scripts/check-mcp.sh"
+```
+
+Confirm `RUNNING=true` before proceeding.
+
+---
+
+## Step 3: Inject Test Edit (Tier 1 Trigger)
+
+1. Select the target model file (from `--model` or default staging model).
+2. Read the file and record its original content.
+3. Append the marker comment (`-- recce-e2e-validation`) on a new line at the end.
+4. Use the Edit tool (this triggers the `track-changes.sh` PostToolUse hook).
+5. Verify tracking:
+
+```bash
+PROJECT_HASH=$( { printf '%s' "$PWD" | md5 2>/dev/null || printf '%s' "$PWD" | md5sum; } | cut -c1-8 )
+cat /tmp/recce-changed-${PROJECT_HASH}.txt
+```
+
+- File exists and contains the edited model path → **Tier 1 PASS**
+- File missing → **Tier 1 FAIL** (record and continue)
+
+---
+
+## Step 4: dbt Run (Tier 2 Trigger)
+
+Skip if `--skip-dbt` was specified.
+
+Run dbt on the modified model and downstream:
+
+```bash
+dbt run -s {model_name}+
+```
+
+- dbt completes with `PASS` → record model count. The `suggest-review.sh` hook should inject a review suggestion into context. **Tier 2 PASS**.
+- dbt fails → **Tier 2 FAIL** (record error, continue to cleanup)
+
+---
+
+## Step 5: Dispatch Review Agent
+
+Dispatch the `recce-reviewer` agent with the tracked model context:
+
+> "Changed models (from tracked file): {model_name}. Focus review on these models using selector: {model_name}+"
+
+**Capture the full agent result**, including the `<usage>` block. Extract:
+- `tool_uses` — number of MCP tool calls
+- `total_tokens` — total token consumption
+- `duration_ms` — wall-clock time
+
+Check agent output for `## Data Review Summary`. Validate against pass criteria in `references/pass-criteria.md`:
+- Concrete row count numbers (non-zero integers)
+- Risk level present (LOW/MEDIUM/HIGH)
+- Model names in summary
+- No MCP tool errors
+
+---
+
+## Step 6: Cleanup
+
+Execute in order:
+
+1. **Revert model edit** — restore the file to its original content (remove marker comment).
+2. **Stop MCP server**:
+   ```bash
+   RECCE_PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT}/../recce"
+   bash "${RECCE_PLUGIN_ROOT}/scripts/stop-mcp.sh"
+   ```
+3. **Clean tracked files**:
+   ```bash
+   PROJECT_HASH=$( { printf '%s' "$PWD" | md5 2>/dev/null || printf '%s' "$PWD" | md5sum; } | cut -c1-8 )
+   rm -f "/tmp/recce-changed-${PROJECT_HASH}.txt"
+   ```
+4. **Stale state check** — verify no `/tmp/recce-mcp-*.pid` or `/tmp/recce-changed-*.txt` remain.
+
+---
+
+## Step 7: Produce Benchmark Report
+
+Generate the report using the template in `references/pass-criteria.md`.
+
+If `--baseline` was provided, compute deltas:
+- `delta = current - baseline`
+- `delta_pct = (delta / baseline) * 100`
+
+Present negative deltas (improvements) with emphasis.
+
+Output the full report to the user. If all pass criteria are met, end with **Verdict: PASS**. Otherwise list failures.
+
+---
+
+## Additional Resources
+
+### Reference Files
+
+- **`references/pass-criteria.md`** — Detailed pass/fail criteria per section, performance metrics extraction guide, and the benchmark report template.
+
+### Scripts
+
+- **`scripts/preflight.sh`** — Pre-flight environment checks (dbt project, recce version, SSE support, port availability, stale files). Outputs KEY=VALUE lines.
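
Review note: Step 1 asks the agent to parse KEY=VALUE output and abort on any `BLOCK=` line, but `preflight.sh` itself is not shown in this diff. The sketch below illustrates one way that parsing could work, assuming one `KEY=VALUE` pair per line. The helper names `check_preflight` and `get_value` are hypothetical; only the keys (`BLOCK`, `RECCE_VERSION`, `PORT`) come from the skill text.

```bash
# Hypothetical helper: read KEY=VALUE lines on stdin, print every BLOCK=
# message verbatim, and return non-zero if at least one was seen.
check_preflight() {
  blocked=0
  # The "|| [ -n "$key" ]" clause handles a final line without a newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      BLOCK)
        printf '%s\n' "$value"
        blocked=1
        ;;
    esac
  done
  return "$blocked"
}

# Hypothetical helper: print the first value recorded for one key.
# Usage: get_value KEY "$preflight_output"
get_value() {
  printf '%s\n' "$2" | sed -n "s/^$1=//p" | head -n 1
}
```

With the script output captured in a variable, `printf '%s\n' "$output" | check_preflight` would gate the run, and `get_value RECCE_VERSION "$output"` would supply the version recorded for the report.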
diff --git a/plugins/recce-dev/skills/mcp-e2e-validate/references/pass-criteria.md b/plugins/recce-dev/skills/mcp-e2e-validate/references/pass-criteria.md
@@ -0,0 +1,77 @@
+# E2E Pass Criteria
+
+## Section-Level Checks
+
+| Section | Check | Pass Condition |
+|---------|-------|----------------|
+| Pre-flight | dbt project detected | `DBT_PROJECT=true` |
+| Pre-flight | recce installed with SSE | `SSE_SUPPORT=true` |
+| Pre-flight | Artifacts exist | `TARGET_EXISTS=true` AND `TARGET_BASE_EXISTS=true` |
+| MCP Startup | start-mcp.sh succeeds | `STATUS=STARTED` or `STATUS=ALREADY_RUNNING` |
+| MCP Health | check-mcp.sh confirms | `RUNNING=true` |
+| Tier 1 Track | Edit hook records model | File `/tmp/recce-changed-{hash}.txt` contains edited model path |
+| Tier 2 Suggest | dbt run triggers suggestion | Hook injects "Consider running /recce-review" context |
+| Review Agent | Summary produced | Output contains `## Data Review Summary` |
+| Review Agent | Concrete row counts | At least one model shows non-zero integer in both base and current |
+| Review Agent | Risk level present | Summary contains `LOW`, `MEDIUM`, or `HIGH` |
+| Review Agent | Model names present | Changed model name appears in summary |
+| Review Agent | No MCP errors | All MCP tool calls complete without connection/timeout errors |
+| Cleanup | Model reverted | Edited file restored to original |
+| Cleanup | MCP stopped | stop-mcp.sh returns `STATUS=STOPPED` |
+| Stale State | No leftovers | No `/tmp/recce-mcp-*.pid` or `/tmp/recce-changed-*.txt` remaining |
+
+## Performance Metrics to Capture
+
+From the review agent dispatch result, extract:
+
+| Metric | Source | Format |
+|--------|--------|--------|
+| `tool_uses` | Agent result `<usage>` block | Integer |
+| `total_tokens` | Agent result `<usage>` block | Integer |
+| `duration_ms` | Agent result `<usage>` block | Integer (convert to seconds for display) |
+
+## Benchmark Report Template
+
+```markdown
+## MCP E2E Benchmark Report
+
+**Date:** {YYYY-MM-DD}
+**recce version:** {version}
+**Project:** {dbt_project_name}
+**Environment:** {adapter_type} (dual-env | single-env)
+**Test model:** {model_name}
+
+### Event Chain Results
+
+| Step | Result | Notes |
+|------|--------|-------|
+| Pre-flight | {PASS/FAIL} | {details} |
+| MCP Startup | {PASS/FAIL} | Port {port}, PID {pid} |
+| Tier 1 Tracking | {PASS/FAIL} | |
+| Tier 2 Suggestion | {PASS/FAIL} | |
+| Review Agent | {PASS/FAIL} | Risk: {level} |
+| Cleanup | {PASS/FAIL} | |
+
+### Agent Performance
+
+| Metric | Value |
+|--------|-------|
+| Tool calls | {N} |
+| Tokens consumed | {N} |
+| Wall-clock time | {N}s |
+
+### Comparison (if baseline provided)
+
+| Metric | Baseline | Current | Delta |
+|--------|----------|---------|-------|
+| Tool calls | {N} | {N} | {±N} ({±%}) |
+| Tokens | {N} | {N} | {±N} ({±%}) |
+| Time | {N}s | {N}s | {±N}s ({±%}) |
+
+### Data Review Summary
+
+{Paste the agent's ## Data Review Summary output here}
+
+### Verdict: {PASS / FAIL}
+{If FAIL: list which criteria failed}
+```
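
Review note: the Comparison table relies on the delta formulas from Step 7 (`delta = current - baseline`, `delta_pct = (delta / baseline) * 100`). A minimal sketch of that arithmetic follows, assuming integer metrics and using `awk` for the percentage. The function name `metric_delta` and the `n/a` output for a zero baseline are assumptions, not behavior defined in this PR.

```bash
# Hypothetical helper: print "<signed delta> (<signed percent>%)" for one metric.
# Usage: metric_delta <baseline> <current>
metric_delta() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    # Guard against division by zero when no meaningful baseline exists
    if (b == 0) { printf "%+d (n/a)\n", c - b; exit }
    printf "%+d (%+.1f%%)\n", c - b, (c - b) / b * 100
  }'
}
```

For example, with the `tool_uses=35` baseline from the `--baseline` example in SKILL.md and a hypothetical current value of 28, `metric_delta 35 28` prints `-7 (-20.0%)`, a negative delta that the report should present as an improvement.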