DataRecce · iamcxa · Apr 2, 2026 · Apr 3, 2026 · Apr 3, 2026 · Apr 4, 2026
diff --git a/plugins/recce-dev/skills/recce-eval/SKILL.md b/plugins/recce-dev/skills/recce-eval/SKILL.md
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/SCENARIOS.md b/plugins/recce-dev/skills/recce-eval/scenarios/v2/SCENARIOS.md
@@ -237,6 +237,104 @@ where subtotal > 0
 
 ---
 
+## data-007: Supply Cost Breakdown — Hidden Fan-out Cascade
+
+**GitHub Issue**: [#4 — Add Supply Cost Analysis and Perishable Inventory Tracking](https://github.com/DataRecce/jaffle-shop-simulator/issues/4)
+
+**Story**: Purchasing Manager requests perishable vs non-perishable supply cost breakdown per order item. A teammate modifies the `order_supplies_summary` CTE in `order_items.sql` to add `is_perishable_supply` to the GROUP BY.
+
+**Init state (buggy PR)**:
+```sql
+-- order_items.sql — order_supplies_summary CTE
+select
+    product_id,
+    is_perishable_supply,
+    sum(supply_cost) as supply_cost
+from supplies
+group by 1, 2
+```
+
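+**The bug**: Adding `is_perishable_supply` to GROUP BY changes the grain of `order_supplies_summary` from 1 row/product to up to 2 rows/product (one per `is_perishable_supply` value). The downstream `LEFT JOIN` then emits one row per matching summary row, so an order_item whose product has both perishable and non-perishable supplies becomes 2 rows. This cascades: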
+- `order_items`: row count approximately doubles
+- `orders.order_cost`: UNCHANGED (sum of split costs = original total)
+- `orders.count_order_items`: DOUBLED
+- `orders.count_food_items`: DOUBLED (dashboard column!)
+- `orders.count_drink_items`: DOUBLED (dashboard column!)
+- `orders.order_items_subtotal`: DOUBLED (sum of duplicated product_price)
+- `customers`: UNCHANGED (uses order-level columns, not order_items)
+
+**What we expect the agent to find**:
+- Issue found: **yes** — data drift
+- Root cause: grain change in order_supplies_summary fans out the join
+- Impacted: `order_items`, `orders`
+- Not impacted: `stg_orders`, `customers`, `products`, `supplies`
+- Dashboard impact: **yes** (count_food_items, count_drink_items doubled)
+- Detection requires: **data comparison**
+
+**Difficulty**: hard — the grain change looks innocent (adding a dimension), but cascades through orders into dashboard columns
+
+---
+
+## data-008: Numeric Precision Refactor — Zero-Change False Positive Trap
+
+**GitHub Issue**: [#2 — Add Tax Summary Report and Cost Accounting Breakdown](https://github.com/DataRecce/jaffle-shop-simulator/issues/2)
+
+**Story**: Data Engineer wraps all three `cents_to_dollars()` calls in `stg_orders.sql` with `round(..., 2)` for "defensive precision."
+
+**Init state (PR under review; not actually buggy)**:
+```sql
+-- stg_orders.sql
+round({{ cents_to_dollars('subtotal') }}, 2) as subtotal,
+round({{ cents_to_dollars('tax_paid') }}, 2) as tax_paid,
+round({{ cents_to_dollars('order_total') }}, 2) as order_total,
+```
+
+**The bug**: There is NO bug. The `cents_to_dollars` macro already casts to `numeric(16, 2)`. Applying `round(x, 2)` to a value that is already `numeric(16, 2)` is a complete no-op — zero rows change, zero values change across the entire DAG.
+
+**What we expect the agent to find**:
+- Issue found: **no** — the change is a no-op
+- Root cause: round() on already-rounded numeric is redundant
+- Impacted: none
+- Not impacted: `stg_orders`, `orders`, `customers`, `order_items`, `products`
+- Dashboard impact: **no**
+- Detection requires: **data comparison** (to confirm zero change, not just code reasoning)
+
+**Difficulty**: medium — the agent must resist the trap of reporting impact based on DAG reasoning alone (stg_orders is root → everything downstream "could" be affected)
+
+---
+
+## data-009: Date Truncation Change — Month Grain Collapses Daily Timeline
+
+**GitHub Issue**: [#9 — Optimize Date Granularity for Monthly Reporting](https://github.com/DataRecce/jaffle-shop-simulator/issues/9)
+
+**Story**: Analytics Engineer changes `date_trunc` in `stg_orders.sql` from `'day'` to `'month'` to "reduce cardinality and improve query performance."
+
+**Init state (buggy PR)**:
+```sql
+-- stg_orders.sql
+{{ dbt.date_trunc('month','ordered_at') }} as ordered_at
+```
+
+**The bug**: `ordered_at` loses daily granularity — all orders in the same month collapse to the 1st of the month. This propagates through the entire DAG:
+- `orders.ordered_at` — month-level (dashboard column!)
+- `orders.customer_order_number` — ROW_NUMBER by month becomes non-deterministic
+- `order_items.ordered_at` — month-level
+- `customers.first_ordered_at` / `last_ordered_at` — month-level only
+
+Financial columns (subtotal, tax_paid, order_total) are completely unchanged. Row counts are identical — impact is purely value-level on date columns.
+
+**What we expect the agent to find**:
+- Issue found: **yes** — data drift
+- Root cause: date_trunc changed from day to month, collapsing daily granularity
+- Impacted: `stg_orders`, `orders`, `order_items`, `customers`
+- Not impacted: `products`, `supplies`, `locations`
+- Dashboard impact: **yes** (ordered_at is a dashboard column)
+- Detection requires: **data comparison**
+
+**Difficulty**: medium — the agent must correctly scope impact to date columns only and avoid false positives on financial metrics
+
+---
+
 ## Summary Matrix
 
 | ID | Bug Type | Modified/New | Difficulty | Detection | Dashboard? | Affected Rows |
@@ -247,4 +345,7 @@ where subtotal > 0
 | data-004 | Count ratio vs cost ratio | New `supply_analysis` | medium | data comparison | no | all rows |
 | data-005 | current_date on historical data | New `customer_segments` | easy | data comparison | no | all rows |
 | data-006 | Tax instead of COGS in formula | New `financial_orders` | easy | data comparison | no | all rows |
+| data-007 | Grain fan-out cascades to dashboard | Modified `order_items` | hard | data comparison | yes | all rows (doubled) |
+| data-008 | No-op precision change (false positive trap) | Modified `stg_orders` | medium | data comparison | no | 0 |
+| data-009 | Date grain collapse (day→month) | Modified `stg_orders` | medium | data comparison | yes | 658,657 |
 | code-001 | Wrong filter column (spec deviation) | Modified `stg_orders` | hard | code review | no | 4,155 |
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-007-supply-grain-fanout.yaml b/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-007-supply-grain-fanout.yaml
@@ -0,0 +1,72 @@
+id: data-007-supply-grain-fanout
+name: "Supply Cost Breakdown — Hidden Fan-out Cascade"
+description: "order_items supply summary adds is_perishable_supply to GROUP BY — grain change fans out join, doubling count columns through orders mart into dashboard"
+github_issue: https://github.com/DataRecce/jaffle-shop-simulator/issues/4
+layer: review
+difficulty: hard
+stakeholder: purchasing
+case_type: problem_exists
+
+story: |
+  The Purchasing Manager (P2) requested a breakdown of perishable vs non-perishable supply
+  costs per order item, to better understand spoilage risk in the supply chain.
+
+  A teammate modified the `order_supplies_summary` CTE in `order_items.sql` to include
+  `is_perishable_supply` in the GROUP BY and SELECT. This splits each product's supply cost
+  into two rows: one for perishable supplies, one for non-perishable supplies.
+
+  The code change looks reasonable — adding a dimension to an aggregation. But it changes
+  the grain of `order_supplies_summary` from 1 row per product to up to 2 rows per product.
+  The downstream LEFT JOIN in the `joined` CTE now produces 2 rows per order_item whenever
+  the product has both perishable and non-perishable supplies. This fan-out cascades:
+
+  - `order_items`: row count approximately doubles
+  - `orders.order_cost`: UNCHANGED (sum of split costs = original total)
+  - `orders.count_order_items`: DOUBLED (counts duplicated rows)
+  - `orders.count_food_items`: DOUBLED (dashboard column!)
+  - `orders.count_drink_items`: DOUBLED (dashboard column!)
+  - `orders.order_items_subtotal`: DOUBLED (sum of duplicated product_price)
+  - `customers`: UNCHANGED (aggregates use order-level columns from stg_orders, not order_items)
+
+  The bug is a classic grain mismatch hidden behind an innocent-looking GROUP BY change.
+
+environment:
+  repo: DataRecce/jaffle-shop-simulator
+  ref: eval-base
+  adapter: duckdb
+
+setup:
+  strategy: git_patch
+  patch_reverse_file: scenarios/v2/patches/data-007-supply-grain-fanout.patch
+  skip_context: false
+
+prompt:
+  template: prompts/review.md
+  vars:
+    stakeholder_name: "Purchasing Manager (P2)"
+    stakeholder_request: "Add perishable vs non-perishable supply cost breakdown per order item for spoilage risk analysis"
+    pr_description: "Add is_perishable_supply dimension to order_items supply cost aggregation — splits supply_cost into perishable and non-perishable components"
+
+headless:
+  max_budget_usd: 5.00
+  output_format: json
+
+ground_truth:
+  issue_found: true
+  issue_type: data_drift
+  root_cause_keywords: ["grain", "fan-out", "group by", "is_perishable_supply", "duplicate", "count", "order_supplies_summary", "double"]
+  impacted_models: ["order_items", "orders"]
+  not_impacted_models: ["stg_orders", "customers", "products", "supplies"]
+  dashboard_impact: true
+  detection_requires: data_comparison
+
+judge_criteria:
+  - "Agent identifies the grain change in order_supplies_summary (1 row/product → 2 rows/product)"
+  - "Agent recognizes the fan-out cascade: order_items rows doubled → orders count columns doubled"
+  - "Agent notes that order_cost (sum of supply_cost) is UNCHANGED despite the fan-out — sum of parts equals the original total"
+  - "Agent identifies that count_food_items and count_drink_items are DOUBLED — these are Executive Dashboard columns"
+  - "Agent correctly identifies that customers model is NOT impacted"
+  - "Agent correctly identifies dashboard_impact as true (count_food_items, count_drink_items)"
+
+teardown:
+  restore_files: ["models/marts/order_items.sql"]
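Review note on data-007: the cascade in the story (count columns double while `order_cost` stays the same) can be reproduced in miniature. The following is a minimal sketch using Python's built-in `sqlite3`, not the project's DuckDB setup. The table contents, the `fanout_demo` helper, and the join on `product_id` are assumptions made for illustration; only the CTE name and column names come from the scenario.

```python
import sqlite3


def fanout_demo():
    """Compare one order's aggregates before and after the GROUP BY change."""
    con = sqlite3.connect(":memory:")
    # Toy data (hypothetical): every product has one perishable and one
    # non-perishable supply, so the buggy grain yields 2 rows per product.
    con.executescript("""
        create table supplies (product_id text, is_perishable_supply integer, supply_cost real);
        insert into supplies values
            ('P1', 1, 2.0), ('P1', 0, 3.0),
            ('P2', 1, 1.0), ('P2', 0, 4.0);
        create table order_items (order_item_id integer, order_id integer, product_id text);
        insert into order_items values (1, 100, 'P1'), (2, 100, 'P2');
    """)
    query = """
        with order_supplies_summary as (
            select product_id, sum(supply_cost) as supply_cost
            from supplies
            group by {group_by}
        )
        select count(*) as count_order_items, sum(s.supply_cost) as order_cost
        from order_items oi
        left join order_supplies_summary s on oi.product_id = s.product_id
        where oi.order_id = 100
    """
    base = con.execute(query.format(group_by="product_id")).fetchone()
    buggy = con.execute(
        query.format(group_by="product_id, is_perishable_supply")
    ).fetchone()
    con.close()
    return base, buggy


if __name__ == "__main__":
    base, buggy = fanout_demo()
    print("base  (count_order_items, order_cost):", base)
    print("buggy (count_order_items, order_cost):", buggy)
```

On this toy data the item count goes from 2 to 4 while the summed cost stays at 10.0, which is why a check on `order_cost` alone would miss the fan-out.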
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-008-precision-noop.yaml b/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-008-precision-noop.yaml
@@ -0,0 +1,69 @@
+id: data-008-precision-noop
+name: "Numeric Precision Refactor — Zero-Change False Positive Trap"
+description: "stg_orders wraps cents_to_dollars with round(x, 2) — macro already outputs numeric(16,2) so data is identical, but code diff touches root staging model"
+github_issue: https://github.com/DataRecce/jaffle-shop-simulator/issues/2
+layer: review
+difficulty: medium
+stakeholder: data-engineering
+case_type: no_problem
+
+story: |
+  A Data Engineer noticed that the `cents_to_dollars` macro returns `::numeric(16, 2)` but
+  wanted to make the precision "explicit and defensive" by wrapping all three money columns
+  in `stg_orders.sql` with `round(..., 2)`.
+
+  The PR description says: "Add explicit rounding to money columns for precision safety —
+  ensures no floating point drift in downstream aggregations."
+
+  The change modifies `stg_orders.sql`, which is the ROOT staging model feeding into
+  `orders`, `customers`, and every downstream mart. A code-only reviewer seeing a change
+  to the root financial staging model would reasonably flag this as high-risk and report
+  potential impact on all downstream models.
+
+  However, `cents_to_dollars` already casts to `numeric(16, 2)`. Applying `round(x, 2)` to
+  a value that is already `numeric(16, 2)` is a complete no-op — zero rows change, zero
+  values change, zero downstream impact. The correct assessment is: no issue found.
+
+  This scenario tests whether the agent can use data comparison to CONFIRM safety rather
+  than relying on DAG reasoning alone (which would produce false positives).
+
+environment:
+  repo: DataRecce/jaffle-shop-simulator
+  ref: eval-base
+  adapter: duckdb
+
+setup:
+  strategy: git_patch
+  patch_reverse_file: scenarios/v2/patches/data-008-precision-noop.patch
+  skip_context: false
+
+prompt:
+  template: prompts/review.md
+  vars:
+    stakeholder_name: "Data Engineer (P3)"
+    stakeholder_request: "Add explicit rounding to money columns in stg_orders for precision safety"
+    pr_description: "Wrap cents_to_dollars output with round(x, 2) in stg_orders — defensive precision for downstream financial aggregations"
+
+headless:
+  max_budget_usd: 5.00
+  output_format: json
+
+ground_truth:
+  issue_found: false
+  issue_type: no_issue
+  root_cause_keywords: ["no-op", "round", "numeric", "precision", "already", "identical", "no change", "zero"]
+  impacted_models: []
+  not_impacted_models: ["stg_orders", "orders", "customers", "order_items", "products"]
+  dashboard_impact: false
+  detection_requires: data_comparison
+
+judge_criteria:
+  - "Agent verifies through DATA comparison that all downstream models have zero value changes"
+  - "Agent recognizes that round(numeric(16,2), 2) is a no-op — the macro already handles precision"
+  - "Agent does NOT report false positives on orders, customers, or other downstream models"
+  - "Agent correctly concludes issue_found: false — no data impact despite code change to root model"
+  - "Agent correctly identifies dashboard_impact as false"
+  - "Agent avoids the trap of DAG-based reasoning alone (stg_orders is root → everything must be impacted)"
+
+teardown:
+  restore_files: ["models/staging/stg_orders.sql"]
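Review note on data-008: the no-op claim can be checked by value comparison rather than by reasoning about the DAG. The sketch below models `numeric(16, 2)` with Python's `decimal` module. The `cents_to_dollars` function here is a hypothetical stand-in (divide by 100, then fix the scale to two places), not the project's macro, and `count_changed_values` is an illustrative helper.

```python
from decimal import Decimal

TWO_PLACES = Decimal("0.01")


def cents_to_dollars(cents: int) -> Decimal:
    # Hypothetical stand-in for the macro: divide by 100, then fix the
    # scale at two decimal places, similar to a numeric(16, 2) cast.
    return (Decimal(cents) / 100).quantize(TWO_PLACES)


def count_changed_values(cents_values) -> int:
    """Count inputs where wrapping the result in round(x, 2) changes the value."""
    changed = 0
    for cents in cents_values:
        before = cents_to_dollars(cents)
        after = round(cents_to_dollars(cents), 2)  # the PR's extra rounding
        if before != after:
            changed += 1
    return changed


if __name__ == "__main__":
    print("changed values:", count_changed_values(range(0, 10_000)))
```

A count of zero across the sampled inputs is the same kind of evidence the judge criteria ask for: a data comparison showing no value changes, not an inference from the code diff.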
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-009-date-grain-month.yaml b/plugins/recce-dev/skills/recce-eval/scenarios/v2/data-009-date-grain-month.yaml
@@ -0,0 +1,78 @@
+id: data-009-date-grain-month
+name: "Date Truncation Change — Month Grain Collapses Daily Timeline"
+description: "stg_orders changes date_trunc from day to month — ordered_at loses daily granularity across entire DAG, but financial columns are unchanged"
+github_issue: https://github.com/DataRecce/jaffle-shop-simulator/issues/9
+layer: review
+difficulty: medium
+stakeholder: analytics
+case_type: problem_exists
+
+story: |
+  An Analytics Engineer proposed changing the date truncation in `stg_orders.sql` from
+  `day` to `month` to "reduce cardinality and improve query performance for monthly
+  reporting dashboards."
+
+  The PR modifies one line in `stg_orders.sql`:
+  - Before: `{{ dbt.date_trunc('day','ordered_at') }}`
+  - After: `{{ dbt.date_trunc('month','ordered_at') }}`
+
+  The change compiles fine and all dbt tests pass. The PR description argues this is a
+  harmless optimization since "most reports aggregate to monthly anyway."
+
+  However, `stg_orders` is the ROOT staging model for the entire orders pipeline. The
+  `ordered_at` column propagates through:
+  - `orders.ordered_at` — now month-level (dashboard column!)
+  - `orders.customer_order_number` — ROW_NUMBER ordered by month becomes non-deterministic
+    for orders within the same month
+  - `order_items.ordered_at` — joined from stg_orders, now month-level
+  - `customers.first_ordered_at` — now month-level only (loses day precision)
+  - `customers.last_ordered_at` — now month-level only (loses day precision)
+
+  Critically, financial columns (subtotal, tax_paid, order_total, order_cost) are
+  COMPLETELY UNCHANGED. The agent must correctly scope the impact to date/time columns
+  only and avoid false positives on financial metrics.
+
+  Row counts are identical across all models — no rows added or removed. The impact is
+  purely in value changes to the ordered_at column and its derivatives.
+
+environment:
+  repo: DataRecce/jaffle-shop-simulator
+  ref: eval-base
+  adapter: duckdb
+
+setup:
+  strategy: git_patch
+  patch_reverse_file: scenarios/v2/patches/data-009-date-grain-month.patch
+  skip_context: false
+
+prompt:
+  template: prompts/review.md
+  vars:
+    stakeholder_name: "Analytics Engineer (P3)"
+    stakeholder_request: "Optimize date granularity in stg_orders from daily to monthly for reporting performance"
+    pr_description: "Change date_trunc from day to month in stg_orders — reduces ordered_at cardinality for faster monthly aggregations"
+
+headless:
+  max_budget_usd: 5.00
+  output_format: json
+
+ground_truth:
+  issue_found: true
+  issue_type: data_drift
+  root_cause_keywords: ["date_trunc", "month", "day", "ordered_at", "granularity", "precision", "cardinality"]
+  impacted_models: ["stg_orders", "orders", "order_items", "customers"]
+  not_impacted_models: ["products", "supplies", "locations"]
+  dashboard_impact: true
+  detection_requires: data_comparison
+
+judge_criteria:
+  - "Agent identifies that ordered_at loses daily granularity — collapses to month-level across the DAG"
+  - "Agent correctly identifies dashboard_impact as true (ordered_at is a dashboard column)"
+  - "Agent correctly identifies that financial columns (subtotal, tax_paid, order_total) are UNCHANGED"
+  - "Agent correctly scopes impacted_models to those that use ordered_at: stg_orders, orders, order_items, customers"
+  - "Agent does NOT falsely report products, supplies, or locations as impacted"
+  - "Agent notes that customer_order_number becomes non-deterministic for same-month orders"
+  - "Agent recognizes row counts are unchanged — the impact is value-level, not row-level"
+
+teardown:
+  restore_files: ["models/staging/stg_orders.sql"]
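Review note on data-009: the three effects named in the story (row counts unchanged, distinct dates collapsing, and ties appearing in the ordering key behind `customer_order_number`) can be shown on a toy table. This is a minimal sketch with Python's built-in `sqlite3`; the `raw_orders` rows and the `month_grain_demo` helper are assumptions, and SQLite's `date(..., 'start of month')` stands in for `dbt.date_trunc('month', ...)`.

```python
import sqlite3


def month_grain_demo():
    """Return (row_count, distinct_dates, tied_keys) for day and month grain."""
    con = sqlite3.connect(":memory:")
    # Toy data (hypothetical): customer C1 has two orders in March 2024.
    con.executescript("""
        create table raw_orders (order_id integer, customer_id text, ordered_at text);
        insert into raw_orders values
            (1, 'C1', '2024-03-02'),
            (2, 'C1', '2024-03-15'),
            (3, 'C1', '2024-04-01'),
            (4, 'C2', '2024-03-20');
    """)
    grains = {
        "day": "date(ordered_at)",
        "month": "date(ordered_at, 'start of month')",
    }
    stats = {}
    for grain, expr in grains.items():
        row_count, distinct_dates = con.execute(
            f"select count(*), count(distinct {expr}) from raw_orders"
        ).fetchone()
        # A (customer_id, ordered_at) pair shared by several orders means a
        # row_number() ordered by ordered_at has no unique ordering for them.
        tied_keys = con.execute(f"""
            select count(*) from (
                select customer_id
                from raw_orders
                group by customer_id, {expr}
                having count(*) > 1
            )
        """).fetchone()[0]
        stats[grain] = (row_count, distinct_dates, tied_keys)
    con.close()
    return stats


if __name__ == "__main__":
    for grain, values in month_grain_demo().items():
        print(grain, values)
```

On this toy data the row count stays at 4 for both grains, distinct dates drop from 4 to 2, and one tied ordering key appears at month grain, so a row-count check alone would report no change.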
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-007-supply-grain-fanout.patch b/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-007-supply-grain-fanout.patch
@@ -0,0 +1,26 @@
+diff --git a/models/marts/order_items.sql b/models/marts/order_items.sql
+--- a/models/marts/order_items.sql
++++ b/models/marts/order_items.sql
+@@ -29,13 +29,12 @@
+
+     select
+         product_id,
+-        is_perishable_supply,
+
+         sum(supply_cost) as supply_cost
+
+     from supplies
+
+-    group by 1, 2
++    group by 1
+
+ ),
+
+@@ -51,7 +50,6 @@
+         products.is_food_item,
+         products.is_drink_item,
+
+-        order_supplies_summary.is_perishable_supply,
+         order_supplies_summary.supply_cost
+
+     from order_items
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-008-precision-noop.patch b/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-008-precision-noop.patch
@@ -0,0 +1,16 @@
+diff --git a/models/staging/stg_orders.sql b/models/staging/stg_orders.sql
+--- a/models/staging/stg_orders.sql
++++ b/models/staging/stg_orders.sql
+@@ -19,9 +19,9 @@
+         subtotal as subtotal_cents,
+         tax_paid as tax_paid_cents,
+         order_total as order_total_cents,
+-        round({{ cents_to_dollars('subtotal') }}, 2) as subtotal,
+-        round({{ cents_to_dollars('tax_paid') }}, 2) as tax_paid,
+-        round({{ cents_to_dollars('order_total') }}, 2) as order_total,
++        {{ cents_to_dollars('subtotal') }} as subtotal,
++        {{ cents_to_dollars('tax_paid') }} as tax_paid,
++        {{ cents_to_dollars('order_total') }} as order_total,
+
+         ---------- timestamps
+         {{ dbt.date_trunc('day','ordered_at') }} as ordered_at
diff --git a/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-009-date-grain-month.patch b/plugins/recce-dev/skills/recce-eval/scenarios/v2/patches/data-009-date-grain-month.patch
@@ -0,0 +1,12 @@
+diff --git a/models/staging/stg_orders.sql b/models/staging/stg_orders.sql
+--- a/models/staging/stg_orders.sql
++++ b/models/staging/stg_orders.sql
+@@ -24,7 +24,7 @@
+         {{ cents_to_dollars('order_total') }} as order_total,
+
+         ---------- timestamps
+-        {{ dbt.date_trunc('month','ordered_at') }} as ordered_at
++        {{ dbt.date_trunc('day','ordered_at') }} as ordered_at
+
+     from source
+