prep_dgp: guarantee all (group, partition) cells in generate_ddd_panel_data

igerber · claude · igerber · commit 03fce79fc520 · 2026-05-16T12:45:00.000-04:00
The R1 review identified that independent marginal sampling of `group` and
`partition` could leave one of the four (group, partition) cells empty
under valid inputs — e.g., `n_units=4, group_frac=partition_frac=0.25`
rounds to `n_group_1=1` and `n_p1_g1=0`, so the (1, 1) cell collapses
before `TripleDifference.fit`'s 2x2x2 cell-presence check.

Switches to stratified allocation: assign `group` to its requested
fraction, then within each group stratum, draw `partition=1` at
`partition_frac`. Adds a targeted `ValueError` when the rounded cell
counts would leave any (group, partition) cell empty (with the four
cell counts in the error message so users can pick a feasible config).

Adds two regression tests:
- `test_infeasible_cell_counts_raise` exercises both the `n_units=4`
  small-marginal case and an `n_units=10, group_frac=0.1` case.
- `test_smallest_feasible_config_populates_all_cells` verifies the
  smallest feasible config (`n_units=4, fracs=0.5`) yields one unit per
  cell and that `TripleDifference.fit(..., time="post")` succeeds on
  it (the contract the docstring advertises).

Updates the `group_frac` / `partition_frac` docstring entries to describe
the stratified allocation guarantee, and the `[Unreleased]` CHANGELOG
entry to mention the cell-coverage invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
-- **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference.fit(..., time="post")` directly — users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 14 tests). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md.
+- **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference.fit(..., time="post")` directly — users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md.
 - **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
 - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
 
diff --git a/diff_diff/prep_dgp.py b/diff_diff/prep_dgp.py
@@ -1188,10 +1188,16 @@ def generate_ddd_panel_data(
         Period (0-indexed) at which ``post`` switches from 0 to 1.
         Must satisfy ``1 <= treatment_period < n_periods``.
     group_frac : float, default=0.5
-        Fraction of units with ``group=1``. Must be in ``(0, 1)``.
+        Fraction of units with ``group=1``. Must be in ``(0, 1)``. The
+        partition split is then drawn stratified-by-group at the requested
+        ``partition_frac`` so every (group, partition) cell receives at
+        least one unit; a ``ValueError`` is raised when the rounded cell
+        counts would leave any cell empty.
     partition_frac : float, default=0.5
-        Fraction of units with ``partition=1`` (assigned independently of
-        ``group``). Must be in ``(0, 1)``.
+        Fraction of units with ``partition=1`` within each ``group``
+        stratum. Must be in ``(0, 1)``. The stratified allocation is what
+        makes TripleDifference.fit's 2x2x2 surface populated for any valid
+        ``(n_units, group_frac, partition_frac)``.
     treatment_effect : float, default=2.0
         True ATT for the triple-interaction cell (group=1, partition=1,
         post=1).
@@ -1270,16 +1276,45 @@ def generate_ddd_panel_data(
             f"got {n_units}."
         )
 
+    # Stratified allocation that guarantees every (group, partition) cell receives
+    # at least one unit. Independent marginal sampling could collapse a cell when
+    # rounded marginals are small (e.g., n_units=4, group_frac=partition_frac=0.25
+    # yields n_group_1=1 with no room for both partition values within group=1).
+    # TripleDifference.fit requires all 8 G×P×T cells, so we validate first.
+    n_group_1 = int(round(n_units * group_frac))
+    n_group_0 = n_units - n_group_1
+    n_p1_g0 = int(round(n_group_0 * partition_frac))
+    n_p1_g1 = int(round(n_group_1 * partition_frac))
+    n_p0_g0 = n_group_0 - n_p1_g0
+    n_p0_g1 = n_group_1 - n_p1_g1
+    cell_counts = {
+        (0, 0): n_p0_g0,
+        (0, 1): n_p1_g0,
+        (1, 0): n_p0_g1,
+        (1, 1): n_p1_g1,
+    }
+    if min(cell_counts.values()) < 1:
+        raise ValueError(
+            f"generate_ddd_panel_data requires every (group, partition) cell to "
+            f"contain at least one unit so TripleDifference.fit's 2x2x2 surface is "
+            f"populated. With n_units={n_units}, group_frac={group_frac}, "
+            f"partition_frac={partition_frac}, the rounded cell counts are "
+            f"(group, partition): (0,0)={n_p0_g0}, (0,1)={n_p1_g0}, "
+            f"(1,0)={n_p0_g1}, (1,1)={n_p1_g1}. Increase n_units or move "
+            f"group_frac/partition_frac closer to 0.5 so each cell receives "
+            f">=1 unit."
+        )
+
     rng = np.random.default_rng(seed)
 
-    n_group_1 = max(1, min(n_units - 1, int(round(n_units * group_frac))))
-    n_partition_1 = max(1, min(n_units - 1, int(round(n_units * partition_frac))))
     group = np.zeros(n_units, dtype=int)
     group_idx = rng.choice(n_units, size=n_group_1, replace=False)
     group[group_idx] = 1
     partition = np.zeros(n_units, dtype=int)
-    partition_idx = rng.choice(n_units, size=n_partition_1, replace=False)
-    partition[partition_idx] = 1
+    g0_units = np.where(group == 0)[0]
+    g1_units = np.where(group == 1)[0]
+    partition[rng.choice(g0_units, size=n_p1_g0, replace=False)] = 1
+    partition[rng.choice(g1_units, size=n_p1_g1, replace=False)] = 1
 
     unit_fe = rng.normal(0.0, unit_fe_sd, size=n_units)
 
diff --git a/tests/test_prep.py b/tests/test_prep.py
@@ -1234,6 +1234,59 @@ def test_n_units_validation(self):
         with pytest.raises(ValueError, match="n_units"):
             generate_ddd_panel_data(n_units=3, seed=42)
 
+    def test_infeasible_cell_counts_raise(self):
+        """Configs that round to empty (group, partition) cells fail fast."""
+        from diff_diff.prep import generate_ddd_panel_data
+
+        # n_units=4, group_frac=0.25 → n_group_1=1; partition_frac=0.25 inside
+        # the 1-unit group=1 stratum rounds to 0, leaving the (1, 1) cell empty.
+        with pytest.raises(ValueError, match=r"every \(group, partition\) cell"):
+            generate_ddd_panel_data(
+                n_units=4,
+                group_frac=0.25,
+                partition_frac=0.25,
+                seed=42,
+            )
+        # n_units=10, group_frac=0.1 → n_group_1=1 again; same failure mode for
+        # the (1, 1) cell.
+        with pytest.raises(ValueError, match=r"every \(group, partition\) cell"):
+            generate_ddd_panel_data(
+                n_units=10,
+                group_frac=0.1,
+                partition_frac=0.5,
+                seed=42,
+            )
+
+    def test_smallest_feasible_config_populates_all_cells(self):
+        """n_units=4 with fracs=0.5 yields one unit per (group, partition) cell."""
+        from diff_diff import TripleDifference
+        from diff_diff.prep import generate_ddd_panel_data
+
+        data = generate_ddd_panel_data(
+            n_units=4,
+            n_periods=4,
+            treatment_period=2,
+            group_frac=0.5,
+            partition_frac=0.5,
+            noise_sd=0.0,
+            seed=42,
+        )
+        # All four (group, partition) cells must receive exactly one unit.
+        unit_cells = data.groupby("unit")[["group", "partition"]].first()
+        cell_counts = unit_cells.groupby(["group", "partition"]).size()
+        assert len(cell_counts) == 4
+        assert (cell_counts == 1).all()
+        # TripleDifference.fit (which requires all 8 G×P×T cells) must succeed
+        # on the smallest feasible config — validates the public contract
+        # advertised in the docstring's compatibility paragraph.
+        TripleDifference().fit(
+            data,
+            outcome="outcome",
+            group="group",
+            partition="partition",
+            time="post",
+        )
+
     def test_reproducibility(self):
         """Same seed produces identical DataFrames."""
         from diff_diff.prep import generate_ddd_panel_data