Skip to content

Commit 71f84d0

Browse files
authored
Merge pull request #446 from igerber/spillover-conley-butts-wave-b
SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021)
2 parents 7b55db5 + 2b86d8c commit 71f84d0

18 files changed

Lines changed: 5632 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11+
- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
1112
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
1213

1314
### Changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
106106
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
107107
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
108108
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
109+
- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing
109110
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
110111
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
111112
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves

TODO.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -132,6 +132,14 @@ Deferred items from PR reviews that were not addressed before merge.
132132
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
133133
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
134134
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
135+
| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
136+
| `SpilloverDiD(event_study=True)` per-event-time × ring decomposition (Butts Section 5 / Table 2 `S^k_{it}` / `Ring^k_{it,j}`). Currently raises `NotImplementedError`. The implementation adds event-time dummies × ring covariates to the stage-2 design and emits a MultiIndex on `spillover_effects`. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Medium |
137+
| `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
138+
| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
139+
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
140+
| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
141+
| Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
142+
| `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
135143

136144
#### Performance
137145

diff_diff/__init__.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -171,6 +171,10 @@
171171
TwoStageDiDResults,
172172
two_stage_did,
173173
)
174+
from diff_diff.spillover import (
175+
SpilloverDiD,
176+
)
177+
from diff_diff.results import SpilloverDiDResults # re-export
174178
from diff_diff.stacked_did import (
175179
StackedDiD,
176180
StackedDiDResults,
@@ -300,6 +304,7 @@
300304
"SunAbraham",
301305
"ImputationDiD",
302306
"TwoStageDiD",
307+
"SpilloverDiD",
303308
"TripleDifference",
304309
"TROP",
305310
"StackedDiD",
@@ -341,6 +346,7 @@
341346
"TwoStageDiDResults",
342347
"TwoStageBootstrapResults",
343348
"two_stage_did",
349+
"SpilloverDiDResults",
344350
"TripleDifferenceResults",
345351
"triple_difference",
346352
"StaggeredTripleDifference",

diff_diff/guides/llms-full.txt

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -461,6 +461,69 @@ results = est.fit(data, outcome='outcome', unit='unit',
461461
results.print_summary()
462462
```
463463

464+
### SpilloverDiD
465+
466+
Butts (2021) ring-indicator spillover-aware DiD. Augments two-stage Gardner with ring-indicator covariates that identify direct effect on treated (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and staggered timing in one estimator. Recommends `vcov_type="conley"` with cutoff = `d_bar` (paper Section 3.1).
467+
468+
```python
469+
SpilloverDiD(
470+
rings: list[float], # K+1 sorted breakpoints; K rings
471+
d_bar: float | None = None, # Far-away cutoff (defaults to max(rings))
472+
vcov_type: str = "hc1", # "hc1", "conley", or default cluster
473+
conley_coords: tuple[str, str] | None = None, # (lat_col, lon_col), required
474+
conley_metric: str = "haversine", # or "euclidean" / callable
475+
conley_cutoff_km: float | None = None,
476+
conley_lag_cutoff: int | None = None,
477+
cluster: str | None = None,
478+
alpha: float = 0.05,
479+
anticipation: int = 0,
480+
event_study: bool = False, # Deferred: raises NotImplementedError if True
481+
horizon_max: int | None = None, # Deferred (event-study mode)
482+
rank_deficient_action: str = "warn",
483+
)
484+
```
485+
486+
**fit() parameters:**
487+
488+
```python
489+
sp.fit(
490+
data: pd.DataFrame,
491+
outcome: str,
492+
unit: str,
493+
time: str,
494+
treatment: str | None = None, # binary D_it; auto-converted to first_treat
495+
first_treat: str | None = None, # OR onset time per unit (Gardner)
496+
covariates: list[str] | None = None, # Deferred: NotImplementedError if non-None
497+
survey_design: object = None, # Deferred: NotImplementedError if non-None
498+
) -> SpilloverDiDResults
499+
```
500+
501+
**Restrictions (Wave B MVP — planned follow-ups):**
502+
503+
- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
504+
- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
505+
- `event_study=True` raises `NotImplementedError` (planned: per-event-time × ring decomposition per Butts Table 2)
506+
- `horizon_max=` raises `NotImplementedError` (used only with event_study)
507+
- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
508+
- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
509+
510+
**Usage:**
511+
512+
```python
513+
from diff_diff import SpilloverDiD
514+
515+
est = SpilloverDiD(
516+
rings=[0, 50, 100, 200],
517+
conley_coords=("lat", "lon"),
518+
vcov_type="conley",
519+
conley_cutoff_km=200.0,
520+
conley_lag_cutoff=0,
521+
)
522+
results = est.fit(data, outcome="y", unit="unit", time="time", treatment="D")
523+
print(f"tau_total = {results.att:.4f}")
524+
print(results.spillover_effects) # per-ring DataFrame
525+
```
526+
464527
### SyntheticDiD
465528

466529
Synthetic Difference-in-Differences (Arkhangelsky et al. 2021). Combines DiD with synthetic control by re-weighting control units.

0 commit comments

Comments
 (0)