Skip to content

Commit 9ccb59f

Browse files
igerberclaude
andcommitted
ImputationDiD methodology validation (PR-B): exact FE variance + unit-clustered Eq.8 + R parity
Source-validation pass of the Borusyak, Jaravel & Spiess (2024, REStud 91(6)) audit (PR-A #529 added the paper review). Three code corrections in diff_diff/imputation.py (behavior = SE values change; point estimates unchanged): 1. Untreated v_it weights (Theorem 3 conservative variance). The covariate-free path used the BALANCED two-way closed form -(w_i/n0_i + w_t/n0_t - w/N0), wrong for the always-unbalanced Omega_0 in staggered designs -> analytical SEs ~27% too small. Replaced with the exact projection -A0 (A0'A0)^-1 A1' w (the covariate path's method), and fixed that design to keep all unit dummies (the prior drop-first-unit/no-intercept design was one rank short -> a further ~1.6% bias). SEs now match R didimputation::did_imputation (observed ~1e-10; tests assert abs=1e-7). A singular Omega_0 routes to a dense-lstsq fallback (SciPy spsolve returns NaN + MatrixRankWarning without raising; promoted to an error so the fallback fires under production filters). Bootstrap SEs (which resample the same Theorem-3 influence function) may also shift. 2. Auxiliary model (Equation 8): observation-level mean sum(v*tau)/sum(v) -> the paper's unit-clustered sum_i(sum_t v)(sum_t v*tau)/sum_i(sum_t v)^2, NaN-safe. 3. Untreated Step-1 residuals preserve NaN for missing FE (symmetric with the treated path) instead of a silent fillna(0.0). Validation: - tests/test_methodology_imputation.py: paper-equation Verified Components (Theorem 1/2; Theorem 3/eqs 6-8 + white-box unit-clustered Eq.8 hand-calc + NaN-co-group edge + singular-Omega0 dense-fallback regression; Proposition 5 K>=H_bar non-ID; Test 1/eq 9 + Proposition 9) and TestImputationDiDParityR (overall + per-horizon ATT and SE vs didimputation, no silent skips). - benchmarks/R/generate_didimputation_golden.R + benchmarks/data/didimputation_* (didimputation v0.5.0 goldens). - tests/test_imputation.py: tightened the coarser-partition conservatism test. - Full fast suite: 7585 passed (the SE change breaks nothing downstream). Docs/tracker: REGISTRY ## ImputationDiD (Eq.8 now exact unit-clustered + a Deviation-from-R note; v_it observation-weights bullet updated to the exact projection); paper review flipped to "implemented"; METHODOLOGY_REVIEW.md row -> Complete (Verified Components / Corrections Made / Deviations / R Comparison Results); CHANGELOG entry; TODO PR-B rows removed + follow-ups tracked (LOO refinement, projection-factorization caching, covariate-path R parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent e84f61a commit 9ccb59f

11 files changed

Lines changed: 2414 additions & 140 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1717
(`"bootstrap"`), or leave-one-out estimates (`"jackknife"`) — so the old name
1818
was misleading; the `variance_method` field disambiguates the contents. Read
1919
`result.variance_effects` going forward.
20+
- **`ImputationDiD` methodology-review-tracker promotion: In Progress → Complete.** Completes the source-validation pass (PR-B) of the Borusyak, Jaravel & Spiess (2024, REStud 91(6)) audit — PR-A (#529) added the paper review on file; this PR validates the source against the code (uncovering and fixing the SE corrections above), adds paper-equation-numbered Verified Components (`tests/test_methodology_imputation.py`: Theorem 1/2 imputation; Theorem 3 / eqs. 6-8 variance + the unit-clustered Equation 8 hand-calc; Proposition 5 `K >= H_bar` non-identification; Test 1 / eq. 9 + Proposition 9 pre-trend independence), an R `didimputation` v0.5.0 parity fixture + generator (`benchmarks/R/generate_didimputation_golden.R`), and flips the tracker row to **Complete** with a Verified Components / Corrections Made / Deviations / R Comparison Results detail block. Documented deviations: the unit-clustered Equation 8 matches R only at the cohort×event-time partition (diff-diff additionally offers coarser partitions, no R analogue); the multiplier bootstrap and survey-design TSL variance are library extensions; leave-one-out variance (Supplementary Appendix A.9) is not implemented. REGISTRY `## ImputationDiD` updated (Equation 8 now the exact unit-clustered form; `**Note (deviation from R):**`); `TODO.md` PR-B rows removed.
2021

2122
### Deprecated
2223
- **`SyntheticDiDResults.placebo_effects`** is now a read-only alias for
@@ -27,6 +28,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2728
`dataclasses.asdict(result)` now emits the `variance_effects` key — use
2829
`variance_effects`.
2930

31+
### Fixed
32+
- **`ImputationDiD` conservative standard errors were biased downward without covariates (~27% on a staggered panel) — corrected to the exact two-way-FE imputation projection (behavior change).** The Theorem-3 conservative variance (Borusyak, Jaravel & Spiess 2024, eqs. 6-7) needs the implied observation weights `v_it = -A_0 (A_0' A_0)^{-1} A_1' w` of the untreated observations. The covariate-free path used a closed form `-(w_i/n0_i + w_t/n0_t - w/N_0)` that is exact only for a **balanced** untreated set, but `Omega_0` is generically **unbalanced** in staggered designs (treated observations are removed) — so every analytical SE / t-stat / p-value / CI **without covariates** was biased (the **point estimates were always correct** — they are computed by imputation, not via `v_it`). Replaced with the exact sparse two-way-FE projection (the path the covariate case already used), and corrected that projection's design matrix to keep **all** unit dummies (the previous drop-first-unit-with-no-intercept design projected onto a space one rank short of the true two-way-FE span — a further ~1.6% bias). The standard errors now match R `didimputation::did_imputation` to ~1e-10 (`tests/test_methodology_imputation.py::TestImputationDiDParityR`, goldens `benchmarks/data/didimputation_golden.json`). **Potentially breaking:** covariate-free `ImputationDiD` analytical SEs (and dependent t/p/CI) change — they were previously too small. The multiplier bootstrap (`n_bootstrap>0`) resamples the same Theorem-3 influence function (`v_it * epsilon_tilde_it`), so **bootstrap SEs may also shift**. A genuinely rank-deficient `Omega_0` (e.g. an unidentified period FE) now routes through a least-squares fallback with a `UserWarning` instead of a silent (balanced-assumption) value.
33+
- **`ImputationDiD` auxiliary variance model (Equation 8) now uses the paper's unit-clustered aggregator.** `_compute_auxiliary_residuals_treated` computed the observation-level mean `tau_tilde_g = sum(v*tau_hat)/sum(v)`; corrected to BJS Equation 8's unit-clustered `sum_i (sum_t v)(sum_t v*tau_hat) / sum_i (sum_t v)^2` (the excess-variance-minimizing form, Supplementary Appendix A.8). The two coincide under uniform within-group weights (the unweighted overall ATT / single-horizon event study under the default `aux_partition="cohort_horizon"`) and differ for survey-weighted / heterogeneity estimands or the coarser `aux_partition="cohort"`. Zero-weight / unimputable (NaN `tau_hat`) rows are excluded from the aggregation (exact for finite `tau_hat`, NaN-safe). At the default partition this equals R `didimputation`'s `sum(v^2*tau)/sum(v^2)`.
34+
- **`ImputationDiD` untreated Step-1 residuals preserve NaN for missing fixed effects** (symmetric with the treated path) instead of a silent `fillna(0.0)`. Provably inert on valid data — every untreated observation's unit and period appear in the Step-1 FE estimates — but it stops a rank-condition logic error from masquerading as a 0 residual (any NaN is zeroed downstream in the variance product, as before).
35+
3036
## [3.5.1] - 2026-06-02
3137

3238
### Added

METHODOLOGY_REVIEW.md

Lines changed: 36 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ The catalog grew incrementally over several quarters, so formats vary across the
4747
| CallawaySantAnna | `staggered.py` | `did::att_gt()` | **Complete** | 2026-01-24 |
4848
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | **Complete** | 2026-02-15 |
4949
| StackedDiD | `stacked_did.py` | `stacked-did-weights` (Wing-Freedman-Hollingsworth code) | **Complete** | 2026-02-19 |
50-
| ImputationDiD | `imputation.py` | `didimputation` | **In Progress** | |
50+
| ImputationDiD | `imputation.py` | `didimputation` | **Complete** | 2026-06-06 |
5151
| TwoStageDiD | `two_stage.py` | `did2s` | **In Progress** ||
5252
| WooldridgeDiD (ETWFE) | `wooldridge.py` | `etwfe` (R) / `jwdid` (Stata) | **Complete** | 2026-05-22 |
5353
| EfficientDiD | `efficient_did.py` | (no canonical R package) | **Complete** | 2026-06-01 |
@@ -539,22 +539,40 @@ and covariate-adjusted specifications.)
539539
| Field | Value |
540540
|-------|-------|
541541
| Module | `imputation.py`, `imputation_bootstrap.py` |
542-
| Primary Reference | Borusyak, Jaravel & Spiess (2024), *Revisiting Event-Study Designs: Robust and Efficient Estimation*, REStud 91(6) |
543-
| R Reference | `didimputation` |
544-
| Status | **In Progress** |
545-
| Last Review ||
546-
547-
**Documentation in place:**
548-
- REGISTRY.md section: `## ImputationDiD` (paper-direct equations, edge cases, three-step algorithm)
549-
- Implementation: 87 unit tests in `tests/test_imputation.py` (basic fit, event study, group aggregation, conservative variance, auxiliary partition, unidentified-estimand handling, balanced/unbalanced panels)
550-
- Bootstrap path: `imputation_bootstrap.py` with multiplier-weight resampling
551-
- Survey support: pweight + strata/PSU/FPC via TSL (Phase 6) with PSU-bootstrap path
552-
553-
**Outstanding for promotion:**
554-
- Dedicated `tests/test_methodology_imputation.py` with paper-equation-numbered Verified Components walk-through
555-
- R parity benchmark against `didimputation` (none on file)
556-
- Formal enumeration of deviations from `didimputation` (NaN inference, refused-to-estimate behavior for unidentified estimands per Proposition 5)
557-
- "Corrections Made" listing for any implementation fixes uncovered during the walk-through
542+
| Primary Reference | Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. *Review of Economic Studies*, 91(6), 3253–3285. |
543+
| R Reference | `didimputation` (Kyle Butts, v0.5.0) |
544+
| Status | **Complete** |
545+
| Last Review | 2026-06-06 |
546+
547+
**Verified Components** (`tests/test_methodology_imputation.py`, paper-equation-numbered):
548+
- **Theorem 1 / 2 (eqs. 4-5):** 3-step imputation (Step 1 fit on Ω₀ only; impute Ŷ(0); weighted aggregate) recovers constant and heterogeneous-by-horizon ATTs; perturbing a treated outcome shifts the overall ATT by exactly δ/N₁ (proving treated obs never feed Step 1) — `TestB2024Theorem2Imputation`
549+
- **Theorem 3 / eqs. 6-8 (conservative clustered variance + unit-clustered Equation 8):** finite/positive SEs; the unit-clustered `τ̃_g = Σ_i a·b / Σ_i a²` aggregator hand-verified; NaN-`τ̂` co-group observation is a variance no-op — `TestB2024Theorem3Variance`, `TestB2024Eq8AuxiliaryAggregator`
550+
- **Proposition 5 (p. 3266):** no never-treated units ⇒ horizons `K ≥ H̄ = max(E_i)−min(E_i)` are NaN + warning; never-treated present ⇒ identified — `TestB2024Proposition5NoNeverTreated`
551+
- **Test 1 / eq. 9 + Proposition 9 (p. 3273-4):** robust pre-trend test on Ω₀ only (does not reject under parallel trends, rejects under a violation); the treatment-effect estimate is independent of the pre-trend request — `TestB2024Proposition9Test1`
552+
- **R parity vs `didimputation::did_imputation`:** overall + event-study ATT and SEs match on a fixed-seed staggered panel (observed |Δ| ~1e-7 ATT / ~1e-10 SE; the tests assert ATT `abs=1e-6`, SE `abs=1e-7` for cross-platform robustness) — `TestImputationDiDParityR` (goldens: `benchmarks/data/didimputation_golden.json`, generator: `benchmarks/R/generate_didimputation_golden.R`)
553+
554+
**Corrections Made** (PR-B, surfaced by the walk-through + R parity):
555+
- **Theorem 3 untreated `v_it` weights — exact projection.** The FE-only path used the *balanced* two-way closed form `-(w_i/n0_i + w_t/n0_t − w/N₀)`, which is wrong for the (always-unbalanced) Ω₀ in staggered designs — biasing **every covariate-free analytical SE downward (~27% on the parity panel)**. Replaced with the exact two-way-FE projection `-A₀(A₀'A₀)⁻¹A₁'w` (the same path the covariate case uses), and fixed the FE design to keep all unit dummies (the prior drop-first-unit-without-intercept design projected onto a space one rank short, a further ~1.6% bias). SEs now match `didimputation` (observed ~1e-10; test contract `abs=1e-7`).
556+
- **Auxiliary model (Equation 8) — unit-clustered form.** `_compute_auxiliary_residuals_treated` used the observation-level mean `Σ v·τ̂ / Σ v`; corrected to the paper's unit-clustered `Σ_i a·b / Σ_i a²`. Coincides with the old form under uniform within-group weights; differs for survey/heterogeneity estimands and the coarser `cohort` partition. NaN-safe masking of zero-weight rows.
557+
- **Untreated-residual hardening.** `_compute_residuals_untreated` now preserves NaN for missing FE (symmetric with the treated path) instead of a silent `fillna(0.0)` that could mask a rank-condition logic error (provably inert on valid data).
558+
559+
**Deviations from the reference / library extensions** (see `REGISTRY.md` `## ImputationDiD`):
560+
- **Deviation from R:** `didimputation` computes the Equation 8 aggregator (`Σ v²τ̂/Σ v²`) at the cohort×event-time partition only (no partition control); at that partition it equals the unit-clustered Equation 8 = diff-diff's default `aux_partition="cohort_horizon"`. diff-diff additionally offers `aux_partition="cohort"`/`"horizon"` (validated by hand-calc, no R analogue).
561+
- Multiplier bootstrap on the Theorem-3 influence function (library extension, not in the paper).
562+
- Survey-design TSL variance on the influence function (library extension).
563+
- NaN inference for undefined statistics; Proposition-5 refuse-to-estimate (NaN + warning).
564+
- Leave-one-out variance refinement (Supplementary Appendix A.9) not implemented (finite-sample refinement; tracked as future work).
565+
566+
**R Comparison Results** (`didimputation` v0.5.0, fixed-seed panel, `benchmarks/data/didimputation_golden.json`):
567+
568+
| Quantity | Python | R | |Δ| |
569+
|----------|--------|---|-----|
570+
| Overall ATT | 2.04566790 | 2.04566803 | 1.3e-7 |
571+
| Overall SE | 0.02087938 | 0.02087938 | 9.3e-11 |
572+
| Event-study ATT (max) ||| 1.6e-7 |
573+
| Event-study SE (max) ||| 1.7e-10 |
574+
575+
(Point-estimate Δ is iterative-FE convergence level; SE Δ is machine precision. These are reference-platform observations; the parity tests assert ATT `abs=1e-6`, SE `abs=1e-7` for cross-platform robustness.)
558576

559577
---
560578

@@ -1429,7 +1447,7 @@ Promotion priority for the **In Progress** entries, ordered by what's blocked on
14291447
**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**
14301448

14311449
1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
1432-
2. **ImputationDiD / TwoStageDiD**natural pair (both single-treatment-effect-imputation methods). Each needs paper review, methodology file, R parity fixture against `didimputation` / `did2s`.
1450+
2. **TwoStageDiD**the remaining half of the imputation pair (ImputationDiD is now Complete, validated against `didimputation`). Needs a Gardner (2022) paper review, `tests/test_methodology_two_stage.py`, and an R parity fixture against `did2s`.
14331451

14341452
**Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
14351453

0 commit comments

Comments
 (0)