Commit aa29cec

igerber and claude committed
docs: Gardner (2022) paper review — TwoStageDiD (PR-A)
Methodology-review PR-A for TwoStageDiD (Gardner 2022, arXiv:2207.05943; R `did2s`), the imputation-pair twin of the just-completed ImputationDiD.

- New `docs/methodology/papers/gardner-2022-review.md`: eq./section-numbered scholarly review of the primary source.
- REGISTRY `## TwoStageDiD` + `METHODOLOGY_REVIEW.md` tracker: corrected the variance misattributions the source read surfaced: (i) the "Equation 6 per-cluster inverse (D_c'D_c)^{-1} deviation" was fabricated (eq. 6 is the event-study spec; the variance is the unnumbered GLOBAL Newey-McFadden Thm 6.1 Jacobian-inverse sandwich, which the code already matches, so it is not a deviation); (ii) "(Gardner 2022, Theorem 1)" (the paper has no numbered theorems); relabeled the cluster-summed meat (was "Bread").
- Corrected the `did2s` bootstrap-default claim in 3 places (paper review, REGISTRY, two_stage_results.py docstring): did2s defaults to analytical corrected clustered SEs (`bootstrap = FALSE`); block bootstrap is optional, not the default (verified vs the did2s source).
- doc-deps.yaml: mapped the review under two_stage.py. TODO.md: tracked the PR-B deliverables (tests + did2s parity -> tracker flip).

doc-deps integrity green; references + catalog verified. Local + CI agentic AI review clean after addressing the bootstrap-attribution and tracker-consistency findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 30c2cb9 commit aa29cec

6 files changed

Lines changed: 173 additions & 14 deletions

METHODOLOGY_REVIEW.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -588,14 +588,14 @@ and covariate-adjusted specifications.)
588588

589589
**Documentation in place:**
590590
- REGISTRY.md section: `## TwoStageDiD` (Stage 1 unit+time FE on untreated, Stage 2 OLS on residualized outcomes, GMM sandwich variance per Newey-McFadden Theorem 6.1)
591+
- Paper review: `docs/methodology/papers/gardner-2022-review.md` (PR-A — eq./section-numbered review of arXiv:2207.05943; corrected a fabricated Eq. 6 variance deviation, see "Documented alignment" below)
591592
- Implementation: 76 unit tests in `tests/test_two_stage.py` (matches ImputationDiD point estimates, R `did2s` global `(D'D)^{-1}` variance, always-treated unit exclusion, multiplier bootstrap)
592-
- Documented R alignment: uses global `(D'D)^{-1}` matching `did2s` (not paper Eq. 6)
593+
- Documented alignment: variance = global `(D'D)^{-1}` GMM sandwich (Newey-McFadden Theorem 6.1, Gardner §3.3) — **faithful to both the paper and `did2s`**. Gardner eq. (6) is the *event-study regression spec*, not a variance formula; the earlier "matches `did2s`, not paper Eq. 6" / "Newey-McFadden sandwich vs paper's Eq. 6 deviation" framing was a misattribution, corrected in PR-A across `REGISTRY.md` + the paper review.
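For orientation, the two-stage procedure named in the REGISTRY bullet above (Stage 1 unit+time FE on untreated observations, Stage 2 OLS on residualized outcomes) can be sketched in a few lines of NumPy. This is a minimal illustration and not the library's implementation: the function name `two_stage_att` and its arguments are hypothetical, there are no covariates, and Stage 2 is assumed to be a no-intercept regression on a single treatment dummy.

```python
import numpy as np


def two_stage_att(y, unit, time, treated):
    """Hypothetical sketch: overall ATT via a two-stage DiD procedure."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    # Integer codes for units and periods.
    _, u = np.unique(np.asarray(unit).ravel(), return_inverse=True)
    _, t = np.unique(np.asarray(time).ravel(), return_inverse=True)

    # Stage 1 design: all unit dummies plus time dummies (first period dropped
    # to avoid collinearity with the unit dummies).
    unit_dummies = np.eye(u.max() + 1)[u]
    time_dummies = np.eye(t.max() + 1)[t][:, 1:]
    X = np.hstack([unit_dummies, time_dummies])

    # Stage 1: fit the fixed effects on untreated observations only.
    beta, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

    # Residualize every observation with the Stage 1 fit.
    y_tilde = y - X @ beta

    # Stage 2: no-intercept OLS of y_tilde on the treatment dummy.
    d = treated.astype(float)
    return float(d @ y_tilde) / float(d @ d)
```

With a single treatment dummy, the Stage 2 coefficient is the mean of `y_tilde` over treated observations, which is why the point estimate coincides with an imputation-style ATT in this simple setting.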
593594

594595
**Outstanding for promotion:**
595596
- Dedicated `tests/test_methodology_two_stage.py` with paper-equation-numbered Verified Components walk-through
596597
- R parity benchmark fixture against `did2s` (none on file)
597-
- Documented deviation: Newey-McFadden Theorem 6.1 sandwich vs paper's Eq. 6 (already noted in REGISTRY but not formalized in this tracker)
598-
- "Corrections Made" listing
598+
- "Corrections Made" listing + flip Status → Complete (PR-B)
599599

600600
---
601601

@@ -1444,10 +1444,10 @@ more graceful handling of edge cases while still signaling invalid inference to
14441444

14451445
Promotion priority for the **In Progress** entries, ordered by what's blocked on substantive review work (top of list = needs review next) vs. consolidation pass (bottom of list = mostly tracker walk-through):
14461446

1447-
**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**
1447+
**Substantive-review-blocked (each still missing one or more of: a methodology test file, R parity, or a paper review):**
14481448

14491449
1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
1450-
2. **TwoStageDiD** — the remaining half of the imputation pair (ImputationDiD is now Complete, validated against `didimputation`). Needs a Gardner (2022) paper review, `tests/test_methodology_two_stage.py`, and an R parity fixture against `did2s`.
1450+
2. **TwoStageDiD** — the remaining half of the imputation pair (ImputationDiD is now Complete, validated against `didimputation`). Gardner (2022) paper review **landed** (`docs/methodology/papers/gardner-2022-review.md`, PR-A); still needs `tests/test_methodology_two_stage.py` and an R parity fixture against `did2s` to flip to Complete (PR-B).
14511451

14521452
**Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
14531453

TODO.md

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ Deferred items from PR reviews that were not addressed before merge.
173173
|-------|----------|----|----------|
174174
| Drift test for tutorial 24 qualitative power claims (monotonic dilution fast→slow; CS-vs-2×2 MDE crossover/near-parity at slow rollout) — pins the prose against estimator-default/simulation drift | `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` | staggered-analysis-2x2 | Low |
175175
| ImputationDiD covariate-path variance lacks dedicated R `didimputation` parity / hand-calc. The PR-B FE-design correction (keep all unit dummies) affects the covariate projection too, but only the no-covariate staggered panel is R-parity'd (the covariate path shares the same validated projection code and passes the full suite). Add a covariate (time-varying X) R golden asserting overall/event-study SE parity, or a small dense-design hand-calc for the covariate projection. | `tests/test_methodology_imputation.py`, `benchmarks/R/generate_didimputation_golden.R` | imputation-validation follow-up | Low |
176+
| TwoStageDiD methodology validation PR-B: add `tests/test_methodology_two_stage.py` (eq./section-numbered Verified Components — Stage-1 FE recovery on untreated obs; Stage-2 overall ATT eq. 4 + event-study eq. 6; GMM first-stage-correction behavior; always-treated drop) + `did2s` R parity fixture (`benchmarks/R/generate_did2s_golden.R` + `benchmarks/data/did2s_golden.json` + `did2s_test_panel.csv`); then flip `METHODOLOGY_REVIEW.md` TwoStageDiD row In Progress → Complete. PR-A (paper review `gardner-2022-review.md`) merged separately. | `tests/test_methodology_two_stage.py`, `benchmarks/`, `METHODOLOGY_REVIEW.md` | two-stage-validation PR-B | Medium |
176177
| Port the CI `<notebook-prose>` extraction into the reviewer-eval harness so `docs/tutorials/*.ipynb` cases (currently guarded out of `verify-corpus`/`run`) can be reviewed with CI-equivalent context | `tools/reviewer-eval/adapters/ci_prompt.py` | local-review | Low |
177178
| **Premise corrected — no CI impact (verified 2026-06-07).** The "slow CI" motivation does not hold: no CI workflow installs R (no `setup-r` / `r-lib/actions` / `fixest` / `r-base` install anywhere in `.github/workflows/`), so every R-parity test skips in CI behind a per-file availability gate (`fixest_available` in twfe, `_check_r_contdid()` in continuous_did, `require_r` / `r_available` in `conftest.py`, etc.) — consolidating `Rscript` spawns yields zero CI speedup. The originally-cited file already session-caches its R fits: `test_methodology_twfe.py` exposes `r_twfe_results` / `r_twfe_results_with_covariate` as `scope="session"` fixtures, so each R model runs once per session, not once per test. The only residual is a LOCAL-dev micro-optimization for developers who have R installed: `test_methodology_continuous_did.py` (the `_run_r_contdid` helper plus three standalone inline `Rscript` calls) and `test_methodology_callaway.py` (`_run_r_estimation` called inline in three test methods, plus `_get_r_mpdta_and_results` re-run by the MPDTA R-parity tests) re-spawn `library(...)` per call with no session-level result cache. Applying the twfe session-fixture pattern there would speed local R-parity runs only. Low value; retained as a local-dev note. | `tests/test_methodology_continuous_did.py`, `tests/test_methodology_callaway.py` | #139 | Low |
178179
| CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |

diff_diff/two_stage_results.py

Lines changed: 4 additions & 3 deletions
@@ -25,9 +25,10 @@ class TwoStageBootstrapResults:
2525
Results from TwoStageDiD bootstrap inference.
2626
2727
Bootstrap uses multiplier bootstrap on the GMM influence function,
28-
consistent with other library estimators. The R `did2s` package uses
29-
block bootstrap by default; multiplier bootstrap is asymptotically
30-
equivalent.
28+
consistent with other library estimators. The R `did2s` package defaults
29+
to analytical corrected clustered SEs (``bootstrap = FALSE``); its optional
30+
block bootstrap (``bootstrap = TRUE``) and this multiplier bootstrap are
31+
asymptotically equivalent.
3132
3233
Attributes
3334
----------

docs/doc-deps.yaml

Lines changed: 2 additions & 0 deletions
@@ -252,6 +252,8 @@ sources:
252252
- path: docs/methodology/REGISTRY.md
253253
section: "TwoStageDiD"
254254
type: methodology
255+
- path: docs/methodology/papers/gardner-2022-review.md
256+
type: methodology
255257
- path: docs/api/two_stage.rst
256258
type: api_reference
257259
- path: docs/tutorials/12_two_stage_did.ipynb

docs/methodology/REGISTRY.md

Lines changed: 6 additions & 6 deletions
@@ -1388,28 +1388,28 @@ Point estimates are identical to ImputationDiD (Borusyak et al. 2024). The two-s
13881388
The variance accounts for first-stage estimation error propagating into Stage 2, following the GMM framework:
13891389

13901390
```
1391-
V(tau_hat) = (D'D)^{-1} * Bread * (D'D)^{-1}
1391+
V(tau_hat) = (D'D)^{-1} * Meat * (D'D)^{-1} [(D'D)^{-1} = GLOBAL GMM bread (Jacobian inverse)]
13921392

1393-
Bread = sum_c ( sum_{i in c} psi_i )( sum_{i in c} psi_i )'
1393+
Meat = sum_c ( sum_{i in c} psi_i )( sum_{i in c} psi_i )' [score outer-product, clustered at unit]
13941394
```
13951395

13961396
where `psi_i` is the stacked influence function for unit i across all its observations, combining the Stage 2 score and the Stage 1 correction term.
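The sandwich formula above can be sketched numerically as follows. This is an illustration under stated assumptions, not the library's code: `cluster_sandwich` is a hypothetical helper, `psi` is taken as given (one row per observation, already combining the Stage 2 score with the Stage 1 correction term), and `cluster_ids` holds the unit identifier for each row.

```python
import numpy as np


def cluster_sandwich(D, psi, cluster_ids):
    """Hypothetical sketch of V = (D'D)^{-1} * Meat * (D'D)^{-1}."""
    D = np.asarray(D, dtype=float)
    psi = np.asarray(psi, dtype=float)

    # Global bread: one inverse computed from the full Stage 2 design.
    bread = np.linalg.inv(D.T @ D)

    # Sum the psi rows within each cluster (unit).
    _, codes = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)
    cluster_sums = np.zeros((codes.max() + 1, psi.shape[1]))
    np.add.at(cluster_sums, codes, psi)

    # Meat: sum over clusters of the outer product of the cluster sums.
    meat = cluster_sums.T @ cluster_sums

    # Raw asymptotic sandwich, with no finite-sample adjustment.
    return bread @ meat @ bread
```

Because `(D'D)^{-1}` is computed once from the full design, no per-cluster inverse appears anywhere in the computation.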
13971397

1398-
**Note on Equation 6 discrepancy:** The paper's Equation 6 uses a per-cluster inverse `(D_c'D_c)^{-1}` when forming the influence function contribution. The R `did2s` implementation and our code use the GLOBAL inverse `(D'D)^{-1}` following standard GMM theory (Newey & McFadden 1994). We follow the R implementation, which is consistent with standard GMM sandwich variance estimation.
1398+
**Variance is faithful to the paper (global Jacobian inverse).** Gardner (2022) §3.3 derives the variance by reading the two stages as a joint GMM estimator (Hansen 1982) and applying Newey & McFadden (1994) Theorem 6.1: `v` is the last element of `E[∂f/∂(λ,γ,β)]^{-1} E[ff'] E[∂f/∂(λ,γ,β)]^{-1'}` — the **global** Jacobian inverse (the `(D'D)^{-1}` bread above), with the score outer-product `E[ff']` clustered at the unit per the reference Stata GMM `vce(cluster id)` (Appendix B). Our global `(D'D)^{-1}` bread + unit-clustered meat **matches** this and the R `did2s` implementation; there is **no** per-cluster inverse. (Equation (6) in the paper is the *event-study regression specification*, not a variance formula — an earlier "Equation 6 per-cluster inverse `(D_c'D_c)^{-1}`" note was a misattribution, corrected per `docs/methodology/papers/gardner-2022-review.md`.)
13991399

14001400
**No finite-sample adjustments:** The variance estimator uses the raw asymptotic sandwich without degrees-of-freedom corrections (no HC1-style `n/(n-k)` adjustment). This matches the R `did2s` implementation.
14011401

14021402
*Bootstrap:*
14031403

1404-
Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). The R `did2s` package defaults to block bootstrap (resampling clusters with replacement). Both approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.
1404+
Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). The R `did2s` package **defaults to analytical corrected clustered SEs** (`bootstrap = FALSE`, the same GMM sandwich); its block bootstrap is *optional* (`bootstrap = TRUE`, resampling clusters with replacement). All approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.
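A minimal sketch of the multiplier bootstrap described above, assuming the cluster-level `psi` sums and the global `(D'D)^{-1}` bread have already been computed. The helper name `multiplier_bootstrap_se` and its parameters are hypothetical, and only Rademacher weights are shown; the library's actual interface (for example the `bootstrap_weights` option) is not reproduced here.

```python
import numpy as np


def multiplier_bootstrap_se(bread, cluster_sums, n_boot=999, seed=0):
    """Hypothetical sketch: bootstrap SEs from perturbed cluster-level psi sums."""
    bread = np.asarray(bread, dtype=float)
    cluster_sums = np.asarray(cluster_sums, dtype=float)
    rng = np.random.default_rng(seed)

    # One Rademacher weight (+1 or -1) per cluster and bootstrap draw.
    weights = rng.choice([-1.0, 1.0], size=(n_boot, cluster_sums.shape[0]))

    # Each draw perturbs the estimate by bread @ sum_c w_c * s_c.
    draws = (weights @ cluster_sums) @ bread.T

    # Bootstrap standard error: spread of the perturbations across draws.
    return draws.std(axis=0, ddof=1)
```

No model is refit inside the loop; each draw only reweights pre-computed cluster sums, which is the source of the computational saving relative to resampling clusters with replacement.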
14051405

14061406
*Edge cases:*
14071407
- **Always-treated units:** Units treated in all observed periods have no untreated observations for Stage 1 FE estimation. These are excluded with a warning listing the affected unit IDs. Their treated observations do NOT contribute to Stage 2.
14081408
- **Rank condition violations:** If the Stage 1 design matrix (unit+time dummies on untreated obs) is rank-deficient, or if certain unit/time FE are unidentified (e.g., a unit with no untreated periods after excluding always-treated), the affected FE produce NaN. Behavior controlled by `rank_deficient_action`: "warn" (default), "error", or "silent".
14091409
- **NaN y_tilde handling:** When Stage 1 FE are unidentified for some observations, the residualized outcome `y_tilde` is NaN. These observations are zeroed out (excluded) from the Stage 2 regression and variance computation, matching the treatment of unimputable observations in ImputationDiD.
14101410
- **NaN inference for undefined statistics:** t_stat uses NaN when SE is non-finite or zero; p_value and CI also NaN. Matches CallawaySantAnna/ImputationDiD NaN convention.
14111411
- **Event study aggregation:** Horizon-specific effects use the same two-stage procedure with horizon indicator dummies in Stage 2. Unidentified horizons (e.g., long-run effects without never-treated units, per Proposition 5 of Borusyak et al. 2024) produce NaN.
1412-
- **Pre-period event study coefficients (`pretrends=True`):** When enabled, the Stage 2 design matrix `X_2` includes pre-period relative-time dummies. Pre-period observations have `y_tilde = Step 1 residual` by construction. The GMM sandwich variance accounts for Stage 1 estimation error (Gardner 2022, Theorem 1). Only affects event study aggregation; overall ATT unchanged.
1412+
- **Pre-period event study coefficients (`pretrends=True`):** When enabled, the Stage 2 design matrix `X_2` includes pre-period relative-time dummies. Pre-period observations have `y_tilde = Step 1 residual` by construction. The GMM sandwich variance accounts for Stage 1 estimation error (Gardner 2022 §3.3; Newey-McFadden 1994, Theorem 6.1 — the paper has no numbered theorems). Only affects event study aggregation; overall ATT unchanged.
14131413
- **balance_e with no qualifying cohorts:** If no cohorts have sufficient pre/post coverage for the requested `balance_e`, a warning is emitted and event study results contain only the reference period.
14141414
- **No never-treated units (Proposition 5):** When there are no never-treated units and multiple treatment cohorts, horizons h >= h_bar (where h_bar = max(groups) - min(groups)) are unidentified per Proposition 5 of Borusyak et al. (2024). These produce NaN inference with n_obs > 0 (treated observations exist but counterfactual is unidentified) and a warning listing affected horizons. Matches ImputationDiD behavior. Proposition 5 applies to event study horizons only, not cohort aggregation — a cohort whose treated obs all fall at Prop 5 horizons naturally gets n_obs=0 in group effects because all its y_tilde values are NaN.
14151415
- **Zero-observation horizons after filtering:** When `balance_e` or NaN `y_tilde` filtering results in zero observations for some non-Prop-5 event study horizons, those horizons produce NaN for all inference fields (effect, SE, t-stat, p-value, CI) with n_obs=0.
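The NaN inference convention in the edge cases above can be sketched as a small guard around the usual Wald statistics. This is a hedged illustration only: `safe_inference` is a hypothetical name, and the standard normal reference distribution is an assumption of the sketch rather than something the source states.

```python
import math
from statistics import NormalDist

NAN = float("nan")


def safe_inference(effect, se, alpha=0.05):
    """Hypothetical sketch: NaN t-stat, p-value and CI when SE is unusable."""
    # SE that is non-finite or zero makes every inference field undefined.
    if not math.isfinite(se) or se == 0:
        return {"t_stat": NAN, "p_value": NAN, "conf_int": (NAN, NAN)}

    # Assumption: two-sided test against a standard normal reference.
    normal = NormalDist()
    t_stat = effect / se
    p_value = 2.0 * (1.0 - normal.cdf(abs(t_stat)))
    z = normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "conf_int": (effect - z * se, effect + z * se),
    }
```

Returning NaN for all fields at once keeps downstream tables from mixing a finite point estimate with a misleading zero-width interval.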
@@ -1429,7 +1429,7 @@ Our implementation uses multiplier bootstrap on the GMM influence function: clus
14291429
- [x] Stage 2: Regress residualized outcomes on treatment indicators
14301430
- [x] Point estimates match ImputationDiD
14311431
- [x] GMM sandwich variance (Newey & McFadden 1994 Theorem 6.1)
1432-
- [x] Global `(D'D)^{-1}` in variance (matches R `did2s`, not paper Eq. 6)
1432+
- [x] Global `(D'D)^{-1}` in variance (faithful to Gardner §3.3 / Newey-McFadden GMM sandwich; matches R `did2s`)
14331433
- [x] No finite-sample adjustment (raw asymptotic sandwich)
14341434
- [x] Always-treated units excluded with warning
14351435
- [x] Multiplier bootstrap on GMM influence function
