docs: Gardner (2022) paper review — TwoStageDiD (PR-A)
Methodology-review PR-A for TwoStageDiD (Gardner 2022, arXiv:2207.05943;
R `did2s`), the imputation-pair twin of the just-completed ImputationDiD.
- New `docs/methodology/papers/gardner-2022-review.md`: eq./section-numbered
scholarly review of the primary source.
- REGISTRY `## TwoStageDiD` + `METHODOLOGY_REVIEW.md` tracker: corrected the
variance misattributions surfaced by the source read: (i) the "Equation 6
per-cluster inverse (D_c'D_c)^{-1} deviation" was fabricated (eq. 6 is the
event-study spec; the variance is the unnumbered GLOBAL Newey-McFadden
Thm 6.1 Jacobian-inverse sandwich, which the code already matches, so there
is no deviation); (ii) the citation "(Gardner 2022, Theorem 1)" was invalid
(the paper has no numbered theorems). Also relabeled the cluster-summed meat
(previously labeled "Bread").
- Corrected the `did2s` bootstrap-default claim in 3 places (paper review,
REGISTRY, two_stage_results.py docstring): did2s defaults to analytical
corrected clustered SEs (`bootstrap = FALSE`); block bootstrap is optional,
not the default (verified vs the did2s source).
- doc-deps.yaml: mapped the review under two_stage.py. TODO.md: tracked the
PR-B deliverables (tests + did2s parity -> tracker flip).
doc-deps integrity green; references + catalog verified. Local + CI agentic
AI review clean after addressing the bootstrap-attribution and tracker-
consistency findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
METHODOLOGY_REVIEW.md (+5 -5: 5 additions, 5 deletions)
@@ -588,14 +588,14 @@ and covariate-adjusted specifications.)

 **Documentation in place:**
 - REGISTRY.md section: `## TwoStageDiD` (Stage 1 unit+time FE on untreated, Stage 2 OLS on residualized outcomes, GMM sandwich variance per Newey-McFadden Theorem 6.1)
+- Paper review: `docs/methodology/papers/gardner-2022-review.md` (PR-A — eq./section-numbered review of arXiv:2207.05943; corrected a fabricated Eq. 6 variance deviation, see "Documented alignment" below)
 - Implementation: 76 unit tests in `tests/test_two_stage.py` (matches ImputationDiD point estimates, R `did2s` global `(D'D)^{-1}` variance, always-treated unit exclusion, multiplier bootstrap)
-- Documented R alignment: uses global `(D'D)^{-1}` matching `did2s` (not paper Eq. 6)
+- Documented alignment: variance = global `(D'D)^{-1}` GMM sandwich (Newey-McFadden Theorem 6.1, Gardner §3.3) — **faithful to both the paper and `did2s`**. Gardner eq. (6) is the *event-study regression spec*, not a variance formula; the earlier "matches `did2s`, not paper Eq. 6" / "Newey-McFadden sandwich vs paper's Eq. 6 deviation" framing was a misattribution, corrected in PR-A across `REGISTRY.md` + the paper review.

 **Outstanding for promotion:**
 - Dedicated `tests/test_methodology_two_stage.py` with paper-equation-numbered Verified Components walk-through
 - R parity benchmark fixture against `did2s` (none on file)
-- Documented deviation: Newey-McFadden Theorem 6.1 sandwich vs paper's Eq. 6 (already noted in REGISTRY but not formalized in this tracker)
-- "Corrections Made" listing
+- "Corrections Made" listing + flip Status → Complete (PR-B)

 ---

@@ -1444,10 +1444,10 @@ more graceful handling of edge cases while still signaling invalid inference to

 Promotion priority for the **In Progress** entries, ordered by what's blocked on substantive review work (top of list = needs review next) vs. consolidation pass (bottom of list = mostly tracker walk-through):

-**Substantive-review-blocked (still missing a methodology test file / R parity and a paper review):**
+**Substantive-review-blocked (each still missing one or more of: a methodology test file, R parity, or a paper review):**

 1. **PlaceboTests** — decide first whether to keep standalone or absorb into per-estimator diagnostic sections; methodologically lightweight either way.
-2. **TwoStageDiD** — the remaining half of the imputation pair (ImputationDiD is now Complete, validated against `didimputation`). Needs a Gardner (2022) paper review, `tests/test_methodology_two_stage.py`, and an R parity fixture against `did2s`.
+2. **TwoStageDiD** — the remaining half of the imputation pair (ImputationDiD is now Complete, validated against `didimputation`). Gardner (2022) paper review **landed** (`docs/methodology/papers/gardner-2022-review.md`, PR-A); still needs `tests/test_methodology_two_stage.py` and an R parity fixture against `did2s` to flip to Complete (PR-B).

 **Consolidation-pass-blocked (already has paper review or methodology file or R parity; mostly Verified Components walk-through):**
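A minimal NumPy sketch of what the "global `(D'D)^{-1}` GMM sandwich" means in practice, for the overall ATT only. It follows the two-step GMM correction as we understand `did2s`'s analytical SEs to be structured: each unit's Stage 2 score minus a Stage 1 correction that uses a single Stage 1 inverse computed from all untreated observations, never a per-cluster `(D_c'D_c)^{-1}`. The function `two_stage_att`, its flat-array inputs, and the full-rank assumption (every unit and every period has at least one untreated observation) are hypothetical illustration choices, not this library's API.

```python
import numpy as np


def two_stage_att(y, unit, time, treat):
    """Hypothetical sketch: two-stage DiD overall ATT with a first-stage-corrected,
    unit-clustered GMM sandwich variance (no finite-sample adjustment).

    Assumes every unit and every period has at least one untreated observation,
    so the Stage 1 design restricted to untreated rows has full column rank.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(treat, dtype=float)
    n = y.size
    _, u_idx = np.unique(unit, return_inverse=True)
    _, t_idx = np.unique(time, return_inverse=True)
    n_u, n_t = int(u_idx.max()) + 1, int(t_idx.max()) + 1

    # Stage 1 design X1 for ALL rows: unit dummies + time dummies (first period dropped).
    X1 = np.zeros((n, n_u + n_t - 1))
    rows = np.arange(n)
    X1[rows, u_idx] = 1.0
    later = t_idx > 0
    X1[rows[later], n_u + t_idx[later] - 1] = 1.0

    # X10: treated rows zeroed, so Stage 1 is fit on untreated observations only.
    untreated = d == 0
    X10 = X1 * untreated[:, None]
    X10tX10_inv = np.linalg.inv(X10.T @ X10)  # ONE global inverse, not per cluster
    gamma = X10tX10_inv @ (X10.T @ y)

    # Stage 2: regress the residualized outcome on the treatment indicator.
    y_tilde = y - X1 @ gamma
    e1 = y_tilde * untreated                  # Stage 1 residuals (zero on treated rows)
    X2 = d[:, None]
    X2tX2_inv = np.linalg.inv(X2.T @ X2)
    beta = X2tX2_inv @ (X2.T @ y_tilde)
    e2 = y_tilde - X2 @ beta                  # Stage 2 residuals

    # Per-unit influence sums: Stage 2 score minus the Stage 1 correction term.
    correction = X2.T @ X1 @ X10tX10_inv
    meat = np.zeros((X2.shape[1], X2.shape[1]))
    for c in range(n_u):
        m = u_idx == c
        w_c = X2[m].T @ e2[m] - correction @ (X10[m].T @ e1[m])
        meat += np.outer(w_c, w_c)            # cluster-summed score outer product
    vcov = X2tX2_inv @ meat @ X2tX2_inv
    return float(beta[0]), float(np.sqrt(vcov[0, 0]))
```

With the treatment indicator as the only Stage 2 regressor, `att` equals the mean of `y_tilde` over treated rows, which is why two-stage and imputation point estimates coincide (the "matches ImputationDiD point estimates" item in the tracker). As in the REGISTRY text, no `n/(n-k)` factor is applied.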
TODO.md (+1: 1 addition, 0 deletions)
@@ -173,6 +173,7 @@ Deferred items from PR reviews that were not addressed before merge.
 |-------|----------|----|----------|
 | Drift test for tutorial 24 qualitative power claims (monotonic dilution fast→slow; CS-vs-2×2 MDE crossover/near-parity at slow rollout) — pins the prose against estimator-default/simulation drift | `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` | staggered-analysis-2x2 | Low |
 | ImputationDiD covariate-path variance lacks dedicated R `didimputation` parity / hand-calc. The PR-B FE-design correction (keep all unit dummies) affects the covariate projection too, but only the no-covariate staggered panel is R-parity'd (the covariate path shares the same validated projection code and passes the full suite). Add a covariate (time-varying X) R golden asserting overall/event-study SE parity, or a small dense-design hand-calc for the covariate projection. | `tests/test_methodology_imputation.py`, `benchmarks/R/generate_didimputation_golden.R` | imputation-validation follow-up | Low |
+| TwoStageDiD methodology validation PR-B: add `tests/test_methodology_two_stage.py` (eq./section-numbered Verified Components — Stage-1 FE recovery on untreated obs; Stage-2 overall ATT eq. 4 + event-study eq. 6; GMM first-stage-correction behavior; always-treated drop) + `did2s` R parity fixture (`benchmarks/R/generate_did2s_golden.R` + `benchmarks/data/did2s_golden.json` + `did2s_test_panel.csv`); then flip `METHODOLOGY_REVIEW.md` TwoStageDiD row In Progress → Complete. PR-A (paper review `gardner-2022-review.md`) merged separately. | `tests/test_methodology_two_stage.py`, `benchmarks/`, `METHODOLOGY_REVIEW.md` | two-stage-validation PR-B | Medium |
 | Port the CI `<notebook-prose>` extraction into the reviewer-eval harness so `docs/tutorials/*.ipynb` cases (currently guarded out of `verify-corpus`/`run`) can be reviewed with CI-equivalent context | `tools/reviewer-eval/adapters/ci_prompt.py` | local-review | Low |
 | **Premise corrected — no CI impact (verified 2026-06-07).** The "slow CI" motivation does not hold: no CI workflow installs R (no `setup-r` / `r-lib/actions` / `fixest` / `r-base` install anywhere in `.github/workflows/`), so every R-parity test skips in CI behind a per-file availability gate (`fixest_available` in twfe, `_check_r_contdid()` in continuous_did, `require_r` / `r_available` in `conftest.py`, etc.) — consolidating `Rscript` spawns yields zero CI speedup. The originally-cited file already session-caches its R fits: `test_methodology_twfe.py` exposes `r_twfe_results` / `r_twfe_results_with_covariate` as `scope="session"` fixtures, so each R model runs once per session, not once per test. The only residual is a LOCAL-dev micro-optimization for developers who have R installed: `test_methodology_continuous_did.py` (the `_run_r_contdid` helper plus three standalone inline `Rscript` calls) and `test_methodology_callaway.py` (`_run_r_estimation` called inline in three test methods, plus `_get_r_mpdta_and_results` re-run by the MPDTA R-parity tests) re-spawn `library(...)` per call with no session-level result cache. Applying the twfe session-fixture pattern there would speed local R-parity runs only. Low value; retained as a local-dev note. | `tests/test_methodology_continuous_did.py`, `tests/test_methodology_callaway.py` | #139 | Low |
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
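The "always-treated drop" component in the PR-B row can be illustrated with a small helper: a unit with no untreated observation contributes nothing to Stage 1, so its unit fixed effect is unidentified and none of its rows can enter Stage 2; the REGISTRY edge-case text says such units are excluded with a warning listing their IDs. `drop_always_treated` and its warning text are hypothetical, not the library's own function or message.

```python
import warnings

import numpy as np


def drop_always_treated(unit, treat):
    """Hypothetical helper: mask out units that have no untreated observation.

    Such units cannot identify their Stage 1 unit fixed effect, so none of
    their rows (treated ones included) can enter Stage 2.
    Returns (keep_mask, dropped_unit_ids).
    """
    unit = np.asarray(unit)
    treat = np.asarray(treat, dtype=float)
    ids, inv = np.unique(unit, return_inverse=True)
    # Count untreated rows per unit; zero means always treated.
    n_untreated = np.bincount(inv, weights=(treat == 0).astype(float), minlength=ids.size)
    always = n_untreated == 0
    dropped = ids[always].tolist()
    if dropped:
        warnings.warn(
            f"Excluding always-treated units (no untreated periods): {dropped}",
            UserWarning,
            stacklevel=2,
        )
    keep = ~always[inv]
    return keep, dropped
```

Working from the per-unit count rather than per-row flags keeps the rule exactly "no untreated period at all", so a unit treated in all but one period is still kept for Stage 1.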
-Bread = sum_c ( sum_{i in c} psi_i )( sum_{i in c} psi_i )'
+Meat = sum_c ( sum_{i in c} psi_i )( sum_{i in c} psi_i )' [score outer-product, clustered at unit]
 ```

 where `psi_i` is the stacked influence function for unit i across all its observations, combining the Stage 2 score and the Stage 1 correction term.

-**Note on Equation 6 discrepancy:** The paper's Equation 6 uses a per-cluster inverse `(D_c'D_c)^{-1}` when forming the influence function contribution. The R `did2s` implementation and our code use the GLOBAL inverse `(D'D)^{-1}` following standard GMM theory (Newey & McFadden 1994). We follow the R implementation, which is consistent with standard GMM sandwich variance estimation.
+**Variance is faithful to the paper (global Jacobian inverse).** Gardner (2022) §3.3 derives the variance by reading the two stages as a joint GMM estimator (Hansen 1982) and applying Newey & McFadden (1994) Theorem 6.1: `v` is the last element of `E[∂f/∂(λ,γ,β)]^{-1} E[ff'] E[∂f/∂(λ,γ,β)]^{-1'}` — the **global** Jacobian inverse (the `(D'D)^{-1}` bread above), with the score outer-product `E[ff']` clustered at the unit per the reference Stata GMM `vce(cluster id)` (Appendix B). Our global `(D'D)^{-1}` bread + unit-clustered meat **matches** this and the R `did2s` implementation; there is **no** per-cluster inverse. (Equation (6) in the paper is the *event-study regression specification*, not a variance formula — an earlier "Equation 6 per-cluster inverse `(D_c'D_c)^{-1}`" note was a misattribution, corrected per `docs/methodology/papers/gardner-2022-review.md`.)

 **No finite-sample adjustments:** The variance estimator uses the raw asymptotic sandwich without degrees-of-freedom corrections (no HC1-style `n/(n-k)` adjustment). This matches the R `did2s` implementation.

 *Bootstrap:*

-Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). The R `did2s` package defaults to block bootstrap (resampling clusters with replacement). Both approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.
+Our implementation uses multiplier bootstrap on the GMM influence function: cluster-level `psi` sums are pre-computed, then perturbed with multiplier weights (Rademacher by default; configurable via `bootstrap_weights` parameter to use Mammen or Webb weights, matching CallawaySantAnna). The R `did2s` package **defaults to analytical corrected clustered SEs** (`bootstrap = FALSE`, the same GMM sandwich); its block bootstrap is *optional* (`bootstrap = TRUE`, resampling clusters with replacement). All approaches are asymptotically valid; the multiplier bootstrap is computationally cheaper and consistent with the CallawaySantAnna/ImputationDiD bootstrap patterns in this library.

 *Edge cases:*
 - **Always-treated units:** Units treated in all observed periods have no untreated observations for Stage 1 FE estimation. These are excluded with a warning listing the affected unit IDs. Their treated observations do NOT contribute to Stage 2.
 - **Rank condition violations:** If the Stage 1 design matrix (unit+time dummies on untreated obs) is rank-deficient, or if certain unit/time FE are unidentified (e.g., a unit with no untreated periods after excluding always-treated), the affected FE produce NaN. Behavior controlled by `rank_deficient_action`: "warn" (default), "error", or "silent".
 - **NaN y_tilde handling:** When Stage 1 FE are unidentified for some observations, the residualized outcome `y_tilde` is NaN. These observations are zeroed out (excluded) from the Stage 2 regression and variance computation, matching the treatment of unimputable observations in ImputationDiD.
 - **NaN inference for undefined statistics:** t_stat uses NaN when SE is non-finite or zero; p_value and CI also NaN. Matches CallawaySantAnna/ImputationDiD NaN convention.
 - **Event study aggregation:** Horizon-specific effects use the same two-stage procedure with horizon indicator dummies in Stage 2. Unidentified horizons (e.g., long-run effects without never-treated units, per Proposition 5 of Borusyak et al. 2024) produce NaN.
-- **Pre-period event study coefficients (`pretrends=True`):** When enabled, the Stage 2 design matrix `X_2` includes pre-period relative-time dummies. Pre-period observations have `y_tilde = Step 1 residual` by construction. The GMM sandwich variance accounts for Stage 1 estimation error (Gardner 2022, Theorem 1). Only affects event study aggregation; overall ATT unchanged.
+- **Pre-period event study coefficients (`pretrends=True`):** When enabled, the Stage 2 design matrix `X_2` includes pre-period relative-time dummies. Pre-period observations have `y_tilde = Step 1 residual` by construction. The GMM sandwich variance accounts for Stage 1 estimation error (Gardner 2022 §3.3; Newey-McFadden 1994, Theorem 6.1 — the paper has no numbered theorems). Only affects event study aggregation; overall ATT unchanged.
 - **balance_e with no qualifying cohorts:** If no cohorts have sufficient pre/post coverage for the requested `balance_e`, a warning is emitted and event study results contain only the reference period.
 - **No never-treated units (Proposition 5):** When there are no never-treated units and multiple treatment cohorts, horizons h >= h_bar (where h_bar = max(groups) - min(groups)) are unidentified per Proposition 5 of Borusyak et al. (2024). These produce NaN inference with n_obs > 0 (treated observations exist but counterfactual is unidentified) and a warning listing affected horizons. Matches ImputationDiD behavior. Proposition 5 applies to event study horizons only, not cohort aggregation — a cohort whose treated obs all fall at Prop 5 horizons naturally gets n_obs=0 in group effects because all its y_tilde values are NaN.
 - **Zero-observation horizons after filtering:** When `balance_e` or NaN `y_tilde` filtering results in zero observations for some non-Prop-5 event study horizons, those horizons produce NaN for all inference fields (effect, SE, t-stat, p-value, CI) with n_obs=0.
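The Proposition 5 rule and the NaN-inference convention in the edge-case list above can be sketched together. `unidentified_horizons` computes `h_bar = max(groups) - min(groups)` and flags event-study horizons `h >= h_bar` when there are no never-treated units and at least two cohorts; `nan_safe_inference` returns a NaN t-stat, p-value, and CI whenever the SE is non-finite or zero. Both names are hypothetical, and the two-sided normal reference distribution is an assumption (the library may use a different one).

```python
import math
from statistics import NormalDist


def unidentified_horizons(cohorts, horizons, has_never_treated):
    """Hypothetical helper: event-study horizons flagged under Proposition 5.

    With no never-treated units and at least two cohorts, horizons
    h >= h_bar = max(cohorts) - min(cohorts) are treated as unidentified.
    """
    groups = sorted(set(cohorts))
    if has_never_treated or len(groups) < 2:
        return []
    h_bar = groups[-1] - groups[0]
    return [h for h in horizons if h >= h_bar]


def nan_safe_inference(effect, se, alpha=0.05):
    """Hypothetical helper: (t_stat, p_value, ci), all NaN when SE is non-finite or zero.

    Uses a two-sided normal reference distribution (an assumption).
    """
    nan = float("nan")
    if not math.isfinite(se) or se == 0.0:
        return nan, nan, (nan, nan)
    t_stat = effect / se
    norm = NormalDist()
    p_value = 2.0 * (1.0 - norm.cdf(abs(t_stat)))
    z = norm.inv_cdf(1.0 - alpha / 2.0)
    return t_stat, p_value, (effect - z * se, effect + z * se)
```

For cohorts first treated at periods 3, 5 and 7 with no never-treated units, `h_bar = 4`, so horizons 4 and above would be reported with NaN inference even though they still have treated observations (n_obs > 0).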
@@ -1429,7 +1429,7 @@ Our implementation uses multiplier bootstrap on the GMM influence function: clus
 - [x] Stage 2: Regress residualized outcomes on treatment indicators
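The multiplier bootstrap described in the REGISTRY `*Bootstrap:*` paragraph can be sketched as follows: the per-cluster influence sums `psi_g` are computed once, and each replicate only re-weights them with mean-zero, unit-variance multipliers. The weight definitions below are the standard Rademacher, Mammen two-point, and Webb six-point distributions; `multiplier_bootstrap_se` and its arguments are hypothetical, and reporting the standard deviation of the draws as the SE is an assumption (the library may also form percentile intervals).

```python
import numpy as np

SQRT5 = np.sqrt(5.0)


def multiplier_weights(kind, size, rng):
    """Draw mean-zero, unit-variance multiplier weights (standard definitions)."""
    if kind == "rademacher":
        return rng.choice(np.array([-1.0, 1.0]), size=size)
    if kind == "mammen":
        # -(sqrt5 - 1)/2 with prob (sqrt5 + 1)/(2 sqrt5), else (sqrt5 + 1)/2
        lo, hi = -(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / 2.0
        p_lo = (SQRT5 + 1.0) / (2.0 * SQRT5)
        return np.where(rng.random(size) < p_lo, lo, hi)
    if kind == "webb":
        # Webb six-point: +/-sqrt(3/2), +/-1, +/-sqrt(1/2), each with prob 1/6
        pts = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5), np.sqrt(0.5), 1.0, np.sqrt(1.5)])
        return rng.choice(pts, size=size)
    raise ValueError(f"unknown weight type: {kind!r}")


def multiplier_bootstrap_se(psi_cluster, n_boot=999, kind="rademacher", seed=None):
    """Hypothetical sketch: bootstrap SE of a scalar estimate from per-cluster sums.

    psi_cluster[g] is cluster g's summed influence function, scaled so that
    estimate - truth ~ sum_g psi_cluster[g] (analytical variance = sum_g psi_g**2).
    """
    psi = np.asarray(psi_cluster, dtype=float)
    rng = np.random.default_rng(seed)
    w = multiplier_weights(kind, (n_boot, psi.size), rng)  # one row per replicate
    draws = w @ psi                                         # perturbed deviations, no refit
    return float(draws.std(ddof=1))
```

Because the draws have variance `sum_g psi_g**2`, the bootstrap SE approaches the analytical clustered SE as `n_boot` grows. Each replicate is a matrix-vector product rather than a refit, which is where the cost advantage over `did2s`'s optional cluster-resampling bootstrap comes from.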