Commit 91a458d

igerber and claude committed
fix(wild-bootstrap): rank-aware storage vcov (CI review P1)
The estimator stored the cluster-robust vcov via compute_robust_vcov(X, ...) on the full design. That function inverts X'X directly, so it raises ValueError (or returns garbage) when a nuisance column is collinear, e.g. a fixed-effect dummy collinear with treatment on a full-dummy design, even though the ATT is identified and wild_bootstrap_se itself drops such columns internally.

Verified: the storage call receives a rank-deficient X (rank 22 of 23) in the existing TWFE full-dummy test, and compute_robust_vcov raises on an exactly singular X.

Fix: compute the stored vcov through the rank-aware solve_ols(..., rank_deficient_action="silent") path, which drops collinear columns and NaN-expands the vcov for them. On full-rank designs the result agrees with compute_robust_vcov to floating-point rounding (verified, difference ~5e-17). Removed the now-unused compute_robust_vcov import.

Test: a DiD fixed_effects design with a dummy that EXACTLY duplicates the treatment indicator (singular X'X). The wild-bootstrap fit stays finite and does not crash, and the stored vcov is NaN-expanded for the dropped column. The existing TWFE rank-deficient full-dummy test still passes (both backends).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
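The idea behind the fix in the commit message (detect collinear columns, estimate on the remaining ones, then pad the vcov back to full size with NaN) can be sketched in plain Python. This is only an illustration under stated assumptions: `independent_columns` and `nan_expand` are hypothetical helpers written for this example, not the library's `solve_ols` or `_expand_vcov_with_nan`, and the reduced vcov is a placeholder identity matrix rather than a real cluster-robust estimate.

```python
import math


def independent_columns(X, tol=1e-10):
    """Indices of columns not spanned by the columns kept before them.

    Uses modified Gram-Schmidt: a column whose residual norm is ~0
    relative to its own norm is treated as collinear and skipped.
    (Hypothetical helper, not the library's rank detection.)
    """
    n_rows, n_cols = len(X), len(X[0])
    basis, kept = [], []
    for j in range(n_cols):
        col = [X[i][j] for i in range(n_rows)]
        col_norm = math.sqrt(sum(v * v for v in col))
        resid = col[:]
        for b in basis:  # remove the component along each kept direction
            coef = sum(r * bv for r, bv in zip(resid, b))
            resid = [r - coef * bv for r, bv in zip(resid, b)]
        resid_norm = math.sqrt(sum(v * v for v in resid))
        if col_norm > 0 and resid_norm > tol * col_norm:
            basis.append([v / resid_norm for v in resid])
            kept.append(j)
    return kept


def nan_expand(vcov_reduced, kept, n_cols):
    """Embed a reduced vcov into an n_cols x n_cols matrix, NaN elsewhere."""
    full = [[math.nan] * n_cols for _ in range(n_cols)]
    for a, i in enumerate(kept):
        for b, j in enumerate(kept):
            full[i][j] = vcov_reduced[a][b]
    return full


# Columns: intercept, treated, post, treated*post, fe_T (== treated).
X = [[1, t, p, t * p, t] for t in (0, 1) for p in (0, 1)]
kept = independent_columns(X)

# Placeholder values standing in for a vcov estimated on the kept columns.
vcov_reduced = [[1.0 if a == b else 0.0 for b in kept] for a in kept]
vcov_full = nan_expand(vcov_reduced, kept, n_cols=len(X[0]))
```

Here the duplicated `fe_T` column is the one skipped, so the last row and column of `vcov_full` are NaN while the block for the identified coefficients keeps its values.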
1 parent 7907d83 commit 91a458d

3 files changed

Lines changed: 62 additions & 7 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 - **Wild cluster bootstrap (`inference="wild_bootstrap"`) now imposes the null — fixes an invalid p-value (issue #543).** `DifferenceInDifferences`/`TwoWayFixedEffects` with `inference="wild_bootstrap"` previously produced a p-value that contradicted its own confidence interval (e.g. CI `[2.30, 2.64]` excluding 0, yet `p = 0.86`). `diff_diff.utils.wild_bootstrap_se` *claimed* to run the Wild Cluster **Restricted** bootstrap but never actually imposed the null — it re-fit the full design (keeping the treatment column) to the unchanged outcome, so the "restricted" residuals equaled the unrestricted ones and the bootstrap coefficient distribution centered on the estimate instead of 0. The p-value `mean(|t*| ≥ |t₀|)` then measured noise around the estimate (≈0.5–0.86 regardless of significance) while the percentile-of-coefficients CI happened to look fine — an internal contradiction. The bootstrap now genuinely imposes H₀ (drops the coefficient's column for the restricted fit), studentizes with the analytical CR1 SE, and derives the CI by **test inversion** so the p-value and CI are exactly consistent (`0 ∈ CI ⟺ p ≥ alpha`). For Rademacher weights with few clusters the full `2**n_clusters` sign-vector set is enumerated (deterministic), matching R's `fwildclusterboot::boottest`. **Results change** for any prior `wild_bootstrap` use: the headline `p_value`/`conf_int` are corrected (a true effect is now correctly significant), and the reported `se` is now the analytical cluster-robust (CR1) SE (numerically ~unchanged in well-behaved cases). Validated against `fwildclusterboot::boottest()` (`benchmarks/R/generate_wild_cluster_boot_golden.R`; bootstrap t-distribution to ~6e-14, `se`/`t`/interior-`p` exact, CI to ~1e-4) and an independent full-refit enumeration. See `docs/methodology/REGISTRY.md` §"Wild cluster bootstrap (WCR)".
 - **Cluster-robust / HC1 standard errors no longer raise `ZeroDivisionError` on a saturated design.** `linalg.compute_robust_vcov` (NumPy path) divided by `(n_eff - k)` in the HC1/CR1 small-sample adjustment without guarding a design with no residual degrees of freedom (`n_eff == k`, e.g. a 2×2 DiD with one observation per cluster-period); it now returns a NaN vcov so inference is degenerate (NaN), consistent with the all-or-nothing NaN convention, rather than crashing. Surfaced while hardening the wild cluster bootstrap (`wild_bootstrap_se` independently routes saturated / weak-identification designs to NaN, and represents a genuinely unbounded inverted CI with `±inf` instead of mixing finite point estimates with NaN endpoints).
+- **Wild cluster bootstrap on a rank-deficient full-dummy design no longer crashes when storing the vcov.** `_run_wild_bootstrap_inference` computed the stored cluster-robust vcov via `compute_robust_vcov(X, ...)` on the full design, which inverts `X'X` directly and raises (or returns garbage) when a nuisance column is collinear (e.g. a fixed-effect dummy collinear with treatment) — even though the ATT is identified and the bootstrap itself drops such columns. It now computes the stored vcov through the rank-aware `solve_ols(..., rank_deficient_action="silent")` path, NaN-expanding the dropped column (bit-identical to the prior result on full-rank designs).
 - **`TwoStageDiD` analytical GMM standard errors are now exact (match R `did2s` to ~1e-7).** The Gardner two-stage GMM sandwich `_compute_gmm_variance` derived its residuals from the *iterative* alternating-projection first-stage fixed effects (`_iterative_fe`, which converge only to ~1e-7 on unbalanced untreated panels) while computing `gamma_hat` exactly — leaving the variance ~1% off the analytical sandwich. The variance now re-solves the Stage-1 FE **exactly** (sparse OLS, reusing the `gamma_hat` factorization), and `_build_fe_design` gained an intercept column so its column space spans the grand mean (the prior intercept-free design omitted it, and the exact residual is first-order sensitive to it). Unidentified-FE obs (rank-deficient / Proposition-5) fall back to the iterative residual, so those edge cases are unchanged; the reported `overall_att` still uses the iterative FE (point-estimate equivalence with `ImputationDiD` preserved). Mirrors the same-class fix already applied to `ImputationDiD`'s exact-sparse variance.
 - **`LinearRegression.get_se()` / `get_inference()` no longer return a `NaN` standard error from a tiny-negative variance artifact.** A high-leverage / degenerate coefficient (e.g. an absorbed-FE dummy near-collinear with the treatment, whose Bell-McCaffrey Satterthwaite DOF already hits the noise-floor guard) can have a CR2/HC variance of ~0 (≈1e-32) whose vcov diagonal lands just-below-zero under BLAS-dependent float rounding; `np.sqrt` of the negative then produced a `NaN` SE **nondeterministically** — passing single-threaded but failing under the parallel pure-Python full-suite run (`tests/test_methodology_wls_cr2.py::TestLinearRegressionFENanGuardEndToEnd::test_did_absorbed_fe_lr_inference_nan_for_guarded_coefs`). Both SE sites now clamp the vcov diagonal at 0, so the SE is finite (0 for a genuinely-zero variance), deterministic, and BLAS-independent. **No change for any positive variance** (the clamp is a no-op there); only the previously-`NaN` degenerate case is affected.
 - **`TripleDifference` power analysis now honors `n_periods > 2`.** `simulate_power`,
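The first changelog entry above mentions enumerating the full `2**n_clusters` Rademacher sign-vector set and computing the p-value as `mean(|t*| ≥ |t₀|)`. A minimal sketch of that enumeration follows. It uses a toy, unstudentized statistic (a signed sum of hypothetical per-cluster scores), so it shows only why the enumerated p-value is deterministic, not the library's restricted, studentized procedure; both function names are invented for the example.

```python
from itertools import product


def rademacher_sign_vectors(n_clusters):
    """Every +/-1 weight vector of length n_clusters (2**n_clusters in all)."""
    return list(product((1.0, -1.0), repeat=n_clusters))


def enumerated_p_value(cluster_scores):
    """Two-sided p-value mean(|t*| >= |t0|) over the full sign-vector set.

    Toy statistic, for illustration only: t*(w) = sum_g w_g * s_g, where
    s_g are hypothetical per-cluster scores. No studentization is applied.
    """
    t0 = sum(cluster_scores)
    t_star = [
        sum(w * s for w, s in zip(weights, cluster_scores))
        for weights in rademacher_sign_vectors(len(cluster_scores))
    ]
    exceed = sum(abs(t) >= abs(t0) for t in t_star)
    return exceed / len(t_star)


# Three clusters give 2**3 = 8 sign vectors; no random draws are involved.
p = enumerated_p_value([1.0, 2.0, 3.0])
```

Because every sign vector is visited exactly once, repeated calls return the same value without a seed. In this toy case only the all-plus and all-minus vectors reach `|t0|`, so `p` is 2 out of 8.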

diff_diff/estimators.py

Lines changed: 16 additions & 7 deletions
@@ -23,7 +23,6 @@
     LinearRegression,
     _expand_vcov_with_nan,
     compute_r_squared,
-    compute_robust_vcov,
     solve_ols,
 )
 from diff_diff.results import DiDResults, MultiPeriodDiDResults, PeriodEffect
@@ -826,15 +825,25 @@ def _run_wild_bootstrap_inference(
         conf_int = (bootstrap_results.ci_lower, bootstrap_results.ci_upper)
         t_stat = bootstrap_results.t_stat_original

-        # Also compute the cluster-robust vcov for storage. When the bootstrap
-        # itself returned degenerate (all-NaN) inference — e.g. a saturated
-        # design with no residual degrees of freedom — the shared CR1 sandwich
-        # would divide by zero, so store a NaN vcov instead, keeping the
-        # all-or-nothing NaN contract rather than raising.
+        # Also compute the cluster-robust vcov for storage. Use the rank-aware
+        # solve_ols path (silently dropping collinear nuisance columns and
+        # NaN-expanding the vcov for them), matching how wild_bootstrap_se itself
+        # handles rank-deficient full-dummy designs — `compute_robust_vcov()`
+        # inverts the full X'X directly and would raise (or return garbage) on a
+        # rank-deficient design even though the ATT and bootstrap are identified.
+        # On a saturated design (degenerate bootstrap, NaN se) store a NaN vcov
+        # to keep the all-or-nothing NaN contract. (On a full-rank design this
+        # vcov is bit-identical to the prior compute_robust_vcov result.)
         if np.isnan(se):
             vcov = np.full((X.shape[1], X.shape[1]), np.nan)
         else:
-            vcov = compute_robust_vcov(X, residuals, cluster_ids)
+            _, _, vcov = solve_ols(
+                X,
+                y,
+                cluster_ids=cluster_ids,
+                return_vcov=True,
+                rank_deficient_action="silent",
+            )

         return se, p_value, conf_int, t_stat, vcov, bootstrap_results

tests/test_wild_bootstrap.py

Lines changed: 45 additions & 0 deletions
@@ -1435,3 +1435,48 @@ def test_single_regressor_design_does_not_crash():
     assert isinstance(res, WildBootstrapResults)
     if np.isfinite(res.p_value):
         assert (res.ci_lower <= 0.0 <= res.ci_upper) == (res.p_value >= 0.05)
+
+
+def test_wild_bootstrap_rank_deficient_storage_vcov_does_not_crash():
+    """The estimator's stored cluster-robust vcov is computed through the
+    rank-aware solve_ols path, so a wild-bootstrap fit on a rank-deficient
+    full-dummy design (here a fixed-effect dummy that EXACTLY duplicates the
+    treatment indicator) does not crash, and the stored vcov is NaN-expanded for
+    the dropped column rather than raising on the singular X'X. Regression for
+    the storage-vcov gap in `_run_wild_bootstrap_inference` (the bootstrap helper
+    already handled rank deficiency internally).
+    """
+    import warnings
+
+    rng = np.random.default_rng(0)
+    rows = []
+    for u in range(16):
+        treated = int(u < 8)
+        fe = "T" if treated else "C"  # the 'T' dummy == treated exactly -> singular X'X
+        for period in (0, 1):
+            y = 5 + 2 * period + (1.5 if (treated and period) else 0) + rng.normal(0, 0.5)
+            rows.append(
+                {
+                    "unit": u,
+                    "fe": fe,
+                    "cluster": u % 8,
+                    "treated": treated,
+                    "post": period,
+                    "outcome": y,
+                }
+            )
+    df = pd.DataFrame(rows)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")  # expected rank-deficient drop warning
+        res = DifferenceInDifferences(
+            cluster="cluster", inference="wild_bootstrap", n_bootstrap=99, seed=1
+        ).fit(df, outcome="outcome", treatment="treated", time="post", fixed_effects=["fe"])
+    # ATT identified, bootstrap inference finite, no exception.
+    assert np.isfinite(res.att)
+    assert np.isfinite(res.se) and res.se > 0
+    assert np.isfinite(res.p_value)
+    assert np.isfinite(res.conf_int[0]) and np.isfinite(res.conf_int[1])
+    # Stored vcov is rank-aware (NaN-expanded for the dropped column), not +/-inf.
+    assert res.vcov is not None
+    assert np.any(np.isnan(res.vcov))
+    assert not np.any(np.isinf(res.vcov))