fix(wild-bootstrap): rank-aware storage vcov (CI review P1)

igerber · claude · igerber · commit 91a458d4fed9 · 2026-06-24T15:37:36.000-04:00
The estimator stored the cluster-robust vcov via compute_robust_vcov(X, ...) on
the full design, which inverts X'X directly and raises ValueError (or returns
numerically meaningless values) when a nuisance column is collinear, e.g. a
fixed-effect dummy collinear with treatment on a full-dummy design. This
happened even though the ATT is identified and wild_bootstrap_se itself drops
such columns internally. Verified: the storage call receives a rank-deficient X
(rank 22 of 23) in the existing TWFE full-dummy test, and compute_robust_vcov
raises on an exactly-singular X.

Fix: compute the stored vcov through the rank-aware solve_ols(...,
rank_deficient_action="silent") path, which drops collinear columns and
NaN-expands the vcov for them. On full-rank designs the result is numerically
identical to compute_robust_vcov (verified, difference ~5e-17). Removed the
now-unused compute_robust_vcov import.
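
For reviewers unfamiliar with the drop-and-NaN-expand convention, here is a
minimal NumPy-only sketch of the idea. It is an illustration under stated
assumptions, not the diff_diff implementation: rank_aware_cluster_vcov is a
hypothetical helper, the greedy left-to-right column selection and the
G/(G-1) * (n-1)/(n-k) CR1 factor are assumed conventions, and the real
solve_ols may pick columns and scale differently.

```python
import numpy as np


def rank_aware_cluster_vcov(X, y, cluster_ids):
    """Hypothetical stand-in (NOT diff_diff's solve_ols): OLS with a CR1
    cluster-robust vcov that drops collinear columns and NaN-expands."""
    n, k = X.shape

    # Greedy left-to-right selection: keep a column only if it raises the rank.
    kept = []
    for j in range(k):
        if np.linalg.matrix_rank(X[:, kept + [j]]) > len(kept):
            kept.append(j)

    Xk = X[:, kept]
    beta_k, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    resid = y - Xk @ beta_k

    # Sandwich: bread = (Xk'Xk)^-1, meat = sum over clusters of s_g s_g'.
    bread = np.linalg.inv(Xk.T @ Xk)
    clusters = np.unique(cluster_ids)
    meat = np.zeros((len(kept), len(kept)))
    for g in clusters:
        mask = cluster_ids == g
        score = Xk[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # Assumed CR1 small-sample factor; the library's exact factor may differ.
    G = len(clusters)
    adj = (G / (G - 1)) * ((n - 1) / (n - len(kept)))
    vcov_k = adj * bread @ meat @ bread

    # NaN-expand back to the full k x k shape so indices line up with X.
    beta = np.full(k, np.nan)
    beta[kept] = beta_k
    vcov = np.full((k, k), np.nan)
    vcov[np.ix_(kept, kept)] = vcov_k
    return beta, vcov


# Toy 2x2 DiD design whose last column exactly duplicates `treated`.
rng = np.random.default_rng(0)
n = 40
treated = (np.arange(n) < 20).astype(float)
post = (np.arange(n) % 2).astype(float)
X = np.column_stack([np.ones(n), treated, post, treated * post, treated])
y = 5 + 2 * post + 1.5 * treated * post + rng.normal(0, 0.5, n)
cluster_ids = np.arange(n) // 4

beta, vcov = rank_aware_cluster_vcov(X, y, cluster_ids)
beta_fr, vcov_fr = rank_aware_cluster_vcov(X[:, :4], y, cluster_ids)
```

In this sketch the duplicated column gets a NaN coefficient and a NaN
row/column in the vcov, while the block for the retained columns equals the
vcov from fitting the full-rank design directly.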

Test: a DiD fixed_effects design with a dummy that EXACTLY duplicates the
treatment indicator (singular X'X). The wild-bootstrap fit stays finite without
crashing, and the stored vcov is NaN-expanded for the dropped column. The
existing TWFE rank-deficient full-dummy test still passes (both backends).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 - **Wild cluster bootstrap (`inference="wild_bootstrap"`) now imposes the null — fixes an invalid p-value (issue #543).** `DifferenceInDifferences`/`TwoWayFixedEffects` with `inference="wild_bootstrap"` previously produced a p-value that contradicted its own confidence interval (e.g. CI `[2.30, 2.64]` excluding 0, yet `p = 0.86`). `diff_diff.utils.wild_bootstrap_se` *claimed* to run the Wild Cluster **Restricted** bootstrap but never actually imposed the null — it re-fit the full design (keeping the treatment column) to the unchanged outcome, so the "restricted" residuals equaled the unrestricted ones and the bootstrap coefficient distribution centered on the estimate instead of 0. The p-value `mean(|t*| ≥ |t₀|)` then measured noise around the estimate (≈0.5–0.86 regardless of significance) while the percentile-of-coefficients CI happened to look fine — an internal contradiction. The bootstrap now genuinely imposes H₀ (drops the coefficient's column for the restricted fit), studentizes with the analytical CR1 SE, and derives the CI by **test inversion** so the p-value and CI are exactly consistent (`0 ∈ CI ⟺ p ≥ alpha`). For Rademacher weights with few clusters the full `2**n_clusters` sign-vector set is enumerated (deterministic), matching R's `fwildclusterboot::boottest`. **Results change** for any prior `wild_bootstrap` use: the headline `p_value`/`conf_int` are corrected (a true effect is now correctly significant), and the reported `se` is now the analytical cluster-robust (CR1) SE (numerically ~unchanged in well-behaved cases). Validated against `fwildclusterboot::boottest()` (`benchmarks/R/generate_wild_cluster_boot_golden.R`; bootstrap t-distribution to ~6e-14, `se`/`t`/interior-`p` exact, CI to ~1e-4) and an independent full-refit enumeration. See `docs/methodology/REGISTRY.md` §"Wild cluster bootstrap (WCR)".
 - **Cluster-robust / HC1 standard errors no longer raise `ZeroDivisionError` on a saturated design.** `linalg.compute_robust_vcov` (NumPy path) divided by `(n_eff - k)` in the HC1/CR1 small-sample adjustment without guarding a design with no residual degrees of freedom (`n_eff == k`, e.g. a 2×2 DiD with one observation per cluster-period); it now returns a NaN vcov so inference is degenerate (NaN), consistent with the all-or-nothing NaN convention, rather than crashing. Surfaced while hardening the wild cluster bootstrap (`wild_bootstrap_se` independently routes saturated / weak-identification designs to NaN, and represents a genuinely unbounded inverted CI with `±inf` instead of mixing finite point estimates with NaN endpoints).
+- **Wild cluster bootstrap on a rank-deficient full-dummy design no longer crashes when storing the vcov.** `_run_wild_bootstrap_inference` computed the stored cluster-robust vcov via `compute_robust_vcov(X, ...)` on the full design, which inverts `X'X` directly and raises (or returns garbage) when a nuisance column is collinear (e.g. a fixed-effect dummy collinear with treatment) — even though the ATT is identified and the bootstrap itself drops such columns. It now computes the stored vcov through the rank-aware `solve_ols(..., rank_deficient_action="silent")` path, NaN-expanding the dropped column (bit-identical to the prior result on full-rank designs).
 - **`TwoStageDiD` analytical GMM standard errors are now exact (match R `did2s` to ~1e-7).** The Gardner two-stage GMM sandwich `_compute_gmm_variance` derived its residuals from the *iterative* alternating-projection first-stage fixed effects (`_iterative_fe`, which converge only to ~1e-7 on unbalanced untreated panels) while computing `gamma_hat` exactly — leaving the variance ~1% off the analytical sandwich. The variance now re-solves the Stage-1 FE **exactly** (sparse OLS, reusing the `gamma_hat` factorization), and `_build_fe_design` gained an intercept column so its column space spans the grand mean (the prior intercept-free design omitted it, and the exact residual is first-order sensitive to it). Unidentified-FE obs (rank-deficient / Proposition-5) fall back to the iterative residual, so those edge cases are unchanged; the reported `overall_att` still uses the iterative FE (point-estimate equivalence with `ImputationDiD` preserved). Mirrors the same-class fix already applied to `ImputationDiD`'s exact-sparse variance.
 - **`LinearRegression.get_se()` / `get_inference()` no longer return a `NaN` standard error from a tiny-negative variance artifact.** A high-leverage / degenerate coefficient (e.g. an absorbed-FE dummy near-collinear with the treatment, whose Bell-McCaffrey Satterthwaite DOF already hits the noise-floor guard) can have a CR2/HC variance of ~0 (≈1e-32) whose vcov diagonal lands just-below-zero under BLAS-dependent float rounding; `np.sqrt` of the negative then produced a `NaN` SE **nondeterministically** — passing single-threaded but failing under the parallel pure-Python full-suite run (`tests/test_methodology_wls_cr2.py::TestLinearRegressionFENanGuardEndToEnd::test_did_absorbed_fe_lr_inference_nan_for_guarded_coefs`). Both SE sites now clamp the vcov diagonal at 0, so the SE is finite (0 for a genuinely-zero variance), deterministic, and BLAS-independent. **No change for any positive variance** (the clamp is a no-op there); only the previously-`NaN` degenerate case is affected.
 - **`TripleDifference` power analysis now honors `n_periods > 2`.** `simulate_power`,
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
@@ -23,7 +23,6 @@
     LinearRegression,
     _expand_vcov_with_nan,
     compute_r_squared,
-    compute_robust_vcov,
     solve_ols,
 )
 from diff_diff.results import DiDResults, MultiPeriodDiDResults, PeriodEffect
@@ -826,15 +825,25 @@ def _run_wild_bootstrap_inference(
         conf_int = (bootstrap_results.ci_lower, bootstrap_results.ci_upper)
         t_stat = bootstrap_results.t_stat_original
 
-        # Also compute the cluster-robust vcov for storage. When the bootstrap
-        # itself returned degenerate (all-NaN) inference — e.g. a saturated
-        # design with no residual degrees of freedom — the shared CR1 sandwich
-        # would divide by zero, so store a NaN vcov instead, keeping the
-        # all-or-nothing NaN contract rather than raising.
+        # Also compute the cluster-robust vcov for storage. Use the rank-aware
+        # solve_ols path (silently dropping collinear nuisance columns and
+        # NaN-expanding the vcov for them), matching how wild_bootstrap_se itself
+        # handles rank-deficient full-dummy designs — `compute_robust_vcov()`
+        # inverts the full X'X directly and would raise (or return garbage) on a
+        # rank-deficient design even though the ATT and bootstrap are identified.
+        # On a saturated design (degenerate bootstrap, NaN se) store a NaN vcov
+        # to keep the all-or-nothing NaN contract. (On a full-rank design this
+        # vcov is bit-identical to the prior compute_robust_vcov result.)
         if np.isnan(se):
             vcov = np.full((X.shape[1], X.shape[1]), np.nan)
         else:
-            vcov = compute_robust_vcov(X, residuals, cluster_ids)
+            _, _, vcov = solve_ols(
+                X,
+                y,
+                cluster_ids=cluster_ids,
+                return_vcov=True,
+                rank_deficient_action="silent",
+            )
 
         return se, p_value, conf_int, t_stat, vcov, bootstrap_results
 
diff --git a/tests/test_wild_bootstrap.py b/tests/test_wild_bootstrap.py
@@ -1435,3 +1435,48 @@ def test_single_regressor_design_does_not_crash():
     assert isinstance(res, WildBootstrapResults)
     if np.isfinite(res.p_value):
         assert (res.ci_lower <= 0.0 <= res.ci_upper) == (res.p_value >= 0.05)
+
+
+def test_wild_bootstrap_rank_deficient_storage_vcov_does_not_crash():
+    """The estimator's stored cluster-robust vcov is computed through the
+    rank-aware solve_ols path, so a wild-bootstrap fit on a rank-deficient
+    full-dummy design (here a fixed-effect dummy that EXACTLY duplicates the
+    treatment indicator) does not crash, and the stored vcov is NaN-expanded for
+    the dropped column rather than raising on the singular X'X. Regression for
+    the storage-vcov gap in `_run_wild_bootstrap_inference` (the bootstrap helper
+    already handled rank deficiency internally).
+    """
+    import warnings
+
+    rng = np.random.default_rng(0)
+    rows = []
+    for u in range(16):
+        treated = int(u < 8)
+        fe = "T" if treated else "C"  # the 'T' dummy == treated exactly -> singular X'X
+        for period in (0, 1):
+            y = 5 + 2 * period + (1.5 if (treated and period) else 0) + rng.normal(0, 0.5)
+            rows.append(
+                {
+                    "unit": u,
+                    "fe": fe,
+                    "cluster": u % 8,
+                    "treated": treated,
+                    "post": period,
+                    "outcome": y,
+                }
+            )
+    df = pd.DataFrame(rows)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")  # expected rank-deficient drop warning
+        res = DifferenceInDifferences(
+            cluster="cluster", inference="wild_bootstrap", n_bootstrap=99, seed=1
+        ).fit(df, outcome="outcome", treatment="treated", time="post", fixed_effects=["fe"])
+    # ATT identified, bootstrap inference finite, no exception.
+    assert np.isfinite(res.att)
+    assert np.isfinite(res.se) and res.se > 0
+    assert np.isfinite(res.p_value)
+    assert np.isfinite(res.conf_int[0]) and np.isfinite(res.conf_int[1])
+    # Stored vcov is rank-aware (NaN-expanded for the dropped column), not +/-inf.
+    assert res.vcov is not None
+    assert np.any(np.isnan(res.vcov))
+    assert not np.any(np.isinf(res.vcov))