wooldridge: CI R1 fixes — plot_event_study weights propagation + cohort_trends × group/calendar coverage + TODO row

igerber · igerber · commit 379e065f7455 · 2026-05-22T16:05:40.000-04:00
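
The re-aggregation rule this commit adds to `plot_event_study` can be sketched in isolation. The snippet below is a minimal illustration, not the diff-diff implementation: `EventStudyCacheSketch`, its toy `aggregate()` body, and the `aggregate_calls` log are hypothetical stand-ins, and the assumption that "cell" keeps `k < 0` leads while "cohort_share" keeps only `k >= 0` is taken from the test in this commit. Only the `if` / `elif` branching mirrors the wrapper.

```python
from typing import Dict, List, Optional, Tuple


class EventStudyCacheSketch:
    """Hypothetical stand-in for the results object (not the diff-diff class)."""

    def __init__(self) -> None:
        self.event_study_effects: Optional[Dict[int, Dict[str, float]]] = None
        self.aggregate_calls: List[Tuple[str, str]] = []

    def aggregate(self, kind: str, weights: str = "cell") -> None:
        # Toy aggregation (assumption): "cell" keeps placebo leads (k < 0),
        # "cohort_share" keeps only post-treatment exposures (k >= 0).
        self.aggregate_calls.append((kind, weights))
        ks = range(-2, 3) if weights == "cell" else range(0, 3)
        self.event_study_effects = {k: {"att": float(k)} for k in ks}

    def plot_event_study(self, weights: str = "cell") -> List[int]:
        # Same branching as the wrapper: aggregate when nothing is cached,
        # and force a re-aggregation only for "cohort_share".
        if self.event_study_effects is None:
            self.aggregate("event", weights=weights)
        elif weights == "cohort_share":
            self.aggregate("event", weights=weights)
        # Return the event-time keys instead of drawing a figure.
        return sorted(self.event_study_effects or {})


res = EventStudyCacheSketch()
cell_keys = res.plot_event_study()                         # first call aggregates
share_keys = res.plot_event_study(weights="cohort_share")  # forces re-aggregation
again_keys = res.plot_event_study()                        # reuses the cached result
```

Under this branching, a default call made after a `"cohort_share"` call reuses the cached cohort-share effects rather than recomputing under `"cell"`; calling `aggregate("event", weights="cell")` explicitly before plotting would restore the cell-weighted view.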
diff --git a/TODO.md b/TODO.md
@@ -96,6 +96,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: response-scale APE / log-link coefficient bridge for R `etwfe(family="poisson")` + `etwfe(family="logit")` cell-level numerical parity. diff-diff `WooldridgeDiD(method="poisson"\|"logit")` returns ATT on the response scale (counterfactual μ_1 − μ_0 / p_1 − p_0 per paper W2023 ASF / APE framework); R `etwfe` returns the cell-level log-link coefficient. PR-B Stage D ships log-link goldens at `benchmarks/data/wooldridge_golden.json` and surface tests (fit completes + goldens well-formed); cell-level numerical parity requires either `emfx()`-based APE extraction on the R side or link-function inversion with baseline-mean adjustment. | `benchmarks/R/generate_wooldridge_golden.R`, `tests/test_methodology_wooldridge.py::TestWooldridgeParityRPoisson/TestWooldridgeParityRLogit` | PR-B follow-up | Medium |
 | WooldridgeDiD: design-consistent cohort totals for `aggregate(weights="cohort_share")` on survey-weighted fits. Current impl populates `_n_g_per_cohort` from `unit.nunique()` (raw counts); composing these unweighted cohort shares with the design-weighted ATTs targets a mixed estimand inconsistent with paper W2025 Section 7's design-population cohort-share form. PR-B Stage E fail-closes the surface (raises `ValueError` when `survey_design is not None`); the follow-up implements survey-weighted unit totals per cohort and re-enables the surface. | `wooldridge.py` `_n_g_per_cohort` population, `wooldridge_results.py::aggregate` survey gate | PR-B follow-up | Medium |
 | WooldridgeDiD: unconditional inference for `aggregate(weights="cohort_share")` accounting for sampling uncertainty in the cohort shares ω̂_g / ω̂_{ge} (paper W2025 Section 7.5). Current impl fail-closes the t-stat / p-value / conf-int fields to NaN under cohort-share aggregation because the analytical SE is conditional-on-shares. Proper APE/GMM-style aggregate inference (Wooldridge 2023 Section 4 framework) re-enables full inference. | `wooldridge_results.py::aggregate` cohort_share inference branch | PR-B follow-up | Medium |
+| WooldridgeDiD: `cohort_trends=True` + `survey_design` composition. PR-B Stage E fail-closes the cross-product with `NotImplementedError` at `fit()` because the full-dummy `dg_i · t` design composed with the survey TSL variance hasn't been validated against R-parity goldens. Follow-up: validate the composition (or implement a survey-aware alternative) and re-enable the surface. | `wooldridge.py` fit guard, `wooldridge_results.py::aggregate` (if survey-aware cohort_trends variance plumbing is added) | PR-B follow-up | Low |
 | WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
 <!-- The PreTrendsPower R parity row (PR-C, 2026-05-19) and the four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV; resolved in PR-B 2026-05-18) are all closed — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->
diff --git a/diff_diff/wooldridge_results.py b/diff_diff/wooldridge_results.py
@@ -693,10 +693,33 @@ def to_dataframe(self, aggregation: str = "event") -> pd.DataFrame:
         rows = mapping.get(aggregation, [])
         return pd.DataFrame(rows)
 
-    def plot_event_study(self, **kwargs) -> None:
-        """Event study plot. Calls aggregate('event') if needed."""
+    def plot_event_study(self, weights: str = "cell", **kwargs) -> None:
+        """Event study plot. Calls ``aggregate('event', weights=weights)`` if needed.
+
+        Parameters
+        ----------
+        weights : {"cell", "cohort_share"}, default "cell"
+            Aggregation weighting scheme threaded into the underlying
+            ``aggregate("event", ...)`` call. ``"cohort_share"`` produces
+            paper W2025 Eq. 7.6 cohort-share-by-exposure weights
+            (post-treatment ``k >= 0`` only); inference fields are
+            fail-closed to NaN per the Section 7.5 conditional-on-shares
+            contract documented in REGISTRY.
+        **kwargs
+            Forwarded to ``diff_diff.visualization.plot_event_study``.
+        """
+        # Aggregate if nothing is cached yet; otherwise re-aggregate
+        # only for "cohort_share" (see the elif below). A cached
+        # cohort_share result is NOT recomputed by a later "cell" call.
+        # Each aggregate("event", ...) call replaces
+        # ``event_study_effects`` in place per the aggregate() contract.
         if self.event_study_effects is None:
-            self.aggregate("event")
+            self.aggregate("event", weights=weights)
+        elif weights == "cohort_share":
+            # Force re-aggregation so the cohort-share contract is
+            # honored from a wrapper call that may have been preceded
+            # by an aggregate("event", weights="cell") at fit time.
+            self.aggregate("event", weights=weights)
         from diff_diff.visualization import plot_event_study  # type: ignore
 
         effects = {k: v["att"] for k, v in (self.event_study_effects or {}).items()}
diff --git a/tests/test_methodology_wooldridge.py b/tests/test_methodology_wooldridge.py
@@ -1375,6 +1375,100 @@ def test_cohort_trends_true_rejects_survey_design(self) -> None:
                 survey_design=survey,
             )
 
+    def test_cohort_trends_true_plus_aggregate_group(self) -> None:
+        """CI R1 P1 fix: ``cohort_trends=True`` + ``aggregate('group')`` runs cleanly.
+
+        Closes the parameter-interaction coverage gap codex flagged:
+        cohort_trends was only tested with event and simple
+        aggregations. The group aggregation operates on per-cohort
+        cells; cohort-trend columns are excluded by construction.
+        """
+        rng = np.random.default_rng(_BASE_SEED_SECTION8 + 13)
+        panel = _make_heterogeneous_trends_panel(rng, n_per_cohort=80, sigma=0.05)
+        res = WooldridgeDiD(method="ols", cohort_trends=True).fit(
+            panel, outcome="y", unit="unit", time="time", cohort="cohort"
+        )
+        res.aggregate("group")
+        assert res.group_effects is not None
+        finite_count = 0
+        for eff in res.group_effects.values():
+            if np.isfinite(eff["att"]):
+                assert np.isfinite(eff["se"]) and eff["se"] > 0
+                finite_count += 1
+        assert finite_count >= 1
+
+    def test_cohort_trends_true_plus_aggregate_calendar(self) -> None:
+        """CI R1 P1 fix: ``cohort_trends=True`` + ``aggregate('calendar')`` runs cleanly."""
+        rng = np.random.default_rng(_BASE_SEED_SECTION8 + 14)
+        panel = _make_heterogeneous_trends_panel(rng, n_per_cohort=80, sigma=0.05)
+        res = WooldridgeDiD(method="ols", cohort_trends=True).fit(
+            panel, outcome="y", unit="unit", time="time", cohort="cohort"
+        )
+        res.aggregate("calendar")
+        assert res.calendar_effects is not None
+        finite_count = 0
+        for eff in res.calendar_effects.values():
+            if np.isfinite(eff["att"]):
+                assert np.isfinite(eff["se"]) and eff["se"] > 0
+                finite_count += 1
+        assert finite_count >= 1
+
+    def test_plot_event_study_propagates_weights_kwarg(self) -> None:
+        """CI R1 P1 fix: ``plot_event_study(weights=...)`` propagates through aggregate().
+
+        Before the fix, ``plot_event_study()`` hardcoded
+        ``aggregate("event")`` (cell weights) so the new opt-in
+        ``weights="cohort_share"`` surface was unreachable from the
+        plot wrapper. Verifies the kwarg is plumbed through and that
+        the resulting ``event_study_effects`` reflects the requested
+        scheme (specifically, the k>=0 restriction Stage 4 added on
+        the cohort_share event path).
+        """
+        from unittest.mock import patch
+
+        rng = np.random.default_rng(_BASE_SEED_SECTION8 + 15)
+        # Use never_treated + OLS to expose k<0 placebo cells in the
+        # default cell-weighted event aggregation; the cohort_share
+        # re-aggregation must restrict to k>=0.
+        panel = _make_three_cohort_four_period_panel(rng, n_per_cohort=80, sigma=0.05)
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", category=UserWarning)
+            res = WooldridgeDiD(method="ols", control_group="never_treated").fit(
+                panel, outcome="y", unit="unit", time="time", cohort="cohort"
+            )
+        # Default plot — uses weights="cell"
+        with patch("diff_diff.visualization.plot_event_study") as mock_plot:
+            res.plot_event_study()
+        assert mock_plot.call_count == 1
+        assert res.event_study_effects is not None
+        cell_event_keys = sorted(res.event_study_effects.keys())
+        assert any(k < 0 for k in cell_event_keys), (
+            "DGP precondition: never_treated + OLS should expose k<0 "
+            "placebo cells under default cell weighting"
+        )
+        # Plot under weights="cohort_share" — should re-aggregate +
+        # restrict to k>=0 (paper Eq. 7.6 scope)
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", category=UserWarning)
+            with patch("diff_diff.visualization.plot_event_study") as mock_plot:
+                res.plot_event_study(weights="cohort_share")
+        assert mock_plot.call_count == 1
+        assert res.event_study_effects is not None
+        cohort_share_keys = sorted(res.event_study_effects.keys())
+        assert all(k >= 0 for k in cohort_share_keys), (
+            "plot_event_study(weights='cohort_share') must restrict to "
+            f"k>=0 per paper Eq. 7.6 scope; got {cohort_share_keys}"
+        )
+        # The cell-weighted and cohort_share-weighted event_study_effects
+        # have different key sets (cell includes k<0 placebos; cohort_share
+        # restricts to k>=0). This proves the kwarg is propagated.
+        assert set(cohort_share_keys) != set(cell_event_keys), (
+            "plot_event_study(weights='cohort_share') should produce a "
+            "different event_study_effects key set than the default "
+            "(cell weights) — keys should differ on the k<0 placebo "
+            "leads."
+        )
+
     def test_cohort_trends_true_plus_bootstrap_preserves_bootstrap_se(self) -> None:
         """R5 P1 fix: ``cohort_trends=True`` + ``n_bootstrap > 0`` runs cleanly.