Skip to content

Commit 379e065

Browse files
committed
wooldridge: CI R1 fixes — plot_event_study weights propagation + cohort_trends × group/calendar coverage + TODO row
1 parent 58331d6 commit 379e065

3 files changed

Lines changed: 121 additions & 3 deletions

File tree

TODO.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -96,6 +96,7 @@ Deferred items from PR reviews that were not addressed before merge.
9696
| WooldridgeDiD: response-scale APE / log-link coefficient bridge for R `etwfe(family="poisson")` + `etwfe(family="logit")` cell-level numerical parity. diff-diff `WooldridgeDiD(method="poisson"\|"logit")` returns ATT on the response scale (counterfactual μ_1 − μ_0 / p_1 − p_0 per paper W2023 ASF / APE framework); R `etwfe` returns the cell-level log-link coefficient. PR-B Stage D ships log-link goldens at `benchmarks/data/wooldridge_golden.json` and surface tests (fit completes + goldens well-formed); cell-level numerical parity requires either `emfx()`-based APE extraction on the R side or link-function inversion with baseline-mean adjustment. | `benchmarks/R/generate_wooldridge_golden.R`, `tests/test_methodology_wooldridge.py::TestWooldridgeParityRPoisson/TestWooldridgeParityRLogit` | PR-B follow-up | Medium |
9797
| WooldridgeDiD: design-consistent cohort totals for `aggregate(weights="cohort_share")` on survey-weighted fits. Current impl populates `_n_g_per_cohort` from `unit.nunique()` (raw counts); composing these unweighted cohort shares with the design-weighted ATTs targets a mixed estimand inconsistent with paper W2025 Section 7's design-population cohort-share form. PR-B Stage E fail-closes the surface (raises `ValueError` when `survey_design is not None`); the follow-up implements survey-weighted unit totals per cohort and re-enables the surface. | `wooldridge.py` `_n_g_per_cohort` population, `wooldridge_results.py::aggregate` survey gate | PR-B follow-up | Medium |
9898
| WooldridgeDiD: unconditional inference for `aggregate(weights="cohort_share")` accounting for sampling uncertainty in the cohort shares ω̂_g / ω̂_{ge} (paper W2025 Section 7.5). Current impl fail-closes the t-stat / p-value / conf-int fields to NaN under cohort-share aggregation because the analytical SE is conditional-on-shares. Proper APE/GMM-style aggregate inference (Wooldridge 2023 Section 4 framework) re-enables full inference. | `wooldridge_results.py::aggregate` cohort_share inference branch | PR-B follow-up | Medium |
99+
| WooldridgeDiD: `cohort_trends=True` + `survey_design` composition. PR-B Stage E fail-closes the cross-product with `NotImplementedError` at `fit()` because the full-dummy `dg_i · t` design composed with the survey TSL variance hasn't been validated against R-parity goldens. Follow-up: validate the composition (or implement a survey-aware alternative) and re-enable the surface. | `wooldridge.py` fit guard, `wooldridge_results.py::aggregate` (if survey-aware cohort_trends variance plumbing is added) | PR-B follow-up | Low |
99100
| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
100101
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
101102
<!-- The PreTrendsPower R parity row (PR-C, 2026-05-19) and the four PR-A-tagged PreTrendsPower rows (CS/SA Σ_22 fidelity, helper `violation_weights`, custom-weight persistence, linear γ-unit MDV; resolved in PR-B 2026-05-18) are all closed — see CHANGELOG.md [Unreleased] Added/Changed/Fixed entries for the new behavior. -->

diff_diff/wooldridge_results.py

Lines changed: 26 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -693,10 +693,33 @@ def to_dataframe(self, aggregation: str = "event") -> pd.DataFrame:
693693
rows = mapping.get(aggregation, [])
694694
return pd.DataFrame(rows)
695695

696-
def plot_event_study(self, **kwargs) -> None:
697-
"""Event study plot. Calls aggregate('event') if needed."""
696+
def plot_event_study(self, weights: str = "cell", **kwargs) -> None:
697+
"""Event study plot. Calls ``aggregate('event', weights=weights)`` if needed.
698+
699+
Parameters
700+
----------
701+
weights : "cell" | "cohort_share", default "cell"
702+
Aggregation weighting scheme threaded into the underlying
703+
``aggregate("event", ...)`` call. ``"cohort_share"`` produces
704+
paper W2025 Eq. 7.6 cohort-share-by-exposure weights
705+
(post-treatment ``k >= 0`` only); inference fields are
706+
fail-closed to NaN per the Section 7.5 conditional-on-shares
707+
contract documented in REGISTRY.
708+
**kwargs
709+
Forwarded to ``diff_diff.visualization.plot_event_study``.
710+
"""
711+
# Recompute under the active weighting scheme if the cached
712+
# event_study_effects was built under a different scheme — or
713+
# has not been built yet. Aggregating under "cell" then under
714+
# "cohort_share" (or vice versa) replaces ``event_study_effects``
715+
# in place per the existing aggregate() contract.
698716
if self.event_study_effects is None:
699-
self.aggregate("event")
717+
self.aggregate("event", weights=weights)
718+
elif weights == "cohort_share":
719+
# Force re-aggregation so the cohort-share contract is
720+
# honored from a wrapper call that may have been preceded
721+
# by an aggregate("event", weights="cell") at fit time.
722+
self.aggregate("event", weights=weights)
700723
from diff_diff.visualization import plot_event_study # type: ignore
701724

702725
effects = {k: v["att"] for k, v in (self.event_study_effects or {}).items()}

tests/test_methodology_wooldridge.py

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1375,6 +1375,100 @@ def test_cohort_trends_true_rejects_survey_design(self) -> None:
13751375
survey_design=survey,
13761376
)
13771377

1378+
def test_cohort_trends_true_plus_aggregate_group(self) -> None:
1379+
"""CI R1 P1 fix: ``cohort_trends=True`` + ``aggregate('group')`` runs cleanly.
1380+
1381+
Closes the parameter-interaction coverage gap codex flagged:
1382+
cohort_trends was only tested with event and simple
1383+
aggregations. The group aggregation operates on per-cohort
1384+
cells; cohort-trend columns are excluded by construction.
1385+
"""
1386+
rng = np.random.default_rng(_BASE_SEED_SECTION8 + 13)
1387+
panel = _make_heterogeneous_trends_panel(rng, n_per_cohort=80, sigma=0.05)
1388+
res = WooldridgeDiD(method="ols", cohort_trends=True).fit(
1389+
panel, outcome="y", unit="unit", time="time", cohort="cohort"
1390+
)
1391+
res.aggregate("group")
1392+
assert res.group_effects is not None
1393+
finite_count = 0
1394+
for g, eff in res.group_effects.items():
1395+
if np.isfinite(eff["att"]):
1396+
assert np.isfinite(eff["se"]) and eff["se"] > 0
1397+
finite_count += 1
1398+
assert finite_count >= 1
1399+
1400+
def test_cohort_trends_true_plus_aggregate_calendar(self) -> None:
1401+
"""CI R1 P1 fix: ``cohort_trends=True`` + ``aggregate('calendar')`` runs cleanly."""
1402+
rng = np.random.default_rng(_BASE_SEED_SECTION8 + 14)
1403+
panel = _make_heterogeneous_trends_panel(rng, n_per_cohort=80, sigma=0.05)
1404+
res = WooldridgeDiD(method="ols", cohort_trends=True).fit(
1405+
panel, outcome="y", unit="unit", time="time", cohort="cohort"
1406+
)
1407+
res.aggregate("calendar")
1408+
assert res.calendar_effects is not None
1409+
finite_count = 0
1410+
for t, eff in res.calendar_effects.items():
1411+
if np.isfinite(eff["att"]):
1412+
assert np.isfinite(eff["se"]) and eff["se"] > 0
1413+
finite_count += 1
1414+
assert finite_count >= 1
1415+
1416+
def test_plot_event_study_propagates_weights_kwarg(self) -> None:
1417+
"""CI R1 P1 fix: ``plot_event_study(weights=...)`` propagates through aggregate().
1418+
1419+
Before the fix, ``plot_event_study()`` hardcoded
1420+
``aggregate("event")`` (cell weights) so the new opt-in
1421+
``weights="cohort_share"`` surface was unreachable from the
1422+
plot wrapper. Verifies the kwarg is plumbed through and that
1423+
the resulting ``event_study_effects`` reflects the requested
1424+
scheme (specifically, the k>=0 restriction Stage 4 added on
1425+
the cohort_share event path).
1426+
"""
1427+
from unittest.mock import patch
1428+
1429+
rng = np.random.default_rng(_BASE_SEED_SECTION8 + 15)
1430+
# Use never_treated + OLS to expose k<0 placebo cells in the
1431+
# default cell-weighted event aggregation; the cohort_share
1432+
# re-aggregation must restrict to k>=0.
1433+
panel = _make_three_cohort_four_period_panel(rng, n_per_cohort=80, sigma=0.05)
1434+
with warnings.catch_warnings():
1435+
warnings.filterwarnings("ignore", category=UserWarning)
1436+
res = WooldridgeDiD(method="ols", control_group="never_treated").fit(
1437+
panel, outcome="y", unit="unit", time="time", cohort="cohort"
1438+
)
1439+
# Default plot — uses weights="cell"
1440+
with patch("diff_diff.visualization.plot_event_study") as mock_plot:
1441+
res.plot_event_study()
1442+
assert mock_plot.call_count == 1
1443+
assert res.event_study_effects is not None
1444+
cell_event_keys = sorted(res.event_study_effects.keys())
1445+
assert any(k < 0 for k in cell_event_keys), (
1446+
"DGP precondition: never_treated + OLS should expose k<0 "
1447+
"placebo cells under default cell weighting"
1448+
)
1449+
# Plot under weights="cohort_share" — should re-aggregate +
1450+
# restrict to k>=0 (paper Eq. 7.6 scope)
1451+
with warnings.catch_warnings():
1452+
warnings.filterwarnings("ignore", category=UserWarning)
1453+
with patch("diff_diff.visualization.plot_event_study") as mock_plot:
1454+
res.plot_event_study(weights="cohort_share")
1455+
assert mock_plot.call_count == 1
1456+
assert res.event_study_effects is not None
1457+
cohort_share_keys = sorted(res.event_study_effects.keys())
1458+
assert all(k >= 0 for k in cohort_share_keys), (
1459+
f"plot_event_study(weights='cohort_share') must restrict to "
1460+
f"k>=0 per paper Eq. 7.6 scope; got {cohort_share_keys}"
1461+
)
1462+
# The cell-weighted and cohort_share-weighted event_study_effects
1463+
# have different key sets (cell includes k<0 placebos; cohort_share
1464+
# restricts to k>=0). This proves the kwarg is propagated.
1465+
assert set(cohort_share_keys) != set(cell_event_keys), (
1466+
"plot_event_study(weights='cohort_share') should produce a "
1467+
"different event_study_effects key set than the default "
1468+
"(cell weights) — keys should differ on the k<0 placebo "
1469+
"leads."
1470+
)
1471+
13781472
def test_cohort_trends_true_plus_bootstrap_preserves_bootstrap_se(self) -> None:
13791473
"""R5 P1 fix: ``cohort_trends=True`` + ``n_bootstrap > 0`` runs cleanly.
13801474

0 commit comments

Comments
 (0)