Skip to content

Commit 3e98f26

Browse files
igerberclaude
andcommitted
Address CI review: invalidate confidence-set cache on placebo rebuild
CI codex (gpt-5.5) findings on PR #527: - P1: confidence_set() caches effect_confidence_set / _confidence_set_df against the CURRENT in-space placebo reference set, but a later explicit in_space_placebo() rebuild (which _require_placebo_reference suggests via n_starts) overwrote the reference without invalidating the cache -> a stale set could be reported by summary()/to_dict()/_scm_native. Now clear both at the start of in_space_placebo() (after the snapshot check) so every rebuild drops the stale cache. - P2: add a regression test (confidence_set -> in_space_placebo(n_starts=) -> assert effect_confidence_set is None, get_confidence_set_df() raises, DR status "not_run"). - P3: update the Firpo-Possebom review intro from "forthcoming PR-B" to shipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 5080a25 commit 3e98f26

3 files changed

Lines changed: 30 additions & 2 deletions

File tree

diff_diff/synthetic_control_results.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -721,6 +721,13 @@ def in_space_placebo(
721721
from diff_diff.synthetic_control import _floored_pre_mspe, _mspe, _placebo_fit_unit
722722

723723
snap = self._fit_snapshot
724+
# A rebuilt placebo reference set invalidates any previously computed confidence set
725+
# (test_sharp_null / confidence_set re-rank against THIS reference set), so drop the
726+
# cached confidence-set outputs up front — a stale set must never be reported after an
727+
# explicit in_space_placebo() re-run (e.g. with a different n_starts). The snapshot
728+
# check above has already passed, so the reference IS about to be rebuilt on every exit.
729+
self.effect_confidence_set = None
730+
self._confidence_set_df = None
724731
donors = list(snap.donor_ids)
725732
n_donors = len(donors)
726733
if n_starts is None:

docs/methodology/papers/firpo-possebom-2018-review.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,13 @@
55
**PDF reviewed:** https://doi.org/10.1515/jci-2016-0026 (published *Journal of Causal Inference* version, open access; received 15 Nov 2016, revised 6 Aug 2018, accepted 11 Aug 2018, 26 pp). Per the project's PDFs-never-committed convention the local PDF is kept outside the repository; the published J. Causal Inference version (DOI 10.1515/jci-2016-0026) is the authoritative source. All equation, section, and footnote numbers below are pinned to that version.
66
**Review date:** 2026-06-01
77

8-
> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for a forthcoming SCM **confidence-set / CI-by-test-inversion** implementation (PR-B) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper.
8+
> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for the SCM **confidence-set / CI-by-test-inversion** implementation (PR-B, **shipped** — `SyntheticControlResults.test_sharp_null()` / `confidence_set()`) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper.
99

1010
---
1111

1212
## Methodology Registry Entry
1313

14-
*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) is unchanged by this docs-only PR-A; PR-B will add the confidence-set methodology subsection and flip the relevant checklist items.*
14+
*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) was unchanged by the docs-only PR-A; PR-B (shipped) added the confidence-set methodology subsection there and flipped the relevant checklist items below.*
1515

1616
## SyntheticControl
1717

tests/test_methodology_synthetic_control.py

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3426,6 +3426,27 @@ def test_get_confidence_set_df_requires_run():
34263426
res.get_confidence_set_df()
34273427

34283428

3429+
def test_in_space_placebo_rerun_invalidates_confidence_set():
3430+
# CI-review P1: a confidence set is computed against the CURRENT placebo reference set,
3431+
# so an explicit in_space_placebo() rebuild (which _require_placebo_reference even
3432+
# suggests, via n_starts) must INVALIDATE the cached set rather than report a stale one.
3433+
res = _exact_combo_fit(effect=3.0)
3434+
with warnings.catch_warnings():
3435+
warnings.simplefilter("ignore")
3436+
res.confidence_set(family="constant", gamma=0.25)
3437+
assert res.effect_confidence_set is not None
3438+
native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
3439+
assert native["confidence_set"]["status"] == "ran"
3440+
with warnings.catch_warnings():
3441+
warnings.simplefilter("ignore")
3442+
res.in_space_placebo(n_starts=2) # rebuild the reference set
3443+
assert res.effect_confidence_set is None
3444+
with pytest.raises(ValueError, match="No confidence set"):
3445+
res.get_confidence_set_df()
3446+
native2 = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
3447+
assert native2["confidence_set"]["status"] == "not_run"
3448+
3449+
34293450
def test_confidence_set_too_few_donors_raises():
34303451
# One donor -> in_space_placebo cannot form a reference set -> CI / test raise.
34313452
df, years, T0 = _make_panel(n_donors=1, T=10, T0=6)

0 commit comments

Comments
 (0)