Address CI review: invalidate confidence-set cache on placebo rebuild

igerber · claude · igerber · commit 3e98f262a1be · 2026-06-02T08:16:58.000-04:00
CI codex (gpt-5.5) findings on PR #527:

- P1: confidence_set() caches effect_confidence_set / _confidence_set_df against the CURRENT in-space placebo reference set, but a later explicit in_space_placebo() rebuild (which _require_placebo_reference suggests via n_starts) overwrote the reference without invalidating the cache -> a stale set could be reported by summary()/to_dict()/_scm_native. Now clear both at the start of in_space_placebo() (after the snapshot check) so every rebuild drops the stale cache.
- P2: add a regression test (confidence_set -> in_space_placebo(n_starts=) -> assert effect_confidence_set is None, get_confidence_set_df() raises, DR status "not_run").
- P3: update the Firpo-Possebom review intro from "forthcoming PR-B" to shipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
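
The invalidation pattern described in P1 can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `PlaceboResults`, `build_reference`, `get_confidence_set`, and the toy arithmetic are stand-ins invented for illustration, not the `diff_diff` classes or the Firpo-Possebom procedure. It only shows the ordering idea, namely that the derived cache is cleared at the top of the rebuild so no exit path can leave a stale value behind.

```python
class PlaceboResults:
    """Hypothetical stand-in for a results object (not the diff_diff class)."""

    def __init__(self, donor_gaps):
        self._donor_gaps = list(donor_gaps)
        self._reference = None       # placebo reference set
        self._confidence_set = None  # cache derived from the reference set

    def build_reference(self, n_starts=1):
        # Clear the derived cache FIRST: whatever happens below, a set
        # computed against the old reference can no longer be reported.
        self._confidence_set = None
        # Toy "reference set": scaled absolute gaps (illustrative only).
        self._reference = sorted(abs(g) * n_starts for g in self._donor_gaps)
        return self._reference

    def confidence_set(self):
        if self._reference is None:
            raise ValueError("No placebo reference; call build_reference() first")
        # Toy "confidence set": symmetric interval at the largest reference value.
        bound = self._reference[-1]
        self._confidence_set = (-bound, bound)
        return self._confidence_set

    def get_confidence_set(self):
        if self._confidence_set is None:
            raise ValueError("No confidence set; call confidence_set() first")
        return self._confidence_set


res = PlaceboResults([1.0, -2.0, 0.5])
res.build_reference()
res.confidence_set()               # cached against the current reference
res.build_reference(n_starts=2)    # rebuild drops the cached set
try:
    res.get_confidence_set()
except ValueError as exc:
    print(exc)
```

Clearing the cache before the rebuild, rather than after it, mirrors the placement chosen in this commit: the invalidation holds even if the rebuild returns early or raises partway through.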
diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py
@@ -721,6 +721,13 @@ def in_space_placebo(
         from diff_diff.synthetic_control import _floored_pre_mspe, _mspe, _placebo_fit_unit
 
         snap = self._fit_snapshot
+        # A rebuilt placebo reference set invalidates any previously computed confidence set
+        # (test_sharp_null / confidence_set re-rank against THIS reference set), so drop the
+        # cached confidence-set outputs up front — a stale set must never be reported after an
+        # explicit in_space_placebo() re-run (e.g. with a different n_starts). The snapshot
+        # check above has already passed, so the reference IS about to be rebuilt on every exit.
+        self.effect_confidence_set = None
+        self._confidence_set_df = None
         donors = list(snap.donor_ids)
         n_donors = len(donors)
         if n_starts is None:
diff --git a/docs/methodology/papers/firpo-possebom-2018-review.md b/docs/methodology/papers/firpo-possebom-2018-review.md
@@ -5,13 +5,13 @@
 **PDF reviewed:** https://doi.org/10.1515/jci-2016-0026 (published *Journal of Causal Inference* version, open access; received 15 Nov 2016, revised 6 Aug 2018, accepted 11 Aug 2018, 26 pp). Per the project's PDFs-never-committed convention the local PDF is kept outside the repository; the published J. Causal Inference version (DOI 10.1515/jci-2016-0026) is the authoritative source. All equation, section, and footnote numbers below are pinned to that version.
 **Review date:** 2026-06-01
 
-> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for a forthcoming SCM **confidence-set / CI-by-test-inversion** implementation (PR-B) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper.
+> Scope note: this paper extends the **permutation / placebo inference** procedure of Abadie, Diamond & Hainmueller (the SCM benchmark) in two ways — (1) a **sensitivity analysis** that parametrically re-weights the placebo p-value away from the equal-weights benchmark, and (2) testing **any sharp null hypothesis** (not only "no effect whatsoever") via a modified RMSPE statistic, which it **inverts to construct confidence sets** for the treatment-effect path. It also generalizes to arbitrary test statistics, multiple outcomes (familywise error control), and multiple treated units (a pooled effect). This review is the **Step-1 fidelity artifact** for the SCM **confidence-set / CI-by-test-inversion** implementation (PR-B, **shipped** — `SyntheticControlResults.test_sharp_null()` / `confidence_set()`) layered on the existing `SyntheticControl` estimator; the sensitivity-analysis and multiple-outcome / multiple-treated extensions are documented here but flagged **deferred**. The estimator itself (donor weights `W`, predictor importance `V`) is taken as given from ADH 2010/2015 — already implemented as `SyntheticControl` — and is recapped only as the paper frames it. Nothing here is sourced from outside this paper.
 
 ---
 
 ## Methodology Registry Entry
 
-*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) is unchanged by this docs-only PR-A; PR-B will add the confidence-set methodology subsection and flip the relevant checklist items.*
+*Formatted to match docs/methodology/REGISTRY.md. This documents an **inference procedure on the existing `SyntheticControl` estimator**, not a new estimator — the `## SyntheticControl` heading mirrors `abadie-2021-review.md`. The REGISTRY implementation contract (`docs/methodology/REGISTRY.md` §SyntheticControl) was unchanged by the docs-only PR-A; PR-B (shipped) added the confidence-set methodology subsection there and flipped the relevant checklist items below.*
 
 ## SyntheticControl
 
diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py
@@ -3426,6 +3426,27 @@ def test_get_confidence_set_df_requires_run():
         res.get_confidence_set_df()
 
 
+def test_in_space_placebo_rerun_invalidates_confidence_set():
+    # CI-review P1: a confidence set is computed against the CURRENT placebo reference set,
+    # so an explicit in_space_placebo() rebuild (which _require_placebo_reference even
+    # suggests, via n_starts) must INVALIDATE the cached set rather than report a stale one.
+    res = _exact_combo_fit(effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.confidence_set(family="constant", gamma=0.25)
+    assert res.effect_confidence_set is not None
+    native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+    assert native["confidence_set"]["status"] == "ran"
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo(n_starts=2)  # rebuild the reference set
+    assert res.effect_confidence_set is None
+    with pytest.raises(ValueError, match="No confidence set"):
+        res.get_confidence_set_df()
+    native2 = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+    assert native2["confidence_set"]["status"] == "not_run"
+
+
 def test_confidence_set_too_few_donors_raises():
     # One donor -> in_space_placebo cannot form a reference set -> CI / test raise.
     df, years, T0 = _make_panel(n_donors=1, T=10, T0=6)