igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 17 additions & 0 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 0 additions & 4 deletions
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
Lines changed: 2 additions & 1 deletion
diff --git a/diff_diff/results.py b/diff_diff/results.py
Lines changed: 46 additions & 9 deletions
@@ -10,6 +10,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`SyntheticControl` conformal inference (Chernozhukov, Wüthrich & Zhu 2021, *JASA* 116(536)).** Three opt-in `SyntheticControlResults` methods give valid p-values for the post-period effect trajectory and pointwise confidence intervals — what the in-space placebo / Firpo-Possebom test-inversion paths cannot. Unlike the Firpo path (which re-ranks the cross-unit placebo gaps), the conformal layer fits its **own** time-permutation-invariant constrained-LS synthetic-control proxy (CWZ §2.3 eqs 3–4 — simplex weights on raw outcomes over **all** periods under the null, no `V`-matrix, no intercept) and permutes residuals **over time** for the single treated unit (CWZ's exactness theory requires a time-symmetric proxy, which the headline ADH `V`-matrix fit is not). **`conformal_test(effect, q=1, scheme="moving_block", n_iid=10000, seed=None)`** computes the joint sharp-null permutation p-value (eqs 1–2) of `S_q(û) = ((1/√T*)·Σ_{t>T0}|û_t|^q)^{1/q}` (`q ∈ {1, 2, ∞}`); the proxy is fit once and only residuals are permuted (footnote 7). **`conformal_confidence_intervals(alpha=0.1, scheme="moving_block", bounds=None, n_grid=100, seed=None)`** returns pointwise per-period CIs by test inversion (Algorithm 1 — each period `t` uses `Z = (pre-periods, t)` with the other post-periods dropped, a clean `T*=1` test). **`conformal_average_effect(alpha=0.1, scheme="moving_block", bounds=None, n_grid=200, seed=None)`** returns a CI for the average post-period effect by collapsing the panel into non-overlapping `T*`-blocks and permuting the block residuals (Appendix A.1). Permutation schemes: `"moving_block"` (`Π_→` cyclic shifts, valid under serial dependence — the default) and `"iid"` (`Π_all`, sampled, finer p-values); both include the identity so the p-value floor is `1/|Π|` (no extra `+1`). 
Fail-closed handling for `<1` donor / unpickled result / non-finite panel / non-converged grid points (treated as indeterminate, not rejected) / grid-limited / empty / unbounded sets; a single donor and `T*≥T0` warn. Surfaced under `conformal_inference` / `get_conformal_grid_df()` and `DiagnosticReport`'s `estimator_native_diagnostics`; the analytical `se`/`t_stat`/`p_value`/`conf_int`/`is_significant` stay NaN throughout. Core in the new `diff_diff/conformal.py` (reuses the Frank-Wolfe simplex solver). *Deferred:* one-sided variants (§7), covariates folded into the proxy, and the AR/innovation-permutation path (Lemmas 5–7).
 
+### Changed
+- **`SyntheticDiDResults.placebo_effects` renamed to `variance_effects`.** The
+  array's contents are method-specific — placebo treatment effects
+  (`variance_method="placebo"`), per-draw bootstrap ATT estimates
+  (`"bootstrap"`), or leave-one-out estimates (`"jackknife"`) — so the old name
+  was misleading; the `variance_method` field disambiguates the contents. Read
+  `result.variance_effects` going forward.
+
+### Deprecated
+- **`SyntheticDiDResults.placebo_effects`** is now a read-only alias for
+  `variance_effects` that emits a `DeprecationWarning` on access; it will be
+  removed in v4.0.0. Because the alias is a property rather than a dataclass
+  field, assigning to it raises `AttributeError`,
+  `dataclasses.replace(result, placebo_effects=...)` no longer works, and
+  `dataclasses.asdict(result)` now emits the `variance_effects` key instead.
+  Use `variance_effects` in all three cases.
+
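The field-versus-property consequences listed in this entry can be reproduced with a small sketch. `ToyResults` below is a hypothetical stand-in, not the real `SyntheticDiDResults`; it assumes only the same pattern (a dataclass field named `variance_effects` plus a setter-less `placebo_effects` property). The exception types shown are what the standard library raises for the toy class, and the real class may surface different messages.

```python
import dataclasses
import warnings
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyResults:
    """Hypothetical stand-in for SyntheticDiDResults (not the real class)."""

    att: float
    variance_effects: Optional[List[float]] = field(default=None)

    @property
    def placebo_effects(self) -> Optional[List[float]]:
        # Un-annotated property: dataclasses does not treat it as a field.
        warnings.warn(
            "placebo_effects is deprecated; use variance_effects instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.variance_effects


result = ToyResults(att=1.5, variance_effects=[0.1, -0.2, 0.3])

# Reading the alias still returns the data, but records a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = result.placebo_effects
alias_warns = any(issubclass(w.category, DeprecationWarning) for w in caught)

# A property without a setter rejects assignment.
try:
    result.placebo_effects = [9.9]
    assignment_error = None
except AttributeError as exc:
    assignment_error = type(exc).__name__

# replace() forwards its keyword arguments to __init__, which no longer
# has a parameter with the old name.
try:
    dataclasses.replace(result, placebo_effects=[9.9])
    replace_error = None
except TypeError as exc:
    replace_error = type(exc).__name__

# asdict() walks dataclass fields only, so the property is not serialized.
keys = sorted(dataclasses.asdict(result))
```

Downstream code that round-trips results through `asdict()` or `replace()` would need the new key name, since the alias only covers attribute reads.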
 ## [3.5.1] - 2026-06-02
 
 ### Added
 
@@ -175,7 +175,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | Validating the `.txt` AI guides (`diff_diff/guides/llms-full.txt`, `llms-practitioner.txt`) as executable snippets is **not low-lift** (re-scoped 2026-06-01): of their ~112 fenced Python blocks only ~20% are standalone-runnable — the rest are API-signature references (`Foo(param: type = default)` pseudo-signatures that are `SyntaxError` by design), context fragments (e.g. `results.att` on an undefined `results`), or dataset-shape-specific blocks. The guides are reference documentation, not runnable examples; a real implementation needs signature-block detection + a context/data skip-allowlist + per-snippet fixtures (multi-round curation), unlike the curated `.rst` files the existing smoke test covers. | `tests/test_doc_snippets.py` | #239 | Low |
-| SyntheticDiD: rename internal `placebo_effects` variable to `variance_effects` (or `resampled_effects`). Misleading name across the placebo/bootstrap/jackknife dispatch paths — holds three different contents depending on variance method. Low-risk refactor; user-facing field rename should preserve `placebo_effects` as a deprecated alias for one release. | `synthetic_did.py`, `results.py` | follow-up | Medium |
 | `TestWorkflowDoesNotExecutePRHeadCode` (CodeQL #14 dismissal guard) does not model: `bash <script>` / `sh <script>` / `./<script>` / `source <script>` / `. <script>` direct shell-script execution; multi-line `python3 -c` bodies (line-by-line shlex can't reassemble across newlines — the workflow's 5 sanitizer bodies are exempt by invisibility); shell-variable-expansion indirection (`SCRIPT="$X"; python3 "$SCRIPT"`); `eval`; `find -exec`; `xargs -I {}`. Each represents a path by which PR-head bytes COULD execute without the test failing. The guard catches accidental regressions of common forms (16 tests covering pip/npm/cargo/maturin/etc. installs, python file exec, bash -c indirection with compound flags, env-var prefixes, line continuations, subshells/brace groups, single-line python -c, write-overwrites of allowlisted /tmp paths). Closing the residuals would require multi-line shell parsing with command-substitution awareness + script-execution allowlists — significant work for diminishing return given the dismissal's primary defense is the documented threat model on the alert and in `.github/workflows/ai_pr_review.yml` comment block. | `tests/test_openai_review.py`, `.github/workflows/ai_pr_review.yml` | #436 | Low |
 | Render `docs/methodology/REPORTING.md` and `docs/methodology/REGISTRY.md` as in-site Sphinx pages so cross-references can use `:doc:` instead of off-site GitHub `blob/main` URLs. Current state (#410 fix-audit-r2) restores navigable links via `blob/main`, but stable-docs readers can land on a different revision than the package version they are reading. Two viable paths: (a) add `myst-parser` to `docs/conf.py` extensions + docs extras and link with `:doc:`, or (b) convert both files to `.rst`. | `docs/conf.py`, `docs/api/business_report.rst`, `docs/api/diagnostic_report.rst`, `docs/tutorials/18_geo_experiments.ipynb`, `docs/tutorials/19_dcdh_marketing_pulse.ipynb` | follow-up | Low |
 | ImputationDiD methodology validation (PR-B): add `tests/test_methodology_imputation.py` with paper-equation-numbered Verified Components (Theorems 1-3, eqs. 5-9, Props. 5/9) and an R `didimputation` parity fixture (none on file). Flips the METHODOLOGY_REVIEW.md row to Complete. | `tests/test_methodology_imputation.py` | imputation-validation (PR-B) | Medium |
@@ -190,11 +189,8 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 _(No active items. The sole prior entry — the WooldridgeDiD method/outcome efficiency hint — has shipped; see CHANGELOG `## [Unreleased]` and REGISTRY §WooldridgeDiD "Nonlinear extensions".)_
 
-(SyntheticDiD `placebo_effects` → `variance_effects` rename moved to Tier B — the user-facing field rename + one-release deprecation alias is too large for ≤1 day / ≤3 CI rounds.)
-
 #### Tier B — Mid-size methodology (5-10 CI rounds expected, per memory cascade priors)
 
-- SyntheticDiD: rename internal `placebo_effects` → `variance_effects` AND public `placebo_effects` field with deprecation alias retained for one release (`synthetic_did.py`, `results.py`)
 - StaggeredTripleDifference R parity: commit CSV fixtures + add covariate-adjusted scenarios + aggregation-SE assertions (`tests/test_methodology_staggered_triple_diff.py`, `benchmarks/R/benchmark_staggered_triplediff.R`)
 - StaggeredTripleDifference: per-cohort group-effect SE WIF override for exact R `triplediff` match (`staggered_triple_diff.py`)
 - WooldridgeDiD: QMLE Stata-parity `qmle` weight type + Stata golden values (`wooldridge.py`, `linalg.py`, `tests/test_wooldridge.py`)
 
@@ -1263,6 +1263,7 @@ Returned by `SyntheticDiD.fit()`.
 | `pre_periods` | `list` | Pre-treatment periods |
 | `post_periods` | `list` | Post-treatment periods |
 | `variance_method` | `str` | "bootstrap", "jackknife", or "placebo" |
+| `variance_effects` | `np.ndarray` | Per-iteration draws (placebo effects, bootstrap ATT draws, or jackknife LOO estimates per `variance_method`); deprecated alias `placebo_effects` (to be removed in v4.0.0) |
 | `noise_level` | `float` | Estimated noise level |
 | `zeta_omega` | `float` | Unit weight regularization |
 | `zeta_lambda` | `float` | Time weight regularization |
@@ -1272,7 +1273,7 @@ Returned by `SyntheticDiD.fit()`.
 
 **Validation diagnostics** (call after `fit()`):
 - `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units
-- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.placebo_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
+- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.variance_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
 - `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design
 - `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega
 
 
@@ -4,6 +4,7 @@
 Provides statsmodels-style output with a more Pythonic interface.
 """
 
+import warnings
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Optional, Tuple
 
@@ -1079,12 +1080,13 @@ class SyntheticDiDResults:
         Arkhangelsky et al. 2021 Algorithm 2 step 2, and R's default
         ``synthdid::vcov(method="bootstrap")``), ``"jackknife"``, or
         ``"placebo"``.
-    placebo_effects : np.ndarray, optional
+    variance_effects : np.ndarray, optional
         Method-specific per-iteration estimates: placebo treatment effects
         (for ``"placebo"``), bootstrap ATT estimates with re-estimated
         weights per draw (for ``"bootstrap"``), or leave-one-out estimates
         (for ``"jackknife"``). The ``variance_method`` field disambiguates
-        the contents.
+        the contents. (The deprecated read-only alias ``placebo_effects``
+        returns this array and will be removed in v4.0.0.)
     synthetic_pre_trajectory : np.ndarray, optional
         Synthetic control trajectory in pre-treatment periods, shape
         ``(n_pre,)``. Equal to ``Y_pre_control @ omega_eff`` where
@@ -1122,7 +1124,7 @@ class SyntheticDiDResults:
     zeta_omega: Optional[float] = field(default=None)
     zeta_lambda: Optional[float] = field(default=None)
     pre_treatment_fit: Optional[float] = field(default=None)
-    placebo_effects: Optional[np.ndarray] = field(default=None)
+    variance_effects: Optional[np.ndarray] = field(default=None)
     n_bootstrap: Optional[int] = field(default=None)
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
@@ -1145,7 +1147,7 @@ def __post_init__(self):
         # Plain attributes rather than dataclass fields so asdict()-style
         # recursion cannot serialize internal panel state.
         self._loo_unit_ids: Optional[List[Any]] = None
-        # Granularity of the `placebo_effects` LOO array: "unit" (non-
+        # Granularity of the `variance_effects` LOO array: "unit" (non-
         # survey + pweight-only jackknife), "psu" (full-design survey
         # jackknife), or None (non-jackknife variance methods). Governs
         # which accessors are well-defined. Set by `fit()` at result
@@ -1180,6 +1182,20 @@ def __getstate__(self) -> Dict[str, Any]:
         state["_fit_snapshot"] = None
         return state
 
+    def __setstate__(self, state: Dict[str, Any]) -> None:
+        """Restore from pickle, migrating the legacy field name.
+
+        Results pickled before the ``placebo_effects`` → ``variance_effects``
+        rename (<= 3.5.x) carry the old key in their state; map it so the
+        stored variance draws survive and remain reachable through both
+        ``variance_effects`` and the deprecated ``placebo_effects`` alias.
+        Remove together with the alias in v4.0.0.
+        """
+        if "placebo_effects" in state and "variance_effects" not in state:
+            state = dict(state)
+            state["variance_effects"] = state.pop("placebo_effects")
+        self.__dict__.update(state)
+
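A minimal sketch of the state migration above, assuming a hypothetical `ToyResults` class rather than the real one. On load, pickle allocates the instance without calling `__init__` and hands the saved state to `__setstate__`, so the example feeds legacy-shaped state dicts to `__setstate__` directly instead of producing an actual 3.5.x pickle. The `restore` helper is illustrative only.

```python
from typing import Any, Dict


class ToyResults:
    """Hypothetical stand-in; only __setstate__ mirrors the diff."""

    def __init__(self, att: float, variance_effects=None):
        self.att = att
        self.variance_effects = variance_effects

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Map the legacy key only when the new key is absent.
        if "placebo_effects" in state and "variance_effects" not in state:
            state = dict(state)  # copy so the caller's dict is left untouched
            state["variance_effects"] = state.pop("placebo_effects")
        self.__dict__.update(state)


def restore(state: Dict[str, Any]) -> ToyResults:
    """Mimic unpickling: allocate without __init__, then apply the state."""
    obj = ToyResults.__new__(ToyResults)
    obj.__setstate__(state)
    return obj


# State shaped like a pre-rename result versus a post-rename result.
legacy_state = {"att": 1.5, "placebo_effects": [0.1, -0.2, 0.3]}
current_state = {"att": 1.5, "variance_effects": [0.4, 0.5]}

from_legacy = restore(legacy_state)
from_current = restore(current_state)
```

State that already carries `variance_effects` passes through unchanged, so the migration only affects results saved under the old field name.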
     @property
     def coef_var(self) -> float:
         """Coefficient of variation: SE / abs(ATT). NaN when ATT is 0 or SE non-finite."""
@@ -1189,6 +1205,27 @@ def coef_var(self) -> float:
             return np.nan
         return self.se / abs(self.att)
 
+    @property
+    def placebo_effects(self) -> Optional[np.ndarray]:
+        """Deprecated alias for :attr:`variance_effects` (to be removed in v4.0.0).
+
+        .. deprecated:: 3.6.0
+            Renamed to ``variance_effects`` because the array's contents are
+            method-specific (placebo effects, bootstrap ATT draws, or
+            leave-one-out estimates depending on ``variance_method``).
+        """
+        # `3.6.0` is the assumed next-minor (current is 3.5.1); confirm/resolve
+        # at bump-version time. The v4.0.0 removal target is fixed.
+        warnings.warn(
+            "SyntheticDiDResults.placebo_effects is deprecated; use "
+            "variance_effects instead. The array holds placebo effects, "
+            "bootstrap ATT draws, or leave-one-out estimates depending on "
+            "variance_method. Will be removed in v4.0.0.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        return self.variance_effects
+
     def summary(self, alpha: Optional[float] = None) -> str:
         """
         Generate a formatted summary of the estimation results.
@@ -1388,7 +1425,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
         * full-design survey jackknife fits (strata / PSU / FPC set in
           ``SurveyDesign``) - the underlying replicates are PSU-level
           ``τ̂_{(h,j)}`` (Rust & Rao 1996), not unit-level. See
-          ``result.placebo_effects`` for the raw PSU-level replicate
+          ``result.variance_effects`` for the raw PSU-level replicate
           array and REGISTRY §SyntheticDiD "Note (survey + jackknife
           composition)" for the aggregation formula.
 
@@ -1424,7 +1461,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
             )
         # Survey-jackknife fits use PSU-level LOO (Rust & Rao 1996) with
         # stratum aggregation rather than unit-level LOO. The returned
-        # ``placebo_effects`` array in that path is a flat list of
+        # ``variance_effects`` array in that path is a flat list of
         # PSU-level τ̂_{(h,j)} replicates (variable length, ordered by
         # stratum then PSU), not a length-N unit-indexed array. Mapping
         # these onto the fit-time unit IDs would mislabel PSU replicates
@@ -1441,19 +1478,19 @@ def get_loo_effects_df(self) -> pd.DataFrame:
                 "stratum aggregation, Rust & Rao 1996); the underlying "
                 "replicates are PSU-level, not unit-level, so joining them "
                 "back to fit-time unit IDs is not well-defined. See "
-                "``result.placebo_effects`` for the raw PSU-level replicate "
+                "``result.variance_effects`` for the raw PSU-level replicate "
                 "array and ``docs/methodology/REGISTRY.md`` §SyntheticDiD "
                 '"Note (survey + jackknife composition)" for the '
                 "aggregation formula."
             )
-        if self._loo_unit_ids is None or self._loo_roles is None or self.placebo_effects is None:
+        if self._loo_unit_ids is None or self._loo_roles is None or self.variance_effects is None:
             raise ValueError(
                 "Leave-one-out estimates are unavailable (jackknife returned "
                 "NaN or an empty array). See prior warnings from fit() for the "
                 "cause (e.g., single treated unit, all weight on one control)."
             )
 
-        att_loo = np.asarray(self.placebo_effects, dtype=float)
+        att_loo = np.asarray(self.variance_effects, dtype=float)
         delta = att_loo - self.att
         df = pd.DataFrame(
             {