tutorial: address CI codex P3 — limit clean-tail claim to what's shown

igerber · claude · igerber · commit 7819486da831 · 2026-05-31T13:17:26.000-04:00
The flat-effects cell demonstrates that the clean-tail 2x2 is unbiased (its point estimate
targets the same estimand as CS); it does not measure comparative power. So soften 'more
powerful than CS' to 'can be more powerful, since it pools', and state that the demonstrated
claim is unbiasedness. Markdown-only change (cell 14 and the decision table).
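
For reviewers, here is a minimal, noise-free sketch of the claim the flat-effects cell supports. It is not the notebook's code. The design is hypothetical: 12 periods, four equal-size treated states launching at t = 4, 5, 6, 7, and four never-treated states. The helper names (`outcome`, `did_2x2`, `att_all_treated_cells`) are illustrative. The "effect-on-treated" target is the plain average over treated state-periods, which is an assumed stand-in for CS's overall ATT. Because there is no noise, each 2x2 returns its expected value under parallel trends. The sketch therefore speaks to unbiasedness only and says nothing about power.

```python
from statistics import mean

# Hypothetical design, not the notebook's: 12 periods, staggered launches, no noise.
PERIODS = range(12)
ADOPT = {"A": 4, "B": 5, "C": 6, "D": 7}   # treated state -> launch period
CONTROLS = ["E", "F", "G", "H"]             # never-treated states
FIRST, LAST = min(ADOPT.values()), max(ADOPT.values())


def outcome(state, t, effect):
    """y = state effect + period effect + effect(time since launch)."""
    base = 0.3 * ord(state) + 0.1 * t * t   # arbitrary fixed effects; they cancel in a DiD
    g = ADOPT.get(state)
    if g is not None and t >= g:
        base += effect(t - g)
    return base


def did_2x2(effect, pre, post):
    """Collapsed 2x2: (treated post - treated pre) - (control post - control pre)."""
    def avg(states, periods):
        return mean(outcome(s, t, effect) for s in states for t in periods)
    treated = list(ADOPT)
    return ((avg(treated, post) - avg(treated, pre))
            - (avg(CONTROLS, post) - avg(CONTROLS, pre)))


def att_all_treated_cells(effect):
    """Assumed target: average effect over every treated state-period."""
    return mean(effect(t - g) for g in ADOPT.values() for t in PERIODS if t >= g)


def compare(effect):
    pre = [t for t in PERIODS if t < FIRST]
    naive_post = [t for t in PERIODS if t >= FIRST]  # includes the rollout window
    tail_post = [t for t in PERIODS if t >= LAST]    # only periods when every treated state is live
    return {
        "att": att_all_treated_cells(effect),
        "naive": did_2x2(effect, pre, naive_post),
        "clean_tail": did_2x2(effect, pre, tail_post),
    }


flat = compare(lambda e: 1.0)                 # effect constant in time since launch
growing = compare(lambda e: 1.0 + 0.5 * e)    # effect grows with time since launch
for name, res in [("flat", flat), ("growing", growing)]:
    print(name, {k: round(v, 3) for k, v in res.items()})
```

With flat effects the clean-tail 2x2 matches the target exactly, and the naive 2x2 is diluted by the not-yet-treated rollout cells. With growing effects the naive 2x2 reads low and the clean-tail 2x2 reads high, because its window contains only the later, larger effects.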

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/docs/tutorials/24_staggered_vs_collapsed_power.ipynb b/docs/tutorials/24_staggered_vs_collapsed_power.ipynb
@@ -736,17 +736,7 @@
    "cell_type": "markdown",
    "id": "f59b0cbf",
    "metadata": {},
-   "source": [
-    "## A couple of refinements\n",
-    "\n",
-    "**When can a 2×2 ever be unbiased?** Under the assumptions this whole tutorial relies on —\n",
-    "parallel trends vs. the never-treated control, no anticipation, and absorbing treatment — a\n",
-    "2×2 lands on the effect-on-treated only if you (a) use *just* the clean all-treated tail (drop\n",
-    "the rollout window) **and** (b) effects don't change with time-since-launch. When effects are\n",
-    "flat, that clean-tail 2×2 targets the same estimand as CS, and *in this design* it comes out\n",
-    "more powerful. When effects **grow**, no 2×2 recovers the effect-on-treated — the naive one\n",
-    "reads low, the clean-tail one reads high (it captures the grown tail):"
-   ]
+   "source": "## A couple of refinements\n\n**When can a 2×2 ever be unbiased?** Under the assumptions this whole tutorial relies on —\nparallel trends vs. the never-treated control, no anticipation, and absorbing treatment — a\n2×2 lands on the effect-on-treated only if you (a) use *just* the clean all-treated tail (drop\nthe rollout window) **and** (b) effects don't change with time-since-launch. When effects are\nflat, that clean-tail 2×2 targets the same estimand as CS — the cell below demonstrates that\n*unbiasedness* (it doesn't chart power). Because it pools the whole tail it can also be more\npowerful than CS, but that's a separate claim this notebook doesn't measure. When effects\n**grow**, no 2×2 recovers the effect-on-treated — the naive one reads low, the clean-tail one\nreads high (it captures the grown tail):"
   },
   {
    "cell_type": "code",
@@ -805,22 +795,7 @@
    "cell_type": "markdown",
    "id": "9b987d86",
    "metadata": {},
-   "source": [
-    "## Decision guide\n",
-    "\n",
-    "| Your situation | Use | Why |\n",
-    "|---|---|---|\n",
-    "| Fast rollout, only need \"did it work + rough size\" | collapsed 2×2 | cheapest power, dilution mild; check CS as a sanity pass |\n",
-    "| Effects are flat **and** you have a clean all-treated tail | clean-tail 2×2 | unbiased *and* more powerful than CS; CS = diagnostic |\n",
-    "| Slow / spread-out rollout | **CS** | the 2×2's power edge shrinks and its dilution is worst; honest coverage |\n",
-    "| Effects grow, or you need the magnitude / ROI / dynamics | **CS** | only estimator targeting the effect-on-treated; the event study is the deliverable |\n",
-    "\n",
-    "**Bottom line.** \"CS will kill my power\" is true in magnitude but mis-aimed. CS does cost\n",
-    "power versus the 2×2 — but the cost is smallest exactly where the rollout is staggered enough\n",
-    "to need it, and the 2×2's apparent power is bought by quietly estimating a smaller, diluted\n",
-    "number. The real decision isn't *power vs. no power* — it's *which estimand you actually\n",
-    "want, and whether 50 states can detect it.*"
-   ]
+   "source": "## Decision guide\n\n| Your situation | Use | Why |\n|---|---|---|\n| Fast rollout, only need \"did it work + rough size\" | collapsed 2×2 | cheapest power, dilution mild; check CS as a sanity pass |\n| Effects are flat **and** you have a clean all-treated tail | clean-tail 2×2 | unbiased (and can be more powerful, since it pools); CS = diagnostic |\n| Slow / spread-out rollout | **CS** | the 2×2's power edge shrinks and its dilution is worst; honest coverage |\n| Effects grow, or you need the magnitude / ROI / dynamics | **CS** | only estimator targeting the effect-on-treated; the event study is the deliverable |\n\n**Bottom line.** \"CS will kill my power\" is true in magnitude but mis-aimed. CS does cost\npower versus the 2×2 — but the cost is smallest exactly where the rollout is staggered enough\nto need it, and the 2×2's apparent power is bought by quietly estimating a smaller, diluted\nnumber. The real decision isn't *power vs. no power* — it's *which estimand you actually\nwant, and whether 50 states can detect it.*"
   },
   {
    "cell_type": "markdown",