Commit 7819486

igerber and claude committed
tutorial: address CI codex P3 — limit clean-tail claim to what's shown
The flat-effects cell demonstrates clean-tail 2x2 unbiasedness (point-estimate targeting), not comparative power, so soften 'more powerful than CS' to 'can be more powerful, since it pools' and state the demonstrated claim is unbiasedness. Markdown-only (cell 14 + decision table).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 2b4cb8d commit 7819486
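
The distinction the commit message draws (point-estimate targeting versus comparative power) can be illustrated with a small noise-free simulation. This is a minimal sketch, not the notebook's cell: the panel shape (12 periods, launches at periods 4, 5 and 6, 60 never-treated units), the function name `simulate_2x2`, and the effect sizes are all assumptions chosen for illustration. It compares each 2×2 point estimate with the average effect on treated unit-periods and says nothing about power.

```python
import random
from statistics import fmean


def simulate_2x2(effect_fn, noise_sd=0.0, seed=0):
    """Toy staggered panel: return the effect-on-treated and two 2x2 estimates."""
    rng = random.Random(seed)
    periods = range(12)
    launches = [4] * 20 + [5] * 20 + [6] * 20  # hypothetical launch periods
    n_control = 60                              # never-treated units
    time_fe = [rng.gauss(0, 1) for _ in periods]

    def panel(launch_list):
        rows = []
        for launch in launch_list:
            unit_fe = rng.gauss(0, 1)
            row = []
            for t in periods:
                treated = launch is not None and t >= launch
                effect = effect_fn(t - launch) if treated else 0.0
                row.append(unit_fe + time_fe[t] + effect + rng.gauss(0, noise_sd))
            rows.append(row)
        return rows

    y_treated = panel(launches)
    y_control = panel([None] * n_control)

    first, last = min(launches), max(launches)
    pre = [t for t in periods if t < first]  # nobody treated yet

    def cell_mean(rows, ts):
        return fmean(row[t] for row in rows for t in ts)

    def did(post):
        # Plain 2x2: (treated post - treated pre) - (control post - control pre)
        return (cell_mean(y_treated, post) - cell_mean(y_treated, pre)) - (
            cell_mean(y_control, post) - cell_mean(y_control, pre)
        )

    return {
        # Average effect over treated unit-periods only
        "att": fmean(effect_fn(t - g) for g in launches for t in periods if t >= g),
        # Naive 2x2: post window starts at the first launch (includes rollout)
        "naive": did([t for t in periods if t >= first]),
        # Clean-tail 2x2: post window starts once every cohort is treated
        "clean_tail": did([t for t in periods if t >= last]),
    }


flat = simulate_2x2(lambda e: 2.0)                      # constant effect
grow = simulate_2x2(lambda e: 2.0 * (1 + 0.25 * e))     # effect grows with event time
print(flat)
print(grow)
```

With noise switched off, the flat case has the clean-tail estimate coincide with the effect-on-treated while the naive one sits below it; in the growing case the naive estimate falls below and the clean-tail estimate above. Measuring power would need repeated noisy draws and a rejection rate, which this sketch deliberately leaves out.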

1 file changed

Lines changed: 2 additions & 27 deletions

docs/tutorials/24_staggered_vs_collapsed_power.ipynb
@@ -736,17 +736,7 @@
  "cell_type": "markdown",
  "id": "f59b0cbf",
  "metadata": {},
- "source": [
- "## A couple of refinements\n",
- "\n",
- "**When can a 2×2 ever be unbiased?** Under the assumptions this whole tutorial relies on —\n",
- "parallel trends vs. the never-treated control, no anticipation, and absorbing treatment — a\n",
- "2×2 lands on the effect-on-treated only if you (a) use *just* the clean all-treated tail (drop\n",
- "the rollout window) **and** (b) effects don't change with time-since-launch. When effects are\n",
- "flat, that clean-tail 2×2 targets the same estimand as CS, and *in this design* it comes out\n",
- "more powerful. When effects **grow**, no 2×2 recovers the effect-on-treated — the naive one\n",
- "reads low, the clean-tail one reads high (it captures the grown tail):"
- ]
+ "source": "## A couple of refinements\n\n**When can a 2×2 ever be unbiased?** Under the assumptions this whole tutorial relies on —\nparallel trends vs. the never-treated control, no anticipation, and absorbing treatment — a\n2×2 lands on the effect-on-treated only if you (a) use *just* the clean all-treated tail (drop\nthe rollout window) **and** (b) effects don't change with time-since-launch. When effects are\nflat, that clean-tail 2×2 targets the same estimand as CS — the cell below demonstrates that\n*unbiasedness* (it doesn't chart power). Because it pools the whole tail it can also be more\npowerful than CS, but that's a separate claim this notebook doesn't measure. When effects\n**grow**, no 2×2 recovers the effect-on-treated — the naive one reads low, the clean-tail one\nreads high (it captures the grown tail):"
  },
  {
  "cell_type": "code",
@@ -805,22 +795,7 @@
  "cell_type": "markdown",
  "id": "9b987d86",
  "metadata": {},
- "source": [
- "## Decision guide\n",
- "\n",
- "| Your situation | Use | Why |\n",
- "|---|---|---|\n",
- "| Fast rollout, only need \"did it work + rough size\" | collapsed 2×2 | cheapest power, dilution mild; check CS as a sanity pass |\n",
- "| Effects are flat **and** you have a clean all-treated tail | clean-tail 2×2 | unbiased *and* more powerful than CS; CS = diagnostic |\n",
- "| Slow / spread-out rollout | **CS** | the 2×2's power edge shrinks and its dilution is worst; honest coverage |\n",
- "| Effects grow, or you need the magnitude / ROI / dynamics | **CS** | only estimator targeting the effect-on-treated; the event study is the deliverable |\n",
- "\n",
- "**Bottom line.** \"CS will kill my power\" is true in magnitude but mis-aimed. CS does cost\n",
- "power versus the 2×2 — but the cost is smallest exactly where the rollout is staggered enough\n",
- "to need it, and the 2×2's apparent power is bought by quietly estimating a smaller, diluted\n",
- "number. The real decision isn't *power vs. no power* — it's *which estimand you actually\n",
- "want, and whether 50 states can detect it.*"
- ]
+ "source": "## Decision guide\n\n| Your situation | Use | Why |\n|---|---|---|\n| Fast rollout, only need \"did it work + rough size\" | collapsed 2×2 | cheapest power, dilution mild; check CS as a sanity pass |\n| Effects are flat **and** you have a clean all-treated tail | clean-tail 2×2 | unbiased (and can be more powerful, since it pools); CS = diagnostic |\n| Slow / spread-out rollout | **CS** | the 2×2's power edge shrinks and its dilution is worst; honest coverage |\n| Effects grow, or you need the magnitude / ROI / dynamics | **CS** | only estimator targeting the effect-on-treated; the event study is the deliverable |\n\n**Bottom line.** \"CS will kill my power\" is true in magnitude but mis-aimed. CS does cost\npower versus the 2×2 — but the cost is smallest exactly where the rollout is staggered enough\nto need it, and the 2×2's apparent power is bought by quietly estimating a smaller, diluted\nnumber. The real decision isn't *power vs. no power* — it's *which estimand you actually\nwant, and whether 50 states can detect it.*"
  },
  {
  "cell_type": "markdown",
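
Both hunks also switch each cell's `"source"` from a list of line strings to a single string, which accounts for most of the 27 deleted lines. The notebook JSON format accepts either form, with the list form read as the concatenation of its items. A minimal sketch for comparing the rendered text regardless of form follows; the helper name `cell_text` and the shortened sample cells are illustrative assumptions, not part of the repository.

```python
def cell_text(cell):
    """Return a notebook cell's source as one string, whichever form it is stored in."""
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)


# Shortened stand-ins for the two storage forms seen in the diff
list_form = {
    "cell_type": "markdown",
    "source": ["## Decision guide\n", "\n", "| Your situation | Use | Why |\n"],
}
string_form = {
    "cell_type": "markdown",
    "source": "## Decision guide\n\n| Your situation | Use | Why |\n",
}

print(cell_text(list_form) == cell_text(string_form))
```

Normalizing both sides this way before diffing makes it easier to see that the wording change is confined to the clean-tail sentences and the second table row.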
