tutorial: address CI codex P3 (round 2) — drop unseen-evidence appeal in holdout note

igerber · claude · igerber · commit 2b4cb8d96ae1 · 2026-05-31T13:10:24.000-04:00
The holdout paragraph still referenced 'a separate sweep, not shown', which appeals to
evidence the reader can't verify. Reframe purely as a mechanism/hypothesis to test on your
own design, explicitly noting this notebook does not measure it. Markdown-only change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/docs/tutorials/24_staggered_vs_collapsed_power.ipynb b/docs/tutorials/24_staggered_vs_collapsed_power.ipynb
@@ -799,21 +799,7 @@
    "cell_type": "markdown",
    "id": "9dca90b2",
    "metadata": {},
-   "source": [
-    "**Holdout size.** Geo experiments usually hold back only a few states. We fix the holdout at\n",
-    "10 states here and don't sweep it, but the direction is worth knowing: a small control group\n",
-    "hurts the *2×2's* standard error too, so shrinking the holdout doesn't widen CS's relative\n",
-    "power gap — if anything it narrows it (a separate sweep, not shown in this notebook, bears\n",
-    "this out). Treat this as design intuition, not a demonstrated rule.\n",
-    "\n",
-    "**A 50-state caveat: few clusters.** Our 2×2 helper already clusters by state\n",
-    "(`cluster=\"unit\"`), and with ~50 states (only ~10 controls) cluster-robust SEs lean on\n",
-    "large-sample approximations that are shaky at this scale. For a real 50-state test, prefer\n",
-    "wild-cluster bootstrap or small-sample corrections: `DifferenceInDifferences` supports\n",
-    "`inference=\"wild_bootstrap\"` (it resamples at the cluster level), and `CallawaySantAnna`\n",
-    "supports a multiplier bootstrap via `n_bootstrap=`. See the estimator docstrings for the\n",
-    "exact requirements."
-   ]
+   "source": "**Holdout size.** Geo experiments usually hold back only a few states. We hold this fixed at\n10 and don't vary it. One mechanism to keep in mind if you do vary it on your own design: a\nsmall control group inflates the *2×2's* standard error too — not only CS's — so a smaller\nholdout won't necessarily widen the CS-vs-2×2 power gap. This notebook doesn't measure that,\nso treat it as a hypothesis to test, not a result.\n\n**A 50-state caveat: few clusters.** Our 2×2 helper already clusters by state\n(`cluster=\"unit\"`), and with ~50 states (only ~10 controls) cluster-robust SEs lean on\nlarge-sample approximations that are shaky at this scale. For a real 50-state test, prefer\nwild-cluster bootstrap or small-sample corrections: `DifferenceInDifferences` supports\n`inference=\"wild_bootstrap\"` (it resamples at the cluster level), and `CallawaySantAnna`\nsupports a multiplier bootstrap via `n_bootstrap=`. See the estimator docstrings for the\nexact requirements."
   },
   {
    "cell_type": "markdown",
@@ -877,4 +863,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}