Commit a68f1b5

igerber and claude committed
tutorial: address CI codex P3 — qualify the holdout-size claim
The 'Holdout size' paragraph stated CS's relative power cost is smallest when the holdout is small, but the notebook fixes the holdout at 10 states and does not sweep it. Reframe as design intuition (not a demonstrated in-notebook rule), noting a separate sweep supports the direction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ef8b9e1 commit a68f1b5

1 file changed

Lines changed: 56 additions & 54 deletions

File tree

docs/tutorials/24_staggered_vs_collapsed_power.ipynb

@@ -2,7 +2,7 @@
  "cells": [
  {
  "cell_type": "markdown",
- "id": "bc1e6c9b",
+ "id": "269c802e",
  "metadata": {},
  "source": [
  "# Tutorial 24: Staggered Rollout or a Simple 2×2? A Power-Analysis Decision Guide\n",
@@ -34,13 +34,13 @@
  {
  "cell_type": "code",
  "execution_count": 1,
- "id": "000ed3c8",
+ "id": "0c4f03f8",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:39.280684Z",
- "iopub.status.busy": "2026-05-31T16:40:39.280504Z",
- "iopub.status.idle": "2026-05-31T16:40:40.308671Z",
- "shell.execute_reply": "2026-05-31T16:40:40.308369Z"
+ "iopub.execute_input": "2026-05-31T16:59:51.198041Z",
+ "iopub.status.busy": "2026-05-31T16:59:51.197962Z",
+ "iopub.status.idle": "2026-05-31T16:59:52.322294Z",
+ "shell.execute_reply": "2026-05-31T16:59:52.321996Z"
  }
  },
  "outputs": [],
@@ -114,7 +114,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "a2db376f",
+ "id": "8400e7f2",
  "metadata": {},
  "source": [
  "## The scenario\n",
@@ -131,13 +131,13 @@
  {
  "cell_type": "code",
  "execution_count": 2,
- "id": "9d68d0d4",
+ "id": "08de7530",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:40.310130Z",
- "iopub.status.busy": "2026-05-31T16:40:40.309978Z",
- "iopub.status.idle": "2026-05-31T16:40:40.418677Z",
- "shell.execute_reply": "2026-05-31T16:40:40.418393Z"
+ "iopub.execute_input": "2026-05-31T16:59:52.323568Z",
+ "iopub.status.busy": "2026-05-31T16:59:52.323415Z",
+ "iopub.status.idle": "2026-05-31T16:59:52.434068Z",
+ "shell.execute_reply": "2026-05-31T16:59:52.433793Z"
  }
  },
  "outputs": [
@@ -272,7 +272,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "5c179b0a",
+ "id": "99001403",
  "metadata": {},
  "source": [
  "## 1. \"Simplifying\" silently changes the question\n",
@@ -293,13 +293,13 @@
  {
  "cell_type": "code",
  "execution_count": 3,
- "id": "e8b4ca62",
+ "id": "8496ad42",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:40.419755Z",
- "iopub.status.busy": "2026-05-31T16:40:40.419678Z",
- "iopub.status.idle": "2026-05-31T16:40:40.440839Z",
- "shell.execute_reply": "2026-05-31T16:40:40.440602Z"
+ "iopub.execute_input": "2026-05-31T16:59:52.435188Z",
+ "iopub.status.busy": "2026-05-31T16:59:52.435105Z",
+ "iopub.status.idle": "2026-05-31T16:59:52.459409Z",
+ "shell.execute_reply": "2026-05-31T16:59:52.459137Z"
  }
  },
  "outputs": [
@@ -334,13 +334,13 @@
  {
  "cell_type": "code",
  "execution_count": 4,
- "id": "92b7f831",
+ "id": "77f02fec",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:40.441827Z",
- "iopub.status.busy": "2026-05-31T16:40:40.441747Z",
- "iopub.status.idle": "2026-05-31T16:40:40.505346Z",
- "shell.execute_reply": "2026-05-31T16:40:40.505071Z"
+ "iopub.execute_input": "2026-05-31T16:59:52.460449Z",
+ "iopub.status.busy": "2026-05-31T16:59:52.460368Z",
+ "iopub.status.idle": "2026-05-31T16:59:52.527876Z",
+ "shell.execute_reply": "2026-05-31T16:59:52.527624Z"
  }
  },
  "outputs": [
@@ -381,7 +381,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "794e7433",
+ "id": "58d4791c",
  "metadata": {},
  "source": [
  "The slower the rollout, the more dilution. Let's sweep rollout speed and, across many\n",
@@ -392,13 +392,13 @@
  {
  "cell_type": "code",
  "execution_count": 5,
- "id": "4521cf5d",
+ "id": "cd8bb89e",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:40.506364Z",
- "iopub.status.busy": "2026-05-31T16:40:40.506293Z",
- "iopub.status.idle": "2026-05-31T16:40:48.949713Z",
- "shell.execute_reply": "2026-05-31T16:40:48.949465Z"
+ "iopub.execute_input": "2026-05-31T16:59:52.528877Z",
+ "iopub.status.busy": "2026-05-31T16:59:52.528803Z",
+ "iopub.status.idle": "2026-05-31T17:00:01.603356Z",
+ "shell.execute_reply": "2026-05-31T17:00:01.603100Z"
  }
  },
  "outputs": [
@@ -502,7 +502,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "3a247dcd",
+ "id": "43a88906",
  "metadata": {},
  "source": [
  "Why does the slow rollout dilute so much? The 2×2 averages over **all 16 post-weeks**, but\n",
@@ -517,7 +517,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "2341d75c",
+ "id": "c6236105",
  "metadata": {},
  "source": [
  "## 2. So does CS cost you power? (the headline)\n",
@@ -534,13 +534,13 @@
  {
  "cell_type": "code",
  "execution_count": 6,
- "id": "52f3e96c",
+ "id": "823fbfb7",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:48.950751Z",
- "iopub.status.busy": "2026-05-31T16:40:48.950663Z",
- "iopub.status.idle": "2026-05-31T16:40:51.826627Z",
- "shell.execute_reply": "2026-05-31T16:40:51.826336Z"
+ "iopub.execute_input": "2026-05-31T17:00:01.604532Z",
+ "iopub.status.busy": "2026-05-31T17:00:01.604442Z",
+ "iopub.status.idle": "2026-05-31T17:00:04.651956Z",
+ "shell.execute_reply": "2026-05-31T17:00:04.651680Z"
  }
  },
  "outputs": [
@@ -570,13 +570,13 @@
  {
  "cell_type": "code",
  "execution_count": 7,
- "id": "fb4e7123",
+ "id": "5f8e0c4b",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:40:51.827731Z",
- "iopub.status.busy": "2026-05-31T16:40:51.827661Z",
- "iopub.status.idle": "2026-05-31T16:41:42.127057Z",
- "shell.execute_reply": "2026-05-31T16:41:42.126798Z"
+ "iopub.execute_input": "2026-05-31T17:00:04.653058Z",
+ "iopub.status.busy": "2026-05-31T17:00:04.652982Z",
+ "iopub.status.idle": "2026-05-31T17:00:56.428034Z",
+ "shell.execute_reply": "2026-05-31T17:00:56.427769Z"
  }
  },
  "outputs": [
@@ -706,7 +706,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "cd29c79d",
+ "id": "86537377",
  "metadata": {},
  "source": [
  "There's the answer to \"how does the MDE change as the rollout gets more staggered?\" Reading\n",
@@ -734,7 +734,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "c66efe44",
+ "id": "f59b0cbf",
  "metadata": {},
  "source": [
  "## A couple of refinements\n",
@@ -751,13 +751,13 @@
  {
  "cell_type": "code",
  "execution_count": 8,
- "id": "1901e49c",
+ "id": "9a902d30",
  "metadata": {
  "execution": {
- "iopub.execute_input": "2026-05-31T16:41:42.128160Z",
- "iopub.status.busy": "2026-05-31T16:41:42.128089Z",
- "iopub.status.idle": "2026-05-31T16:41:42.978026Z",
- "shell.execute_reply": "2026-05-31T16:41:42.977753Z"
+ "iopub.execute_input": "2026-05-31T17:00:56.429298Z",
+ "iopub.status.busy": "2026-05-31T17:00:56.429210Z",
+ "iopub.status.idle": "2026-05-31T17:00:57.301865Z",
+ "shell.execute_reply": "2026-05-31T17:00:57.301603Z"
  }
  },
  "outputs": [
@@ -797,12 +797,14 @@
  },
  {
  "cell_type": "markdown",
- "id": "7d119ff3",
+ "id": "9dca90b2",
  "metadata": {},
  "source": [
- "**Holdout size.** Geo experiments usually hold back only a few states, and that's actually\n",
- "forgiving to CS: a small control group hurts the 2×2's standard error too, so CS's\n",
- "*relative* power cost is smallest exactly when the holdout is small.\n",
+ "**Holdout size.** Geo experiments usually hold back only a few states. We fix the holdout at\n",
+ "10 states here and don't sweep it, but the direction is worth knowing: a small control group\n",
+ "hurts the *2×2's* standard error too, so shrinking the holdout doesn't widen CS's relative\n",
+ "power gap — if anything it narrows it (a separate sweep, not shown in this notebook, bears\n",
+ "this out). Treat this as design intuition, not a demonstrated rule.\n",
  "\n",
  "**A 50-state caveat: few clusters.** Our 2×2 helper already clusters by state\n",
  "(`cluster=\"unit\"`), and with ~50 states (only ~10 controls) cluster-robust SEs lean on\n",
@@ -815,7 +817,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "5535009f",
+ "id": "9b987d86",
  "metadata": {},
  "source": [
  "## Decision guide\n",
@@ -836,7 +838,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "f2937f6d",
+ "id": "52d8c4b6",
  "metadata": {},
  "source": [
  "## Run this on your own design\n",

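The 2×2 half of the holdout-size intuition in the revised paragraph can be illustrated with a toy calculation. This is only a sketch under strong assumptions: independent state-level noise with a common standard deviation, and a plain difference in group means standing in for the 2×2 estimate. The function name `two_group_se` and the 50-state total are illustrative choices, not the notebook's helpers, and the sketch says nothing about the CS estimator or the relative gap, which the commit attributes to a separate sweep.

```python
import math


def two_group_se(n_treated: int, n_control: int, sigma: float = 1.0) -> float:
    """Standard error of a difference in group means.

    Assumes independent units with common noise sd `sigma` (a toy
    stand-in for the 2x2 comparison, not the notebook's estimator).
    """
    return sigma * math.sqrt(1.0 / n_treated + 1.0 / n_control)


TOTAL_STATES = 50  # illustrative total, matching the "~50 states" in the text

# Shrinking the holdout moves states from control to treated; the
# 1/n_control term dominates, so the standard error grows.
for n_control in (25, 10, 5, 2):
    se = two_group_se(TOTAL_STATES - n_control, n_control)
    print(f"holdout={n_control:2d} states  SE (in units of sigma) = {se:.3f}")
```

Under these assumptions the standard error rises steadily as the holdout shrinks, which is the sense in which a small control group hurts the 2×2 as well. Whether that narrows the gap to CS depends on how the CS standard error responds, which this toy does not model.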