Commit 1249ded

igerber and claude committed
imputation: suppress cluster_name/n_clusters under ANY survey design (incl. replicate)
The Results-metadata suppression gate previously fired only when `resolved_survey.psu is not None`, which left replicate-weight survey fits (psu=None by SurveyDesign mutual-exclusion rules) leaking cluster_name="unit" and n_clusters=n_units onto Results. Summary then printed "Number of clusters" plus the unit-cluster CR1 label, even though the new public contract says both fields are None under survey designs because replicate-variance ignores PSU/cluster entirely (replicates encode the design implicitly via BRR / Fay / JK1 / JKn / SDR reweighting).

Fix: gate on `resolved_survey is not None` so the suppression also covers the replicate-weight branch.

Regression test added: `test_cluster_name_suppressed_under_replicate_survey` asserts both fields are None and summary omits the Number-of-clusters line + the CR1 label under a JK1 replicate design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b819aa8 commit 1249ded

2 files changed

Lines changed: 35 additions & 4 deletions

diff_diff/imputation.py

Lines changed: 6 additions & 4 deletions
@@ -887,16 +887,18 @@ def _refit_imp(w_r):
         )[0]
 
         # Resolve cluster_name / n_clusters for Results metadata.
-        # Suppress under survey designs (the survey block in summary()
-        # already renders the design's PSU/strata metadata).
+        # Suppress under ANY survey design (the survey block in summary()
+        # already renders the design's PSU/strata/replicate metadata, and
+        # replicate-weight variance ignores PSU/cluster entirely — keeping
+        # cluster_name/n_clusters populated on a replicate fit would
+        # misreport the inference source).
         # Otherwise:
         # bare cluster= -> populate with the user-named cluster column
         # cluster=None -> the Theorem 3 variance still clusters at the
         # `unit` column by default (cluster_var = unit
         # at L418), so the summary label must report
         # unit-cluster CR1, not generic HC1.
-        _survey_active = resolved_survey is not None and resolved_survey.psu is not None
-        if _survey_active:
+        if resolved_survey is not None:
             _cluster_name_for_results: Optional[str] = None
             _n_clusters_for_results: Optional[int] = None
         elif self.cluster is not None:
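
The gate change in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the code in diff_diff/imputation.py: `StubSurvey`, `old_gate`, and `resolve_cluster_metadata` are hypothetical names, and the final fallback branch is inferred from the hunk's comments (`cluster=None` falls back to the `unit` column), since the diff does not show it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StubSurvey:
    """Hypothetical stand-in for a resolved survey design (not the library class)."""
    psu: Optional[str] = None
    replicate_method: Optional[str] = None


def old_gate(resolved_survey: Optional[StubSurvey]) -> bool:
    """Pre-fix condition: only PSU-based designs counted as 'survey active'."""
    return resolved_survey is not None and resolved_survey.psu is not None


def resolve_cluster_metadata(
    resolved_survey: Optional[StubSurvey],
    cluster: Optional[str],
    unit: str,
    n_user_clusters: Optional[int],
    n_units: int,
) -> Tuple[Optional[str], Optional[int]]:
    """Return (cluster_name, n_clusters) following the post-fix gate."""
    # Post-fix condition: ANY survey design suppresses both fields,
    # including replicate-weight designs where psu is None.
    if resolved_survey is not None:
        return None, None
    # User named a cluster column explicitly.
    if cluster is not None:
        return cluster, n_user_clusters
    # Assumed fallback (from the hunk's comments): cluster at the unit column.
    return unit, n_units


# A replicate-weight design has psu=None, so the old gate missed it.
jk1_design = StubSurvey(psu=None, replicate_method="JK1")
missed_by_old_gate = not old_gate(jk1_design)
suppressed_now = resolve_cluster_metadata(jk1_design, None, "unit", None, 50)
```

Here `missed_by_old_gate` is `True` and `suppressed_now` is `(None, None)`, which is the behaviour the regression test checks on a real fit.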

tests/test_imputation.py

Lines changed: 29 additions & 0 deletions
@@ -2831,6 +2831,35 @@ def test_cluster_name_suppressed_under_survey(self):
         assert r.cluster_name is None
         assert r.n_clusters is None
 
+    def test_cluster_name_suppressed_under_replicate_survey(self):
+        # Replicate-weight survey designs have psu=None but still must
+        # suppress cluster_name/n_clusters: replicate variance is computed
+        # by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and
+        # ignores PSU/cluster entirely, so populating cluster_name="unit"
+        # and n_clusters=n_units would misreport the inference source.
+        # Summary must also omit the "Number of clusters:" line and the
+        # CR1 cluster-robust label.
+        data, rep_cols = _imputation_replicate_panel()
+        design = SurveyDesign(
+            weights="weight",
+            replicate_weights=rep_cols,
+            replicate_method="JK1",
+            weight_type="pweight",
+        )
+        r = ImputationDiD().fit(
+            data,
+            outcome="outcome",
+            unit="unit",
+            time="time",
+            first_treat="first_treat",
+            survey_design=design,
+        )
+        assert r.cluster_name is None
+        assert r.n_clusters is None
+        text = r.summary()
+        assert "Number of clusters:" not in text
+        assert "CR1 cluster-robust" not in text
+
     def test_fit_clone_idempotent_on_vcov_type(self):
         data = generate_test_data(seed=11)
         imp1 = ImputationDiD(vcov_type="hc1")
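
The claim that replicate variance "ignores PSU/cluster entirely" can be illustrated with the textbook JK1 (delete-one-group jackknife) estimator. The sketch below is not taken from diff_diff and makes no claim about how the library computes it; `weighted_mean`, `jk1_replicate_weights`, and `jk1_variance` are hypothetical helpers, and the estimand is a simple weighted mean rather than an imputation estimate. The point is that the variance function receives only the full-sample weights and the replicate weight columns: the grouping is baked into the replicates when they are built, so there is no cluster count to report.

```python
from typing import List, Sequence


def weighted_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted mean of y under weights w."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def jk1_replicate_weights(
    w: Sequence[float], groups: Sequence[int], n_groups: int
) -> List[List[float]]:
    """Build JK1 replicates: zero out group r, scale the rest by R / (R - 1)."""
    scale = n_groups / (n_groups - 1)
    return [
        [0.0 if g == r else wi * scale for wi, g in zip(w, groups)]
        for r in range(n_groups)
    ]


def jk1_variance(
    y: Sequence[float], w: Sequence[float], rep_weights: Sequence[Sequence[float]]
) -> float:
    """JK1 variance: (R - 1) / R * sum over r of (theta_r - theta)^2.

    Note the signature: no PSU or cluster argument is needed.
    """
    theta = weighted_mean(y, w)
    n_reps = len(rep_weights)
    squared_devs = sum((weighted_mean(y, wr) - theta) ** 2 for wr in rep_weights)
    return (n_reps - 1) / n_reps * squared_devs


# Four observations, one per jackknife group, equal weights.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
reps = jk1_replicate_weights(w, groups=[0, 1, 2, 3], n_groups=4)
variance = jk1_variance(y, w, reps)
```

With one observation per group and equal weights, `variance` equals the usual s²/n for the sample mean (5/12 here), which is a quick sanity check on the formula.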
