feat: --auto-spindump + multi-kind cluster sidecar (HangBuster PR 2/3) #92

Merged
conorluddy merged 1 commit into main from feat/hangbuster-auto-spindump on May 25, 2026

@conorluddy

Owner

Context

Building on PR #91 (which wired --auto-sample to xcrun simctl spawn sample), this PR:

  1. Adds a second capture mechanism via xcrun simctl spawn spindump behind --auto-spindump.
  2. Fixes a latent gap: auto-samples stashed during capture were never reaching summary.json. The worker writes to auto_samples.jsonl, but SummaryBuilder.build never read it back — so --get-details on a stopped session showed no stack. Only the --resample path (which re-fires sample at inspection time) worked. Bundling the fix here because PR 2 has to touch the same code paths anyway.
  3. Generalises the sidecar to multi-kind so both captures coexist under one fingerprint.

This is PR 2 of 3 in the stack-enrichment plan. PR 3 will add atos post-processing on the captured stacks for source-line symbolication.

Changes

Storage (hang_sessions.py):

  • read_auto_samples return type: dict[str, dict] → dict[str, list[dict]], preserving write order. Callers disambiguate via the kind field on each payload.
  • build_summary now reads auto_samples and passes them to SummaryBuilder.build so they survive into summary.json.
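
For illustration, the multi-kind grouping described above could look roughly like the sketch below. It is not the code from hang_sessions.py: the function name group_auto_samples, the "fingerprint" and "stack" keys, and the kind values "simctl-sample" and "spindump" are assumptions chosen to match the wording of this PR.

```python
import io
import json
from collections.abc import Iterable


def group_auto_samples(lines: Iterable[str]) -> dict[str, list[dict]]:
    """Group JSONL capture records by fingerprint, keeping write order.

    Hypothetical record shape: {"fingerprint": ..., "kind": ..., "stack": ...}.
    """
    by_fp: dict[str, list[dict]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the sidecar file
        record = json.loads(line)
        # Appending (rather than overwriting) lets one fingerprint carry
        # several captures, one per kind, in the order they were written.
        by_fp.setdefault(record["fingerprint"], []).append(record)
    return by_fp


# Two captures stored under the same fingerprint.
jsonl = io.StringIO(
    '{"fingerprint": "fp1", "kind": "simctl-sample", "stack": "main\\nfoo"}\n'
    '{"fingerprint": "fp1", "kind": "spindump", "stack": "main\\nbar"}\n'
)
grouped = group_auto_samples(jsonl)
kinds = [record["kind"] for record in grouped["fp1"]]
print(kinds)  # ['simctl-sample', 'spindump']
```

With the old dict[str, dict] shape, the second record would have replaced the first, which is why the return type had to become a list per fingerprint.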

Pipeline (hang_pipeline.py):

  • Cluster.auto_samples: list[dict] | None alongside existing auto_sample (kept for backward compat with old summary.json files; rehydrates from either).
  • SummaryBuilder.build takes auto_samples_by_fp and attaches matching captures to clusters at build time.
  • format_cluster_detail iterates samples, renders each kind under its own header (simctl-sample stack (top 10): / spindump stack (top 10):). Also fixes a latent bug from PR 1: stack[:10] sliced 10 chars when stack became a string; now splits to lines correctly.
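
A minimal sketch of the pipeline behaviour above, using plain dicts in place of the real Cluster class. The helper names (attach_by_fingerprint, top_lines, render_samples) and the "reason" key for failed captures are hypothetical; only the header format and the legacy auto_sample fallback come from this PR.

```python
def attach_by_fingerprint(clusters: list[dict], auto_samples_by_fp: dict) -> list[dict]:
    """Attach captures to the cluster whose fingerprint matches; ignore the rest."""
    for cluster in clusters:
        captures = auto_samples_by_fp.get(cluster["fingerprint"])
        if captures:
            cluster["auto_samples"] = list(captures)
    return clusters


def top_lines(stack, limit: int = 10) -> list[str]:
    """Return the first `limit` lines of a stack.

    Slicing a string directly (stack[:10]) yields 10 characters, not 10
    lines, so text stacks are split into lines before slicing.
    """
    if isinstance(stack, str):
        return stack.splitlines()[:limit]
    return list(stack)[:limit]


def render_samples(cluster: dict, limit: int = 10) -> str:
    """Render each capture under its own header, with a legacy fallback."""
    samples = cluster.get("auto_samples")
    if not samples and cluster.get("auto_sample"):
        samples = [cluster["auto_sample"]]  # old summary.json files
    out = []
    for sample in samples or []:
        kind = sample.get("kind", "simctl-sample")
        if sample.get("reason"):  # hypothetical failure field
            out.append(f"{kind} capture failed: {sample['reason']}")
            continue
        out.append(f"{kind} stack (top {limit}):")
        out.extend(f"  {line}" for line in top_lines(sample.get("stack", ""), limit))
    return "\n".join(out)
```

Called on a cluster that carries both kinds, render_samples would emit one labelled block per capture in write order.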

Watcher (hang_watcher.py):

  • New --auto-spindump CLI flag, parallel to --auto-sample.
  • New _attempt_auto_spindump(udid, pid) shelling to xcrun simctl spawn <udid> spindump <pid> 1 -file - with a 10s timeout (spindump is heavier than sample).
  • Worker adds a parallel spindumped_fingerprints dedup set so each kind captures at most once per fingerprint.
  • --resample populates the new auto_samples list slot.
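
The spindump attempt and the per-kind dedup above can be sketched as follows. This is an approximation under stated assumptions, not the code in hang_watcher.py: the injectable run parameter, the payload keys ("ok", "reason", "stack"), and the should_capture helper are hypothetical. Only the command line and the 10s timeout are taken from this PR.

```python
import subprocess
from typing import Optional

SPINDUMP_TIMEOUT_S = 10  # spindump is heavier than sample


def build_spindump_cmd(udid: str, pid: int) -> list[str]:
    """Command line as described in this PR: 1 second, report to stdout."""
    return ["xcrun", "simctl", "spawn", udid, "spindump", str(pid), "1", "-file", "-"]


def attempt_auto_spindump(udid: Optional[str], pid: int, run=subprocess.run) -> dict:
    """Try one spindump capture and always return a payload, never raise."""
    if not udid:
        return {"kind": "spindump", "ok": False, "reason": "missing udid"}
    try:
        proc = run(
            build_spindump_cmd(udid, pid),
            capture_output=True,
            text=True,
            timeout=SPINDUMP_TIMEOUT_S,
        )
    except subprocess.TimeoutExpired:
        return {"kind": "spindump", "ok": False, "reason": "timeout"}
    except OSError as exc:  # e.g. xcrun not installed
        return {"kind": "spindump", "ok": False, "reason": str(exc)}
    if proc.returncode != 0:
        return {"kind": "spindump", "ok": False, "reason": f"exit {proc.returncode}"}
    return {"kind": "spindump", "ok": True, "stack": proc.stdout}


def should_capture(fingerprint: str, seen: set) -> bool:
    """Return True the first time a fingerprint is seen for a given kind."""
    if fingerprint in seen:
        return False
    seen.add(fingerprint)
    return True


# One dedup set per kind, so a fingerprint can get one capture of each.
sampled_fingerprints: set = set()
spindumped_fingerprints: set = set()
```

Keeping the two sets separate means enabling --auto-spindump does not change when --auto-sample fires for a given fingerprint.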

Tests

8 new unit tests in test_auto_sample.py / test_hang_pipeline.py / test_hang_sessions.py covering:

  • spindump subprocess: success / missing-udid / timeout / non-zero exit
  • multi-kind storage round-trip under one fingerprint, write-order preserved
  • SummaryBuilder.build attaches by matching fingerprint, ignores non-matching
  • format_cluster_detail renders both kinds with labels, falls back to legacy auto_sample, reports failure reasons

pytest tests/         153 passed
ruff check            clean
black --check         clean

Verification

  • Unit: pytest tests/ — covered.
  • Manual smoke: boot a sim, run python scripts/hang_watcher.py --start --auto-sample --auto-spindump --bundle-id <id>, reproduce a hang, then --stop and --get-details --cluster 1 should now show both a simctl-sample stack block and a spindump stack block in the cluster detail. Without --auto-spindump the existing PR 1 behaviour is unchanged (single sample block). (Not run as part of CI — sim-coupled.)

Adds spindump capture alongside --auto-sample, and fixes a latent gap
where auto-samples stashed during capture never made it into summary.json.

Storage: read_auto_samples returns dict[str, list[dict]] (write-order
preserved) so a fingerprint can carry both a simctl-sample and a spindump
record. Worker uses parallel dedup sets so each kind fires at most once
per fingerprint.

Pipeline: Cluster gains auto_samples: list[dict] | None alongside the
legacy auto_sample (kept for backward compat with old summary.json
files). SummaryBuilder.build() now takes auto_samples_by_fp and attaches
matching captures to clusters. SessionStore.build_summary passes it
through automatically. format_cluster_detail iterates and renders each
kind with its own label; also fixes a bug where the old format treated
stack as a list when PR 1 made it text — now splits by lines correctly.

CLI: new --auto-spindump flag, parallel to --auto-sample. Updated
--resample to use the new auto_samples list shape.

Tests: 8 new units covering spindump subprocess paths, multi-kind
storage round-trip, SummaryBuilder attachment by fingerprint, multi-
kind format rendering, and legacy auto_sample backward compat.

conorluddy merged commit d46081b into main on May 25, 2026
1 check passed