feat: Workspace filesystem cleanup #391
14 issues
Medium
Discovered test total changed from 52 to 57 without corresponding item list update - `src/snapshot-tests/__fixtures__/json/device/test--failure.json:34`
The discovered.total field was updated from 52 to 57, but the visible items array in the surrounding context only shows a small subset and there's no indication that 5 new test items were added to the fixture. If the total count changed without a matching behavioral change in test discovery, this may indicate a fixture being patched to make tests pass rather than reflecting an intentional behavior change. Reviewers should verify that the new total of 57 corresponds to actual added test cases in the fixture's items array.
Also found at:
src/snapshot-tests/__fixtures__/mcp/simulator/test--failure.txt:10-24
New testCases entries omit the 'suite' field present on existing entries - `src/snapshot-tests/__fixtures__/json/simulator/test--failure.json:68-238`
The 34 newly added testCases objects (lines 68-238) only contain 'test', 'status', and 'durationMs' fields, while every pre-existing testCase entry in the same array includes a 'suite' field (e.g., 'CalculatorAppTests'). This indicates the JSON structured output envelope shape is no longer stable across entries within the same array, which violates the guardrail that JSON fixtures preserve stable structured output envelopes. Consumers parsing testCases will encounter inconsistent schemas within a single response.
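A check along the following lines could flag mixed entry shapes before a fixture is committed. It is a minimal sketch: the field names come from the finding above, while `TestCaseEntry`, `findEntriesMissingSuite`, and the sample data are hypothetical and not taken from the repository.

```typescript
// Hypothetical entry shape, inferred from the field names in the finding.
interface TestCaseEntry {
  suite?: string;
  test: string;
  status: string;
  durationMs: number;
}

// Returns the indexes of entries that lack a non-empty 'suite' field.
function findEntriesMissingSuite(testCases: TestCaseEntry[]): number[] {
  const missing: number[] = [];
  testCases.forEach((entry, index) => {
    if (typeof entry.suite !== "string" || entry.suite.length === 0) {
      missing.push(index);
    }
  });
  return missing;
}

// Illustrative data only; not copied from the fixture.
const sampleTestCases: TestCaseEntry[] = [
  { suite: "CalculatorAppTests", test: "testExampleA", status: "passed", durationMs: 12 },
  { test: "testExampleB", status: "failed", durationMs: 30 },
];
const missingSuiteIndexes = findEntriesMissingSuite(sampleTestCases);
console.log(missingSuiteIndexes);
```

A snapshot test could assert that this list is empty for every fixture that exposes testCases.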
Startup registry lock not released if SIGTERM/SIGINT arrives before listen callback - `src/daemon.ts:507-510`
releaseStartupRegistryLock is only invoked from three places: the listen-callback's try/finally, handleStartupServerError, and the outer catch. The signal handlers (SIGTERM/SIGINT → shutdown(0)) are registered at lines 507-508, but shutdown() does not release the startup registry lock. If a signal is delivered between server.listen(...) being scheduled and its callback firing, shutdown() will run, process.exit will occur via the cleanup pipeline, and the filesystem-based registry mutation lock will only be reclaimed via lease expiry (DAEMON_REGISTRY_LOCK_LEASE_MS = 30s). This can transiently block another daemon's startup for the same workspace.
Also found at:
src/daemon.ts:444-452
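One way to close this gap is to make the lock release idempotent and call it from shutdown as well. The sketch below assumes a lock handle with a single `release()` method; `StartupLock`, `createReleaseOnce`, and `shutdownSketch` are hypothetical names, not the daemon's actual API.

```typescript
// Hypothetical stand-in for the startup registry lock handle.
interface StartupLock {
  release(): Promise<void>;
}

// Wraps a lock so release can be called from every exit path
// (listen callback, error handler, signal-driven shutdown) but runs only once.
function createReleaseOnce(lock: StartupLock): () => Promise<void> {
  let released: Promise<void> | null = null;
  return () => {
    if (released === null) {
      released = lock.release();
    }
    return released;
  };
}

// Shutdown sketch: release the startup lock first, then run the remaining cleanup
// even if the release fails.
async function shutdownSketch(
  releaseStartupLock: () => Promise<void>,
  cleanup: () => Promise<void>,
): Promise<void> {
  try {
    await releaseStartupLock();
  } finally {
    await cleanup();
  }
}
```

With this shape, a signal arriving before the listen callback still releases the lock promptly instead of waiting for lease expiry, and the later release in the callback's finally block becomes a no-op.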
canRemoveRegistryEntry treats missing instanceId as non-removable for live owners - `src/daemon/daemon-registry.ts:219-225`
When allowLiveOwner is true and the caller's pid matches the live entry's pid, removal still requires entry.instanceId !== undefined && options.instanceId === entry.instanceId. Older entries written before instanceId was introduced (the field is optional in the interface and validator) will have entry.instanceId === undefined, making them permanently un-removable by their own owning process even when pid matches. This can leave stale registry files that legitimately belong to the current live process and block subsequent daemon lifecycle operations that depend on cleanup.
Also found at:
src/daemon/daemon-registry.ts:132-140
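A possible fallback for legacy entries is sketched below. The types and the function name `canRemoveLiveEntrySketch` are hypothetical; only the `pid`, `instanceId`, and `allowLiveOwner` fields come from the finding. Falling back to pid alone is an assumption that trades some protection against pid reuse for the ability to clean up entries written before instanceId existed.

```typescript
// Hypothetical shapes based on the fields named in the finding.
interface RegistryEntrySketch {
  pid: number;
  instanceId?: string;
}

interface RemovalOptionsSketch {
  pid: number;
  instanceId?: string;
  allowLiveOwner: boolean;
}

// Decides whether the live owner may remove its own registry entry.
function canRemoveLiveEntrySketch(
  entry: RegistryEntrySketch,
  options: RemovalOptionsSketch,
): boolean {
  if (!options.allowLiveOwner || options.pid !== entry.pid) {
    return false;
  }
  // Legacy entry written before instanceId was introduced:
  // the matching pid is the only ownership signal available.
  if (entry.instanceId === undefined) {
    return true;
  }
  // Modern entry: require the instance ids to match as well.
  return options.instanceId === entry.instanceId;
}
```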
Ownerless lock-dir recovery skips post-quarantine ownership verification, allowing destruction of a freshly written lock - `src/utils/fs-lock.ts:117-124`
In tryRecoverExpiredLockDir, when shouldRecoverLockDir returns recovery.owner === null (the lock dir existed but had no owner.json and was older than the lease), the function quarantines the directory and immediately removes it without re-reading the quarantined contents. Between the initial owner read and the rename, another process could have completed createLock (mkdir succeeded earlier, then writeFile of owner.json finished), making the directory a valid live lock. We then rename it away and rm it, silently destroying that process's lock and letting two holders believe they own the same resource. The owner!=null branch guards against this with fsLockOwnersEqual, but the null branch does not.
Also found at:
src/utils/process-liveness.ts:7-12
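The missing re-check in the ownerless branch could look roughly like this. It is a sketch under assumptions, not the code in fs-lock.ts: `recoverOwnerlessLockDirSketch` is a hypothetical name, the lease-age check is assumed to have happened already, and the owner file is assumed to be named owner.json as in the finding.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Quarantines an apparently ownerless lock dir, then re-reads it before deleting.
// Returns true only if the directory was still ownerless and has been removed.
function recoverOwnerlessLockDirSketch(lockDir: string): boolean {
  const quarantineDir = `${lockDir}.stale.${process.pid}.${randomUUID()}`;
  try {
    fs.renameSync(lockDir, quarantineDir);
  } catch {
    return false; // another process moved or removed the directory first
  }
  // Re-check after the rename: an owner file here means a contender finished
  // creating the lock between the initial read and the quarantine.
  if (fs.existsSync(path.join(quarantineDir, "owner.json"))) {
    try {
      fs.renameSync(quarantineDir, lockDir); // hand the lock back
    } catch {
      // Restore failed; leave the quarantine in place rather than destroy a live lock.
    }
    return false;
  }
  fs.rmSync(quarantineDir, { recursive: true, force: true });
  return true;
}

// Usage example against a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "fs-lock-sketch-"));
const demoLockDir = path.join(demoRoot, "demo.lock");
fs.mkdirSync(demoLockDir);
console.log(recoverOwnerlessLockDirSketch(demoLockDir)); // ownerless, so it is removed
```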
Scheduled sweep cooldown can be bypassed before completion, allowing concurrent scheduling for same scope - `src/utils/workspace-filesystem-lifecycle.ts:404-428`
scheduleWorkspaceFilesystemLifecycleSweep updates lastScheduledAtByScope and lastScheduledAtByPreKey only after the sweep completes (in .then). Between scheduling and completion, the runningScheduledSweeps set guards against same-scope re-entry, but the pre-key cooldown check reads lastScheduledAtByPreKey, which has not yet been written. Two callers using different preKey values that resolve to the same scope (e.g. one passing workspaceKey, another passing logDir) can both pass the cooldown gates; one of them then returns early at the runningScheduledSweeps.has check. A caller whose logDir override maps to a different scheduleKey, however, could schedule a redundant concurrent sweep targeting overlapping paths. The cooldown is best-effort, but the asymmetry between pre-key and scope keys means rapid bursts of artifact-created events trigger more sweeps than intended.
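One way to remove the window is to record the cooldown timestamp at schedule time, keyed by the resolved scope rather than by the caller's pre-key. The class below is a simplified sketch with hypothetical names (`SweepSchedulerSketch`, `trySchedule`); it assumes callers have already resolved their workspaceKey or logDir to a single scope key.

```typescript
// Simplified scheduler: one key space (the resolved scope), cooldown written up front.
class SweepSchedulerSketch {
  private readonly cooldownMs: number;
  private readonly lastScheduledAt = new Map<string, number>();
  private readonly running = new Set<string>();

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true if a sweep was started for this scope, false if it was suppressed.
  trySchedule(scopeKey: string, now: number, sweep: () => Promise<void>): boolean {
    if (this.running.has(scopeKey)) {
      return false;
    }
    const last = this.lastScheduledAt.get(scopeKey);
    if (last !== undefined && now - last < this.cooldownMs) {
      return false;
    }
    // Record the timestamp before the sweep runs, not in its completion handler.
    this.lastScheduledAt.set(scopeKey, now);
    this.running.add(scopeKey);
    const done = (): void => {
      this.running.delete(scopeKey);
    };
    // Both fulfilment and rejection clear the running marker.
    sweep().then(done, done);
    return true;
  }
}
```

Because the timestamp is written synchronously, a burst of artifact-created events for the same scope starts at most one sweep per cooldown window.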
Fire-and-forget runStartupLifecycleSweep can crash the daemon via unhandledRejection - `src/daemon.ts:493`
The setImmediate block invokes void runStartupLifecycleSweep() without attaching a .catch() handler, while enrichSentryMetadata() directly above it is properly wrapped with .catch(). Because process.on('unhandledRejection', handleCrash) is registered immediately afterward, any rejection from the lifecycle sweep would be treated as a daemon crash and trigger shutdown(1). This defeats the stated intent in the comment that the sweep is fire-and-forget and should not impact request serving — a transient filesystem error during reconciliation would terminate a freshly started daemon.
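A small wrapper along these lines would keep a failed background sweep from reaching the unhandledRejection handler. `runInBackground` is a hypothetical helper, not an existing function in the daemon; how the error is logged is left to the caller.

```typescript
// Runs a fire-and-forget task and routes both synchronous throws and
// asynchronous rejections to onError instead of leaving them unhandled.
function runInBackground(
  task: () => Promise<void>,
  onError: (error: unknown) => void,
): void {
  try {
    task().catch(onError);
  } catch (error) {
    onError(error);
  }
}

// Usage example: a failing task is reported, and no rejection is left unhandled.
runInBackground(
  () => Promise.reject(new Error("simulated filesystem error")),
  (error) => {
    console.warn("background sweep failed:", error);
  },
);
```

In the daemon, the task would be the startup lifecycle sweep and onError would log the failure without triggering shutdown.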
Socket path relocation breaks discovery of existing daemons - `src/daemon/socket-path.ts:33-44`
The socket file location changed from ~/.xcodebuildmcp/daemons/{key}/daemon.sock to {tmpdir}/xcodebuildmcp-{compactKey}/d.sock, and the registry/log paths also moved. Any daemon process started before this change will not be discoverable by the new code, and vice versa. Without a migration path or fallback lookup, upgrading users may end up with orphaned daemon processes and duplicate daemons being spawned for the same workspace.
Also found at:
src/daemon/socket-path.ts:21-23
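A fallback lookup could probe the new location first and the legacy location second. The sketch below only builds the candidate list from the two path layouts quoted in the finding; `candidateSocketPaths` is a hypothetical name, and the derivation of compactKey from the workspace key is not shown in the finding, so both are taken as parameters.

```typescript
import * as path from "node:path";

// Returns socket paths to probe, newest layout first, legacy layout second.
function candidateSocketPaths(
  workspaceKey: string,
  compactKey: string,
  homeDir: string,
  tmpDir: string,
): string[] {
  // POSIX joins are used because the daemon targets macOS-style paths.
  const currentPath = path.posix.join(tmpDir, `xcodebuildmcp-${compactKey}`, "d.sock");
  const legacyPath = path.posix.join(
    homeDir,
    ".xcodebuildmcp",
    "daemons",
    workspaceKey,
    "daemon.sock",
  );
  return [currentPath, legacyPath];
}

// Illustrative values only.
const socketCandidates = candidateSocketPaths("workspace-key", "abc", "/home/u", "/tmp");
console.log(socketCandidates);
```

A client would try each candidate in order and spawn a new daemon only when none of them accepts a connection.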
Low
Failed quarantine restore leaks .stale.<pid>.<uuid> directories indefinitely - `src/utils/fs-lock.ts:56-63`
restoreQuarantinedLockDir intentionally leaves the quarantined directory in place when rename-back fails (e.g., because another contender now holds lockDir). Because the quarantine name embeds the current pid and a fresh UUID, nothing else will ever reclaim or clean it up from this code path. Over time, repeated contention produces an unbounded number of orphan .stale.* directories under the lock parent, a slow disk-fill / inode-exhaustion DoS on long-lived workspaces.
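A periodic cleanup pass could select quarantine directories that are old enough to be safely considered abandoned. The selection logic is sketched below; `selectOrphanedQuarantineDirs`, the `DirInfo` shape, and the age threshold are assumptions, and only the `.stale.<pid>.<uuid>` naming comes from the finding.

```typescript
// Minimal directory description; callers would fill this from a directory listing.
interface DirInfo {
  name: string;
  mtimeMs: number;
}

// Matches names ending in .stale.<pid>.<uuid>.
const QUARANTINE_NAME_PATTERN =
  /\.stale\.\d+\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns the names of quarantine directories older than maxAgeMs.
function selectOrphanedQuarantineDirs(
  entries: DirInfo[],
  now: number,
  maxAgeMs: number,
): string[] {
  return entries
    .filter((entry) => QUARANTINE_NAME_PATTERN.test(entry.name))
    .filter((entry) => now - entry.mtimeMs >= maxAgeMs)
    .map((entry) => entry.name);
}
```

The age threshold should comfortably exceed the lock lease so that a quarantine still being restored is never selected.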
Lock treated as still valid when expiresAtMs equals now - `src/utils/fs-lock.ts:80-86`
shouldRecoverLockDir decides expiry with a strict comparison between staleOwner.expiresAtMs and now, so when the clock equals expiresAtMs exactly the lock is still considered live and recovery is refused, even though the lease has nominally elapsed. This is a minor off-by-one that delays recovery by one tick but does not cause correctness issues; consider treating now >= expiresAtMs as expired.
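The boundary case can be pinned down with a small predicate that treats the lease as elapsed once the clock reaches expiresAtMs. `isLeaseExpired` is a hypothetical helper used only for illustration.

```typescript
// A lease that expires at expiresAtMs is considered elapsed from that instant on.
function isLeaseExpired(expiresAtMs: number, now: number): boolean {
  return now >= expiresAtMs;
}

// Boundary behaviour around an expiry of 1000 ms.
console.log(isLeaseExpired(1000, 999), isLeaseExpired(1000, 1000), isLeaseExpired(1000, 1001));
```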
...and 4 more
15 skills analyzed
| Skill | Findings | Duration | Cost |
|---|---|---|---|
| xcodebuildmcp-docs-release-review | 0 | 5.8s | $0.06 |
| xcodebuildmcp-docs-command-review | 0 | 3.4s | $0.02 |
| xcodebuildmcp-rendering-streaming-review | 0 | 4m 16s | $0.24 |
| xcodebuildmcp-runtime-boundary-review | 0 | 4m 23s | $1.43 |
| xcodebuildmcp-snapshot-fixture-review | 2 | 6m 42s | $3.01 |
| xcodebuildmcp-structured-output-review | 0 | 3m 51s | $3.49 |
| xcodebuildmcp-test-boundary-review | 0 | 3m 4s | $14.81 |
| xcodebuildmcp-tool-contract-review | 0 | 4m | $0.09 |
| wrdn-pii | 0 | 13m 36s | $6.77 |
| wrdn-authz | 0 | 8m 34s | $1.83 |
| wrdn-code-execution | 0 | 7m 37s | $5.09 |
| wrdn-data-exfil | 0 | 9m 35s | $4.84 |
| find-bugs | 7 | 18m 13s | $4.34 |
| code-review | 3 | 16m 17s | $3.99 |
| code-simplifier | 2 | 14m 34s | $4.58 |
Duration: 114m 53s · Tokens: 16.1M in / 48.3k out · Cost: $54.68 (+merge: $0.01, +dedup: $0.10)