Skip to content

fix: route contact_supervisor by stable session id (#35)#37

Open
adityavkk wants to merge 1 commit into
nicobailon:mainfrom
adityavkk:fix/stable-supervisor-session-id
Open

fix: route contact_supervisor by stable session id (#35)#37
adityavkk wants to merge 1 commit into
nicobailon:mainfrom
adityavkk:fix/stable-supervisor-session-id

Conversation

@adityavkk

Copy link
Copy Markdown

Summary

Fixes #35. contact_supervisor could fail to reach a still-connected supervisor when the supervisor was targeted by a display name/fallback alias that more than one session shared. The child was told to use contact_supervisor, but target resolution threw Multiple sessions named "..." and the ask was never sent.

This change lets contact_supervisor route by a stable broker session ID first, falling back to the existing name-based target for backwards compatibility.

Root cause

readChildOrchestratorMetadata() only captured PI_SUBAGENT_ORCHESTRATOR_TARGET (a display name or ID), and contact_supervisor resolved it via resolveSessionTarget(...). When two connected sessions shared that name, resolution threw before the message was sent, and the child had no ID to retry with.

What changed

  • Stable-ID routing: contact_supervisor now prefers a stable broker session ID when one is available, then falls back to the name target.
    • New optional PI_SUBAGENT_ORCHESTRATOR_SESSION_ID for the supervisor's broker session ID.
    • If absent, an inherited PI_INTERCOM_SESSION_ID from the parent process environment is used as the supervisor ID.
    • PI_SUBAGENT_ORCHESTRATOR_TARGET remains for display and name-based routing, so existing behavior is unchanged when no ID is provided.
  • ID publication: each connected session now publishes its own broker session ID as PI_INTERCOM_SESSION_ID in process.env, so a pi-subagents bridge that inherits the parent environment can route children back to the exact parent session. The previous value is saved and restored on disconnect/shutdown to avoid leaking state across sessions.
  • Graceful fallback: resolveSupervisorTarget(...) tries the session ID, then the name; if a stale ID can no longer be resolved, it falls back to the name target instead of failing hard. When resolution still fails and no stable ID was supplied, the error now hints which env vars enable stable routing.
  • Docs: README "When the Tool Appears" documents the new env contract and routing precedence.

Backwards compatibility

  • No pi-subagents change is required for this PR to be safe to merge: when the new env vars are absent, routing is identical to today.
  • To fully resolve the duplicate-name failure end to end, pi-subagents should populate PI_SUBAGENT_ORCHESTRATOR_SESSION_ID (or rely on inherited PI_INTERCOM_SESSION_ID). This PR makes pi-intercom ready for that without breaking the current name-based path.

Tests

Added focused integration tests in intercom.integration.test.ts:

  • sessions publish their broker id for subagent inheritance (and restore the prior value on shutdown)
  • contact_supervisor routes by stable broker id when two supervisors share a name
  • contact_supervisor uses an inherited parent broker id when no explicit id is set
  • contact_supervisor falls back to the name target when the broker id is stale

Refactored duplicated client setup into a connectTestClient(name) helper.

npm test
# 40 pass, 0 fail

@elecnix

elecnix commented Jun 17, 2026

Copy link
Copy Markdown

Reviewed this — the diagnosis is right: resolveSessionTarget resolves the display name/fallback alias via findSessions with no uniqueness, so a send / contact_supervisor to a shared name either errors with "Multiple sessions named …" or lands in an unrelated session. Routing by a stable broker session ID first (name as display/fallback only) is the right call, and keeping today's behavior when the env vars are absent is a good backward-compatible landing.

While reviewing I dug into the surrounding code and found the related issues that share this root cause (name-based addressing with no identity binding):

Before I'd want this merged, I'm going to validate the fix locally first:

  1. Run the unit/integration suite on this branch (npm test).
  2. Real-world repro in a tmux session with two pi agents — ideally stripped down to just pi-intercom (no full pi-subagents stack) to isolate the intercom layer — loading the local extension build via pi -e. Reproduce the duplicate-name case (two sessions sharing a name), confirm the misdelivery/ambiguity on main, then confirm correct routing on this branch.

Heads up: as the PR notes, fully closing this end-to-end needs the pi-subagents bridge to set PI_SUBAGENT_ORCHESTRATOR_SESSION_ID (or inherit PI_INTERCOM_SESSION_ID). Happy to take that side on a pi-subagents branch once this lands.


Validation results (local)

Ran the plan above. Setup: cloned nicobailon/pi-intercom (bare) with worktrees at origin/main (5caa4aa, v0.6.0) and this PR's head fix/stable-supervisor-session-id (b4db34e); npm install in both. Confirmed childOrchestratorMetadata = readChildOrchestratorMetadata() runs at extension-load (index.ts:1082), i.e. before the child connects and publishIntercomSessionId overwrites PI_INTERCOM_SESSION_ID — so the inherited parent id is captured correctly.

1. Unit/integration suite (npm test)

  • main (5caa4aa): 36/36 pass (pre-existing tests; the routing tests don't exist here).
  • PR fix: route contact_supervisor by stable session id (#35) #37 (b4db34e): 40/40 pass, 0 fail (14.8s). All four new tests added by this PR pass:
    • sessions publish their broker id for subagent inheritance
    • child supervisor tool routes by stable broker id when supervisor names collide ← the core fix
    • child supervisor tool uses inherited parent broker id when explicit id is absent ← inherited-id path
    • child supervisor tool falls back to target when broker id is stale ← stale-id fallback
  • These drive the real index.ts extension + a real spawned broker + real IntercomClient (via a harness, no LLM), exercising resolveSupervisorTarget/resolveSessionTarget against a live broker with two colliding session names. The fix is confirmed at the routing layer.
  • One caveat: in a first full-suite run, child supervisor tool preserves delivery failure reasons showed as ✖, but it passes in isolation and in a clean full re-run (40/40) — a test-harness socket/ordering flake (the new tests share a per-process broker socket and depend on cleanup ordering), not a logic regression. Worth tightening for determinism before merge.

2. Real-world repro (tmux, two real pi agents, just intercom via pi -e)

Loaded each branch's own extension with pi --no-extensions -e <worktree>/index.ts (so the installed 0.6.0 didn't interfere), on the real per-user broker. Started two real pi sessions both named repro-collide (collision), then a one-shot intercom send to:"repro-collide":

Identical ambiguity on both — i.e. this PR does not change the general send/ask-by-name path (only contact_supervisor). That matches the design, but it means the broad misdelivery vector tracked in #16 (any send to a non-unique name) is still open live and should be the next layer (broker-side: reject duplicate name on register + verify peer UID), independent of this PR.

3. Why I stopped short of a real-pi contact_supervisor repro

The inherited-id path needs the child to inherit PI_INTERCOM_SESSION_ID from the parent process env — i.e. the pi-subagents parent→child spawn flow, which is exactly the pi-subagents-side change noted as still needed. With "just intercom" there's no clean way to obtain the supervisor's full broker UUID: intercom list only renders short id prefixes, not the full UUID that PI_INTERCOM_SESSION_ID requires. So the authoritative validation of the fix's routing is the integration tests in (1); a full end-to-end real-pi demonstration will need the pi-subagents bridge to set PI_SUBAGENT_ORCHESTRATOR_SESSION_ID (or rely on inherited PI_INTERCOM_SESSION_ID).

Verdict

The fix is correct and validated at the routing layer (40/40, including colliding-name + inherited-id + stale-fallback against a real broker). End-to-end closure still needs the pi-subagents side. Separately, the general send-by-name misdelivery (#16) is reproduced live and remains unfixed by this PR — recommend tracking that as follow-up broker-side hardening (duplicate-name rejection + peer UID verification on register).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

contact_supervisor should route by stable session ID, not duplicate names

2 participants