
feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split#2286

Draft
philwinder wants to merge 51 commits into main from feat/helix-org-prompt-driven-mcp


@philwinder commented Apr 25, 2026

Summary

Introduces helix-org, a standalone Go prototype for a hybrid human/AI organization system. This PR is a WIP/Draft collecting the core infrastructure, three transport implementations, MCP prompts (slash commands), and a set of runnable demos.

Core platform

  • Model Context Protocol (MCP) Integration: All mutations flow through MCP endpoints at /workers/{id}/mcp using Streamable HTTP transport. Tool visibility is grant-filtered per worker.

  • MCP Prompts (Slash Commands): Server-defined prompts registered in the MCP surface alongside tools. Each prompt has a name, title, description, arguments, and a render method that produces seed messages. Prompts are grant-gated: a prompt is only offered when the tool it depends on is visible to the worker. An auto-generated /help command walks the registry at render time, so new prompts appear automatically without manual updates. The /role command drafts a new Role from a title hint, expands to the full interview template, saves via create_role, then offers edits or chains to hire_worker.

  • Chat Typeahead: A UI dropdown shows the available slash commands on every keyup in the chat textarea. Expansion happens server-side in the chat bridge: SendHandler intercepts /name inputs and expands them from their templates before sending them to claude. The user sees their original input in their bubble; claude gets the expanded text. This makes commands discoverable interactively and reduces friction.

  • Enum Schema Hints: WorkerKind and TransportKind surface as enums in the JSON Schema that MCP clients see, enabling better autocomplete. Validation errors are self-documenting, e.g. unknown worker kind "foo" (valid: "human", "ai"), so clients can self-correct.

  • Prompt-Driven CLI: New helix-org prompt subcommand spawns Claude Code with inline MCP configuration, enabling natural-language orchestration of the entire organization graph (Roles, Workers, Positions, Streams, Grants).

  • Role vs Worker Split: Separates the job (Role: owner-edited markdown, fanned out via update_role) from the person (Worker: per-hire identity, immutable). Allows live edits to job descriptions without touching identities.

  • Environment Provisioning & Push Dispatch: Each Worker gets an isolated environment directory. When events land on subscribed Streams, the system spawns a fresh Claude Code activation (one-shot) with that worker's MCP endpoint. Role and identity are stamped into the environment; the agent reads them and acts on the event trigger.

  • Canonical Message envelope: Every Event.Body is a JSON-encoded domain.Message (From / To / Subject / Body / ThreadID / InReplyTo / MessageID / Extra). The spawner renders every populated field into the activation prompt, so Workers branch on transport-shaped metadata directly, without a separate read_events round-trip.

  • Simplified Grant Model: Grants are strictly (WorkerID, ToolName) pairs with no enforcement/scope logic. A grant is the permission; the agent is trusted to comply.
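
The self-documenting enum errors can be sketched as follows. This is a minimal illustration, not the helix-org source: the `WorkerKind` type, `Validate` method and `SchemaEnum` helper are hypothetical names, and only the error format comes from this PR. A single slice of valid values feeds both the schema enum and the error message, so the two cannot drift apart.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerKind is a hypothetical stand-in for the domain enum described above.
type WorkerKind string

const (
	WorkerHuman WorkerKind = "human"
	WorkerAI    WorkerKind = "ai"
)

// validWorkerKinds lists every accepted value in a stable order, so the same
// slice can feed both the JSON Schema "enum" and the error message.
var validWorkerKinds = []WorkerKind{WorkerHuman, WorkerAI}

// Validate returns nil for a known kind, or an error naming every valid value.
func (k WorkerKind) Validate() error {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, v := range validWorkerKinds {
		if k == v {
			return nil
		}
		quoted = append(quoted, fmt.Sprintf("%q", string(v)))
	}
	return fmt.Errorf("unknown worker kind %q (valid: %s)", string(k), strings.Join(quoted, ", "))
}

// SchemaEnum returns the values to publish as the JSON Schema "enum".
func SchemaEnum() []string {
	out := make([]string, len(validWorkerKinds))
	for i, v := range validWorkerKinds {
		out[i] = string(v)
	}
	return out
}

func main() {
	fmt.Println(WorkerKind("foo").Validate())
	fmt.Println(SchemaEnum())
}
```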

Transports

Streams own their I/O. Alongside the default local transport, three external transport kinds are implemented:

  • Local (default): in-process pub/sub between Workers.
  • Webhook: bidirectional HTTP. Outbound POSTs to a configured URL on every published event; inbound deliveries are HMAC-verified and fanned out to subscribed Workers. Demo: secretary worker bridges an external webhook to internal channels.
  • Email (Postmark): outbound via Postmark API; inbound via Postmark's webhook with alias-based stream routing. Demo: two-worker email exchange (Sam <-> Lee).
  • GitHub (inbound only): A single /github/webhook endpoint, HMAC-verified via X-Hub-Signature-256, fans out to every Stream whose repo + events whitelist matches. Acting on a repo (label, comment, review, open PR) is the Worker's job via gh in its Environment; publishing to a github stream returns an explicit error. Demos: doc-engineer reviews docs PRs and tags docs issues; github-engineer implements features on a GitHub Project v2 board.

Operational config

  • DB-stored, redacted-by-default: provider credentials live in transport.<kind> keys with explicit Secrets: []string declarations. helix-org config get redacts every declared secret; regression tests pin the spec for both transport.postmark and transport.github so a future refactor can't silently drop a redaction entry.
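
A minimal sketch of declared-secret redaction, assuming each registered spec lists top-level JSON field names in `Secrets` (the real spec may support nested paths). `Spec.Redact` and the `[redacted]` marker are hypothetical; `token` is one of the transport.postmark fields named later in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical shape for a registered config key.
type Spec struct {
	Key     string
	Secrets []string // top-level JSON fields hidden unless explicitly revealed
}

// redacted avoids '<' and '>', which encoding/json would escape as \u003c.
const redacted = "[redacted]"

// Redact returns a copy of raw with every declared secret field replaced.
func (s Spec) Redact(raw []byte) ([]byte, error) {
	var m map[string]any
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	for _, k := range s.Secrets {
		if _, ok := m[k]; ok {
			m[k] = redacted
		}
	}
	return json.Marshal(m) // map keys are emitted in sorted order
}

func main() {
	spec := Spec{Key: "transport.postmark", Secrets: []string{"token"}}
	out, err := spec.Redact([]byte(`{"token":"abc123","from":"sam@example.com"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A regression test in this style only has to assert that every name in `Secrets` comes back as the marker, which is what stops a refactor from silently dropping an entry.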

Design Philosophy

  • Data/text over code: Configuration lives in Role markdown and prompts, not Go logic.
  • Keep core generic: Tools define their own scope and schemas; new tools are addable without core changes.
  • No workflow in code: Orchestration logic lives in Role prompts, not implicit chains in the codebase.
  • Smallest thing that works: No speculative abstractions.

What's Inside

  • domain/: Core types (Role, Worker, Position, Stream, Grant, Event, Message, Transport) + enum validators
  • prompts/: Prompt interface, Registry, builtins (/help, /role)
  • store/sqlite/: GORM-driven SQLite with AutoMigrate (no raw SQL migrations)
  • tools/: 13 MCP tools + spawner + registry + JSON schema enum hints
  • server/: HTTP endpoints for reads + MCP mutation handler + jsonapi.org serialization + chat bridge with slash expansion
  • cmd/helix-org/: CLI with serve, bootstrap, chat, config subcommands
  • broadcast/ & dispatch/: Event bus for push-based worker activation
  • transports/postmark, transports/github: provider-specific I/O packages
  • demos/: getting-started, newsroom, webhook, email, github, github-engineer - runnable end-to-end
  • design/: design docs for the canonical envelope, the email transport, the github transport

Testing

Testing spans end-to-end flows and unit tests:

  • Bootstrap -> role create -> worker hire -> event publish -> worker activation with MCP -> live-edit role -> behavior change
  • Prompt registry auto-generation (Help sees new prompts registered after it)
  • Chat slash expansion and typeahead filtering
  • Enum schema and validation error formatting
  • Transport unit tests for HMAC verification, payload mapping, redaction
  • make check passes: 0 lint issues, race detector clean

Next Steps (Post-WIP)

  • Add persistent authentication (currently all callers are treated as root owner)
  • Move provider credentials to per-Worker scope so different teams can use different GitHub identities / inboxes
  • Extend to support human operators at the REPL
  • Integrate with the broader Helix platform

WIP because: the core prototype is complete and tested, but we're still validating the design with the broader team before finalizing the API surface and documentation.

Co-Authored-By: Claude Haiku 4.5 noreply@anthropic.com
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com


Update — domain/runtime split + unified Helix session shape

  • Helix-specific Worker fields moved off domain.Worker to a sidecar WorkerRuntimeState keyed on (workerID, backend, key). Six methods dropped from the domain interface.
  • Runtime layer moved out of tools/: new agent/, agent/claude/, agent/helix/ packages plus helix/helixclient/. tools/ now holds only org-graph MCP tools.
  • SpecsPublisher -> agent.WorkspaceSync. Logical-name contract (role.md, identity.md); each backend translates to its own layout. Fixes the prior path mismatch where update_role wrote job/* but the activation mandate read .context/*.
  • agent.md moved from tools/templates/ to agent/policy.md and embedded as agent.Policy so both runtimes share one source.
  • Unified Helix session shape: helix.Runtime (zed_agent) and helix.AgentType (zed_external) are non-configurable constants used by every project apply and every /sessions/chat post. Drops chat.agent_type config key and the Runtime fields on the spawner/applier so the spawner and chat backend can no longer drift to claude_code.

Verified end-to-end against app.helix.ml (getting-started demo).


Demos

The PR now includes seven runnable end-to-end demos:

  1. getting-started — bootstrap, hire echo worker, publish/read events, live edit role.
  2. webhook — inbound/outbound webhook transport, secretary summarizes and forwards.
  3. email — bidirectional Postmark, two-worker support escalation with threading.
  4. newsroom — multi-worker publishing pipeline (editor, fact-checker, publisher).
  5. github — GitHub webhook inbound, multiple workers acting on issues/PRs via gh CLI.
  6. github-engineer — GitHub Project v2 board worker implementing features spec-style.
  7. manufacturing — NCR triage with Helix backend + comms-demo mock-channels: operator raises NCR → agent fans out (Slack/SMS/Email) → supervisor approves → agent confirms. Shows the hold pattern and the agent/human split.

Notes for reviewers

Manufacturing demo is the newest and was verified end-to-end against app.helix.ml:

  • Uses Helix-backed spawner + chat (not local claude).
  • Three webhook streams (supervisor DM, customer SMS, supplier email).
  • Role file bakes reference data (SPC, maintenance log, related NCRs, affected orders) so no external systems needed.
  • Two agent activations: NCR raised → fan out; supervisor reply → confirm & conditional send.
  • ~90 seconds on stage; pre-flight and setup take ~5 minutes.
  • Demonstrates the core value: agent assembles evidence and drafts; humans make three decisions (not chase data across seven systems).

All demos pass make ci (formatting, lint, race tests).

@philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch 2 times, most recently from d9a9c99 to 01e9388 on April 27, 2026 13:23
@philwinder changed the title from "feat: helix-org prototype with MCP, prompt-driven CLI, and Role/Identity split" to "feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split" on Apr 28, 2026
philwinder and others added 27 commits May 4, 2026 11:43
…ity split

Adds a complete proto-implementation of helix-org as a standalone Go project with:

- **MCP Integration**: All mutations flow through Model Context Protocol at /workers/{id}/mcp
  using Streamable HTTP transport. Tool list is grant-filtered per worker.

- **Prompt-Driven CLI**: New `helix-org prompt` subcommand spawns Claude Code with inline
  MCP config, enabling natural-language orchestration of the entire org graph.

- **Role vs Worker Split**: Roles are job descriptions (owner-edited markdown, fanned out
  via update_role). Workers are people in positions (per-hire identities, immutable).

- **Environment Provisioning**: Each Worker gets an isolated environment directory with:
  - role.md (propagated via update_role)
  - identity.md (per-hire, immutable)
  - agent.md (fixed stub: "Read role.md and identity.md, act on trigger")
  - mcp.json (dynamically generated per activation)

- **Push-Dispatch Event Loop**: When events land on subscribed channels, the system spawns
  a fresh Claude Code instance (one-shot activation) with that worker's MCP endpoint.

- **channel_members Tool**: Read-only MCP tool that lists workers subscribed to a channel,
  enabling Workers to query org membership without side effects.

- **Simplified Grant Model**: Grants are now strictly (workerID, toolName) pairs. Removed
  enforcement/scope entirely—a grant IS the permission, and the agent is trusted to comply.

- **Humanized Demos**: Getting-started and newsroom demos now use prompt-based CLIs with
  natural-language orchestration instead of raw API calls.

Major components:
- domain/: Core types (Role, Worker, Position, Channel, Grant, Event)
- store/sqlite: GORM-driven SQLite storage with AutoMigrate
- tools/: 13 MCP tools (create_role, hire_worker, etc.) + spawner
- server/: HTTP endpoints + MCP handler + jsonapi.org serialization
- cmd/helix-org: CLI with serve, bootstrap, prompt subcommands
- broadcast/dispatch: Event bus for push-based activation
- demos/: Two runnable examples (getting-started, newsroom editorial team)

Design principles embedded:
- Prefer data/text over code (config in Role markdown, not Go)
- Keep core generic (tools define their own scope and schemas)
- No workflow in code (agents orchestrate via prompts, not implicit chains)
- Write smallest thing that works (no speculative abstractions)

All code tested end-to-end: bootstrap → role create → worker hire → event publish →
worker activation with MCP → live-edit role → behaviour change on next activation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
A minimal three-Worker demo that produces an opinionated MLOps
newsletter with a fresh angle each issue. Shows the prompt-driven
philosophy at its tightest:

- The only files on disk are 3 short role markdown files (~25 lines each)
- A single helix-org prompt call creates the roles, positions,
  channels, and hires the team
- Editor picks the angle, researcher hunts for matching news,
  journalist crafts the narrative
- Re-run with a different brief and the same team produces a
  completely different angle on the same broad subject

Tested end-to-end: two briefs produced two distinct angles
("platform team tax" vs "feature stores as MLOps' open secret
graveyard") with named subjects (Stitch Fix, Chime, Modal Labs,
Tecton) — proving the angle truly varies per brief.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Adds a new `helix-org tail [glob...]` CLI plus the `GET /tail`
endpoint it talks to. Lets the human watch the cascade of a running
team in real time without curl + jq incantations.

- Defaults to '*' (all channels). Globs use Go's path.Match:
  'c-*', 'c-news?', 'c-newsletter'. Multiple globs unioned.
- Long-polls (default 30s wait, configurable via --wait).
- Pretty output: HH:MM:SS  channel  source  body, with subsequent
  body lines indented under the body column. ANSI colour when
  stdout is a TTY; --no-color to disable.
- New broadcast.Broadcaster.SubscribeAll for wildcard wakes, so
  channels created mid-tail (e.g. by an editor's hire trigger)
  also wake the tail loop.
- New store.Events.ListSince(channelIDs, since, limit) returning
  oldest-first events strictly newer than the named event.
- URL surface designed to extend: bare globs are channel IDs
  today; future namespace prefixes (channel:c-*, activation:w-*)
  can be added without breaking compatibility.
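
Glob union over stream IDs can be sketched with Go's `path.Match`, which the commit names. `matchAny` is a hypothetical helper; defaulting to `*` mirrors the CLI behaviour described above. (The tail endpoint is removed in a later commit, but the matching rule stands on its own.)

```go
package main

import (
	"fmt"
	"path"
)

// matchAny reports whether id matches at least one glob (union semantics).
// An empty glob list defaults to "*", mirroring the CLI default.
func matchAny(globs []string, id string) (bool, error) {
	if len(globs) == 0 {
		globs = []string{"*"}
	}
	for _, g := range globs {
		ok, err := path.Match(g, id)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, id := range []string{"c-news1", "c-newsletter", "c-bullpen"} {
		ok, _ := matchAny([]string{"c-news?", "c-newsletter"}, id)
		fmt.Println(id, ok)
	}
}
```

Note that `?` matches exactly one character, so `c-news?` alone does not match `c-newsletter`; the second glob is what pulls it into the union.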

Tested: store + broadcaster unit tests, server endpoint test
covering glob match, since cursor, and default match. Live-tested
against the running mlops-newsletter demo (history backfill, live
event arrival via long-poll, multi-glob union).

Newsletter README updated to use `helix-org tail` instead of curl.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Both demos previously asked the user to either tail per-Worker
activation.log files or curl the channel events endpoint. Replace
both with helix-org tail:

- newsroom: drop "tile seven terminals" instruction in favour of one
  tail window (default '*' = all channels). Recommend per-channel
  globs (tail c-bullpen, tail c-recruiting) for narrower focus.
  "What to point at during the demo" callouts now name the exact
  tail command to run.
- getting-started: replace tail -f activation.log + curl-and-jq
  round-trip check with helix-org tail. Keep activation.log as a
  parenthetical for debugging the worker's internal claude stream.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…h Transport extensibility

## Abstraction Simplification

- **Channel → Stream**: Unified the Channel concept into Stream, removing a redundant abstraction. A Stream is now the single named pub/sub channel.
- **Stream → Subscription**: Renamed the worker-channel edge from Stream to Subscription using a composite key (worker_id, stream_id). This eliminates synthetic stream IDs and clarifies the semantic: a subscription is a worker's interest in a stream, not the stream itself.
- **Transport Field**: Added optional Transport field to Stream to support future integrations (Slack, email, webhook, RSS, tick). Defaults to "local" (in-process pub/sub). Designed to be extensible without core changes.

## Architecture Changes

### Domain Layer (domain/)
- Added `transport.go`: Transport struct with Kind (enum) and optional Config (json.RawMessage)
- Added `subscription.go`: Subscription struct with WorkerID, StreamID, CreatedAt (composite key, no synthetic ID)
- Updated `stream.go`: Renamed from Channel; now holds ID, Name, Description, CreatedBy, CreatedAt, Transport
- Updated `event.go`: Changed ChannelID field to StreamID
- Updated `id.go`: Removed ChannelID type

### Store Layer (store/sqlite/)
- Added `subscription.go`: Subscriptions repository with Create, Delete, Find, ListForWorker, ListForStream
- Updated `stream.go`: Renamed from channel.go; added TransportKind and TransportConfig columns
- Updated `event.go`: Changed column references from channel_id to stream_id; JOINs on subscriptions instead of streams
- Updated `streams_and_events_test.go`: Renamed from feed_and_channels_test.go; comprehensive test coverage for new abstractions
- Updated `store.go`: Renamed Channels → Streams; replaced Streams → Subscriptions

### Broadcast & Dispatch (broadcast/, dispatch/)
- Renamed all channelID references to streamID throughout
- Updated method signatures to use StreamID instead of ChannelID

### Tools Layer (tools/)
- Added `create_stream.go`: New tool taking optional transport argument
- Added `read_events.go`: Replaces read_feed.go; queries subscriptions then long-polls streams
- Added `read_*.go` (streams, grants, positions, roles, workers): MCP tools replacing HTTP read endpoints
- Updated `subscribe.go`, `unsubscribe.go`, `publish.go`: Use streamId and Subscriptions API
- Renamed `channel_members.go` → `stream_members.go`; calls Subscriptions.ListForStream
- Updated `spawner.go`: Trigger struct uses StreamID; updated event notification text

### Server & HTTP (server/)
- Moved all read endpoints to MCP tools; `/workers/{id}/mcp` now handles mutations only
- Updated `tail.go`: Long-poll attributes renamed to streamID; calls store.Streams.List
- Simplified `server.go`: Only MCP mutation handler and tail endpoint remain
- Deleted: bootstrap.go, channels.go, environment.go, feed.go, grants.go, positions.go, roles.go, workers.go

### Bootstrap & CLI (bootstrap/, cmd/)
- Updated default tool grants to reference new tool names
- Updated vocabulary throughout: c- prefix → s- prefix for stream IDs

### Demos (demos/)
- Updated all demo READMEs and role definitions from channel to stream vocabulary
- Added `mlops-newsletter/hire.txt`: Example hire prompt

## Benefits

1. **Clearer semantics**: Stream is what it says (a named pub/sub channel), Subscription is the worker's interest in it
2. **Extensibility**: Transport field allows future integrations without core changes
3. **Reduced complexity**: No synthetic stream IDs, no redundant Feed/Channel/Stream layers
4. **MCP-first design**: All mutations now routed through MCP, read endpoints are MCP tools
5. **Smaller server surface**: HTTP endpoints only for authentication + tail streaming

## Testing

All 57 test cases pass with race detector enabled across all packages:
- domain: Subscription and Transport validation
- store/sqlite: Subscriptions repository operations, stream queries with JOINs
- broadcast: Pub/sub with streamID
- server: Tail long-poll with stream glob matching
- tools: All 13 MCP tools with varied schemas

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `/tail` HTTP long-poll endpoint and `helix-org tail/prompt/client`
CLI subcommands are now unnecessary: all human observation and
orchestration flows through MCP via `claude` sessions directly.

**Removals:**
- Delete server/tail.go (HTTP long-poll handler)
- Delete server/jsonapi.go (only used by tail)
- Delete cmd/helix-org/tail.go (CLI client)
- Delete cmd/helix-org/prompt.go (spawner stub)
- Delete cmd/helix-org/client.go (envelope types)
- Remove mux route for GET /tail
- Remove Broadcaster.SubscribeAll/UnsubscribeAll (dead after tail removal)
- Simplify serve/bootstrap doc: "one HTTP endpoint: /workers/{id}/mcp"

**Updates:**
- demos/getting-started/README.md: replace helix-org tail with claude
  watcher prompt using subscribe + read_events(wait=60)
- demos/mlops-newsletter/README.md: same pattern
- demos/newsroom/README.md: same pattern, plus an "On hire" trigger on
  the recruiter role to handle a stream-creation race condition
- CLAUDE.md: clarify that human observation uses MCP (no /tail endpoint)
- tools/publish.go: comment fix

**Fixes:**
- cmd/helix-org/bootstrap.go: make installClaudeMCPEntry idempotent
  by removing stale entry before adding (re-running bootstrap between
  demo wipes no longer fails)
- demos/newsroom/roles/recruiter.md: add "On hire" subscribe + retry
  guidance matching researcher/journalist (Renée was getting hired
  before Maya's hire activation created s-recruiting)

All three demos tested end-to-end: bootstrap → scaffold → hire cascade
→ event publishing → role live-edit → behavior change confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add helix-org chat — an interactive claude session pointed at a Worker's
MCP endpoint (default w-owner). Supports --new, --resume, --worker flags,
and session persistence via claude's per-cwd store with --continue.

Update all three demos to show only the interactive chat flow:

- getting-started: condensed from two-terminal to one, removed
  --install-claude-mcp, Bootstrap → chat → type prompts as w-owner
- mlops-newsletter: removed separate watcher terminal, team setup and
  brief publishing now happen inline in chat
- newsroom: removed multi-terminal watcher, all interaction happens
  in the bootstrap + chat session

Demos now focus on the actual user experience (typing into a chat)
which mirrors a real UI-based server. Removed background concepts,
multi-terminal complexity, and one-shot (-p) mode from demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helix-org chat unconditionally passed --continue, so the first run in a
fresh directory exited with "No conversation found to continue" before
the user could type anything. Probe ~/.claude/projects/<encoded-cwd>/
for any .jsonl session file and only pass --continue when one exists;
otherwise let claude start fresh, which still seeds a session for the
next run to resume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace claude's --continue flag with --resume <sessionId>, looked up
by reading the most-recently-modified .jsonl in the cwd's session
store and parsing the sessionId from its first line.

--continue rejects sessions whose log ended on certain non-user events
(e.g. an agent-name marker from a prior interrupted exit), failing
with "No conversation found to continue" even when the session is
fine to resume by ID. This blocked re-entry into chat in the demo
directories whenever a previous chat had exited mid-flight.

If no prior session exists, claude is launched without a resume flag
and starts fresh — matching the desired first-run behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two new MCP tools for worker-to-worker communication:

- dm: High-level tool bundling create_stream + invite_workers + publish
  into a single call. Creates per-pair streams with deterministic naming
  (s-dm-<sortedIDs>) so conversations reuse the same stream regardless of
  direction. Complements lower-level streaming tools with a high-level,
  autonomously-discoverable entry point.

- invite_workers: Subscribes one or more workers to a stream in a single call.
  Idempotent — re-inviting already-subscribed workers is a no-op. Enables
  batch subscription workflows without a manual loop.
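
The deterministic DM naming and idempotent invites can be sketched as follows. Joining the sorted IDs with `-` is an assumption (the commit only says `s-dm-<sortedIDs>`), and `dmStreamID`, `subKey` and `invite` are hypothetical names; the in-memory map stands in for the (worker_id, stream_id) composite key in SQLite.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// dmStreamID derives a per-pair stream ID that is the same in both directions.
// Joining with "-" is an assumption; the PR only specifies "s-dm-<sortedIDs>".
func dmStreamID(a, b string) (string, error) {
	if a == b {
		return "", errors.New("cannot DM yourself")
	}
	ids := []string{a, b}
	sort.Strings(ids)
	return "s-dm-" + strings.Join(ids, "-"), nil
}

// subKey is the (worker_id, stream_id) composite key; no synthetic ID.
type subKey struct{ WorkerID, StreamID string }

// invite subscribes each worker, treating existing subscriptions as a no-op.
// It returns how many subscriptions were newly created.
func invite(subs map[subKey]bool, streamID string, workerIDs ...string) int {
	added := 0
	for _, w := range workerIDs {
		k := subKey{w, streamID}
		if !subs[k] {
			subs[k] = true
			added++
		}
	}
	return added
}

func main() {
	id1, _ := dmStreamID("w-owner", "w-sam")
	id2, _ := dmStreamID("w-sam", "w-owner")
	fmt.Println(id1, id1 == id2) // s-dm-w-owner-w-sam true

	subs := map[subKey]bool{}
	fmt.Println(invite(subs, id1, "w-owner", "w-sam")) // 2
	fmt.Println(invite(subs, id1, "w-owner", "w-sam")) // 0
}
```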

Both tools are granted to the owner during bootstrap and tested end-to-end
(dm stream reuse across directions, idempotency, self-DM rejection, unknown
worker rejection).

Updated demo: newsroom step 6 now uses dm instead of manual 4-step workflow,
and updated comments in publish/subscribe to point to dm as the high-level
entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces on-disk activation.log/jsonl files with a per-Worker activation
Stream. Assistant text, tool calls, tool results, and lifecycle markers
are now Events on s-activations-<workerID> — same primitive as every
other read in the system.

- hire_worker creates the activation Stream at hire time and subscribes
  the hiring Worker. The new Worker themselves is intentionally NOT
  subscribed (would loop the dispatcher otherwise).
- Spawner publishes one Event per atomic message segment (assistant
  text, tool_use, tool_result, system init, run result), bracketed by
  synthetic '=== activation: <trigger> ===' and '=== exit: <err> ==='
  markers. Append + Notify only — the dispatcher is skipped so per-
  message events can't re-trigger subscribed AI Workers.
- worker_log tool bundles subscribe + read_events scoped to one
  Worker's activation Stream. Mirrors the dm pattern: a friendly
  shortcut the agent can reach for from a 'show me what w-X is doing'
  instruction without knowing the stream-naming convention.

Persistence between activation runs is left to the Role: if a Worker
needs cross-run memory, the Role tells it to write to history.md and
read it back on the next activation. No system feature added.

Demos updated to showcase the new affordances:
- getting-started: step 3 uses worker_log to confirm hire activation
  finished, eliminating the cross-terminal log-watching requirement.
- mlops-newsletter: step 4 adds a peek-inside tip using worker_log.
- newsroom: adds a 'Watch a Worker work' step parallel to the dm
  step, plus a 'What to point at' bullet for fact-checker blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds inbound webhook support to helix-org Streams. Each Stream can declare
transport.kind="webhook"; POST requests to /webhooks/<streamID> append the
request body as an Event, trigger the dispatcher to wake subscribed Workers,
and notify long-poll observers.

Key changes:
- domain/transport.go: add TransportWebhook kind with docstring
- server/server.go: add Dispatcher interface, update New() signature
- server/webhook.go: HTTP POST handler for /webhooks/{streamID}
- server/webhook_test.go: 9 test functions covering edge cases and concurrency
  * happy path, missing stream, wrong transport, empty body
  * size limits, nil broadcaster/dispatcher, UTF-8 handling
  * 25 concurrent POSTs, stream isolation
  * race-detector clean with -count=20

Also fixes a critical :memory: SQLite concurrency bug:
- store/sqlite/sqlite.go: pin MaxOpenConns(1) for in-memory databases
- Root cause: each connection gets its own private :memory: DB
- Impact: concurrent HTTP tests now see consistent state

New demo:
- demos/webhook/README.md: 5-step specification (hire secretary, POST payload, read back)
- demos/webhook/roles/secretary.md: secretary subscribes to s-inbox, summarizes
  incoming payloads, DMs summaries to owner

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the webhook transport so a Stream can be configured to POST
each appended Event to an external URL. A Stream can now be inbound-
only (current behaviour, no config), outbound-only (config sets
outbound_url), or both at once — the dispatcher fires emit on every
append regardless of origin (webhook handler, publish tool, dm tool).

Key changes:
- domain/transport.go: WebhookConfig type with OutboundURL field;
  Validate now parses webhook config and rejects non-http(s) URLs,
  relative URLs, and empty hosts before stream creation
- dispatch/dispatcher.go: emitOutbound runs on every Dispatch, looks
  up the Stream's transport, and if outbound_url is set fires an
  async POST with X-Helix-Stream and X-Helix-Event headers; bounded
  by 5s timeout so slow targets don't stall publishes
- domain/transport_test.go: 14 cases covering Validate happy paths
  and rejection paths, plus WebhookConfig parse round-trip
- dispatch/dispatcher_test.go: 12 tests covering emit happy path,
  inbound-only no-emit, local-no-emit, missing stream, 4xx/5xx
  tolerance, unreachable host, slow target timeout, 25 concurrent
  emits, binary payload round-trip, malformed stored config, store
  lookup errors, and content-type/path preservation
- server/webhook_test.go: TestWebhookBridgesInboundToOutbound wires
  the real dispatcher end-to-end and proves an external POST to
  /webhooks/<streamID> bridges to an outbound POST when the same
  stream has both directions configured

Demo narrative updated: secretary now subscribes to s-inbox, DMs the
owner with the summary, and publishes the summary to s-outbox which
is configured with outbound_url. A 4-terminal flow with a local nc
catcher shows the full inbound -> summarise -> outbound bridge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds domain.Message — a transport-agnostic envelope (From, To, Subject,
Body, ThreadID, InReplyTo, MessageID, Attachments, Extra) — and migrates
every event-producing path to encode it as JSON in Event.Body. There is
one storage shape going forward; future transports (email, Slack,
queues, feeds) translate at their boundary, Workers see the same
structure regardless of source.

Identity convention: From/To carry transport-native identifiers
verbatim (WorkerIDs when known, alice@x.com / U0123 / +15551234 / etc.
otherwise — no prefixes). Empty From means "no human originator" for
data feeds and triggers.

Code changes:
- domain/message.go: Message + Attachment types, Encode/Decode helpers,
  Event.Message() parser, NewMessageEvent constructor
- tools/dm.go: produces Message{From: caller, To: [recipient], Body}
- tools/publish.go: accepts optional to/subject/threadId/inReplyTo/
  messageId/bodyContentType/attachments args; defaults From=caller
- server/webhook.go: wraps inbound POST bodies into Message{Body: raw}
- tools/spawner.go: activation log entries wrapped as Message{From:
  workerID, Body: line}; Trigger gains a Message field
- dispatch/dispatcher.go: parses Event.Body once, passes parsed
  Message and visible Body text to the spawner
- tools/read_events.go: surfaces Message.Body as `body` (visible text)
  and the full envelope as `message` — Roles needing structure read
  the latter; existing role prompts that read `.body` continue to work
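
A sketch of the envelope and its JSON round-trip. Field names follow this commit, but the JSON tag spellings (modelled on the publish tool's `threadId`/`inReplyTo`/`messageId` arguments), the `Attachment` fields and the `Extra` type are assumptions; `Encode`/`DecodeMessage` are hypothetical stand-ins for the helpers listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Attachment carries metadata only; its field names are assumptions.
type Attachment struct {
	Name        string `json:"name"`
	ContentType string `json:"contentType,omitempty"`
	Size        int64  `json:"size,omitempty"`
}

// Message is a sketch of the transport-agnostic envelope stored in Event.Body.
type Message struct {
	From        string            `json:"from,omitempty"` // empty: no human originator
	To          []string          `json:"to,omitempty"`   // transport-native IDs, verbatim
	Subject     string            `json:"subject,omitempty"`
	Body        string            `json:"body"`
	ThreadID    string            `json:"threadId,omitempty"`
	InReplyTo   string            `json:"inReplyTo,omitempty"`
	MessageID   string            `json:"messageId,omitempty"`
	Attachments []Attachment      `json:"attachments,omitempty"`
	Extra       map[string]string `json:"extra,omitempty"` // transport-specific leftovers
}

// Encode produces the string stored in Event.Body.
func (m Message) Encode() (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// DecodeMessage parses an Event.Body back into the envelope.
func DecodeMessage(body string) (Message, error) {
	var m Message
	err := json.Unmarshal([]byte(body), &m)
	return m, err
}

func main() {
	// A dm from one Worker to another: From/To carry WorkerIDs verbatim.
	body, _ := Message{From: "w-owner", To: []string{"w-sam"}, Body: "hello"}.Encode()
	fmt.Println(body)
	m, _ := DecodeMessage(body)
	fmt.Println(m.From, m.To, m.Body)
}
```

Because empty fields are omitted, a plain webhook payload wrapped as `Message{Body: raw}` stays compact, while email and dm events carry their extra headers in the same shape.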

Tests updated to use Event.Message() instead of comparing raw Body
strings; full make check passes (lint clean, race detector clean).

Demos verified end-to-end after the refactor:
- getting-started: hire echo worker, publish "hello", echo replies,
  live-edit role, "loud: HELLO" — all four steps green
- webhook: secretary summarises inbound POST, DMs owner, publishes to
  s-outbox, outbound emitter POSTs Message JSON to nc:9000 catcher
  (catcher now sees structured envelope, not raw text — README
  updated to describe this)
- mlops-newsletter: full editor → researcher → journalist → editor
  cascade produces a complete newsletter on s-newsletter
- newsroom: 7 roles, 2 positions, 2 hires (Maya + Renée), all
  activations clean — message machinery validated without running
  the real-PR cascade

Design doc at design/messages.md captures the convention, the per-
transport mapping table for future transports, and open questions
to resolve as new transports ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the email transport, the operational-config infrastructure
it sits on, and a runnable customer-service demo (Sam) that emails
land at and reply through.

Verified end-to-end: simulated inbound POST → +sam alias routed →
Sam's claude activation → reply published to s-support → outbound
emit POSTed to Postmark's /email API → real email delivered to
phil@winder.ai. ~22s wall-clock end-to-end on a cold activation.

Operational config (design/config.md):
- New configs table (key/value/audit), store.Configs interface,
  sqlite impl. Auto-migrated alongside the rest.
- config.Registry: subsystems Register a Spec (type, default,
  required, secret paths, description). Reads/writes go through it
  so the CLI's view matches what consumers actually consume.
- helix-org config CLI: set/get/list/delete. Opens the SQLite file
  directly (same path as bootstrap), so config writes commit and
  the running server picks them up on its next read — live updates
  without restart, and without an LLM ever touching the values.
  Secrets redacted by default; --reveal-secrets opts in.
- Strict separation: org-graph mutations stay on MCP; operational
  config (transport creds, future model selection, etc.) is
  CLI-only. Same SQLite file, two access paths, two threat models.

Email transport (transports/postmark):
- domain.TransportEmail kind + EmailConfig{Alias} stream config.
  Validate enforces lowercase alphanumeric/dash/underscore aliases
  so they compose safely into <hash>+<alias>@... or <alias>@Domain.
- Inbound HTTP handler at /email/postmark: parses Postmark's JSON,
  extracts the +alias suffix from OriginalRecipient, finds the
  matching Stream by alias, builds a domain.Message envelope (From,
  To, Subject, Body, MessageID, InReplyTo, ThreadID from headers,
  Attachment metadata), appends the event, fires the dispatcher.
- Outbound emitter: when a Worker publishes to an email Stream, the
  dispatcher invokes the transport's Emit, which composes a
  Postmark /email POST (From=server-config, To from Message.To,
  optional Reply-To at <hash>+<alias>@... for threading,
  In-Reply-To/References headers when set).
- Server-level config (token, inbound, from, optional
  disable_reply_to) lives in transport.postmark; per-stream
  config is just {"alias":"sam"}. The transport joins the two at
  runtime, so rotating creds is one CLI call with no restart.
- disable_reply_to flag: workaround for Postmark's pending-approval
  same-domain restriction (Reply-To at inbound.postmarkapp.com is
  treated as a cross-domain recipient and blocks the send). With
  it on, outbound works but customer replies won't loop back into
  helix until the account is approved — documented in the demo
  README as the path to closing the loop.

Dispatcher loop guard:
- Skip outbound emit when event.Source == "" (system-emitted, i.e.
  inbound from this transport's own webhook). Without this, a
  bidirectional Stream (one alias, both inbound and outbound) would
  echo every inbound message straight back out to itself.
  Worker-published events (Source != "") still emit normally.
- Replaced TestWebhookBridgesInboundToOutbound with
  TestWebhookInboundDoesNotEcho to lock the new behaviour in.
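The loop guard reduces to a single check on the event's source. The `Event` and `Emitter` types below are simplified, hypothetical stand-ins for helix-org's own; only the `Source == ""` rule comes from the text above.

```go
package main

// Event is a hypothetical, trimmed-down stand-in for helix-org's event type.
type Event struct {
	Source string // publishing worker's ID; "" means system-emitted (inbound)
	Body   string
}

// Emitter is a hypothetical stand-in for a transport's outbound hook.
type Emitter interface {
	Emit(e Event) error
}

// maybeEmit applies the loop guard: inbound events never go back out,
// worker-published events do. It reports whether Emit was called.
func maybeEmit(em Emitter, e Event) (bool, error) {
	if e.Source == "" {
		return false, nil
	}
	return true, em.Emit(e)
}

// recorder is a test double that remembers every emitted event.
type recorder struct{ sent []Event }

func (r *recorder) Emit(e Event) error {
	r.sent = append(r.sent, e)
	return nil
}
```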

Server:
- Server.Handler now takes optional Routes so transports can mount
  their own inbound endpoints without server.go importing them. The
  email transport's /email/postmark gets mounted from cmd/helix-org/serve.go.

Demo (demos/email):
- README.md walks through the whole flow: signup → server token →
  Sender Signature → inbound hash → cloudflared/ngrok tunnel →
  Postmark InboundHookUrl → helix-org config set transport.postmark
  → bootstrap → hire Sam → send a real email. Includes the
  pending-approval workaround and the path to closing the
  customer-reply loop once approved.
- roles/customer-service.md: Sam reads inbound, drafts a 2–4
  sentence reply, escalates rather than fabricates, signs off
  '— Sam' on its own line.
- workers/sam.md: identity stub (real first name, no brand voice,
  knows when he doesn't know).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the email demo to show two workers — customer service
(Sam, alias=sam) and engineering (Lee, alias=engineer) — handling
a customer query that requires escalation. Every leg of the
four-hop cascade goes through Postmark; both Streams are
bidirectional; threading via Message-Id stitches the whole thing
into one logical conversation.

Verified e2e in ~2:15 wall-clock:

  customer → Sam   (Postmark inbound  → s-support)
  Sam → Lee        (Postmark send + inbound → s-engineer)
  Lee → Sam        (Postmark send + inbound → s-support, [eng] prefix)
  Sam → customer   (Postmark send → real inbox)

Three Postmark sends, all returned status=200; same ThreadID flowed
through every event.

Changes:
- demos/email/roles/customer-service.md: Sam now branches on
  Subject. `[eng]` prefix means Lee replied → walk s-support
  history by ThreadID to find the customer's original query, then
  reply to that customer with a paraphrased version of Lee's
  answer. Otherwise it's a customer query → answer directly when
  simple, forward to <hash>+engineer@inbound.postmarkapp.com when
  technical. ThreadID preservation is critical for the lookup.
- demos/email/roles/engineer.md (new): Lee subscribes to
  s-engineer, drafts 3-6 sentence technical answers, replies to
  Sam at the +sam alias with `[eng] Re:` subject prefix and
  preserved ThreadID.
- demos/email/workers/lee.md (new): identity stub.
- demos/email/README.md: rewritten "Run the demo" section for the
  two-worker flow. Adds an explicit `<INBOUND_HASH>` sed
  substitution step (workers know each other's addresses via
  role text). Drops the disable_reply_to workaround now that the
  Postmark account is approved. New "What this shows" bullets
  call out workers-as-email-participants and ThreadID-as-spine.
- demos/email/demo.cast: re-recorded asciicast of the four-hop
  cascade.

The mp4 (demos/email/demo.mp4) is regenerated locally but stays
gitignored, same convention as demos/getting-started/demo.mp4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the activation prompt only carried Body. The Worker had to
call read_events to learn Subject, From, ThreadID, Extra — exactly the
round-trip that caused the docs-engineer to misroute issue #3 to PR #2
during the github demo's E2E run.

renderTrigger now formats every populated envelope field into the
prompt, omitting empties for cleanliness. The Trigger.Body field is
dropped; callers pass the full Message instead.
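A sketch of the omit-empty rendering, assuming a flattened `Message` with string fields plus `To`; the real envelope also carries attachments and `Extra`, which this version skips. The header line and field labels are illustrative, not the exact prompt text.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical subset of the envelope.
type Message struct {
	From, Subject, ThreadID, MessageID, InReplyTo, Body string
	To                                                  []string
}

// renderTrigger writes one "Label: value" line per populated field and
// skips empty ones, then appends the body after a blank line.
func renderTrigger(streamID string, m Message) string {
	var b strings.Builder
	fmt.Fprintf(&b, "New event on %s\n", streamID)
	field := func(label, v string) {
		if v != "" {
			fmt.Fprintf(&b, "%s: %s\n", label, v)
		}
	}
	field("From", m.From)
	field("To", strings.Join(m.To, ", "))
	field("Subject", m.Subject)
	field("ThreadID", m.ThreadID)
	field("MessageID", m.MessageID)
	field("InReplyTo", m.InReplyTo)
	if m.Body != "" {
		fmt.Fprintf(&b, "\n%s\n", m.Body)
	}
	return b.String()
}
```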

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub POSTs to a single /github/webhook endpoint; the transport
HMAC-verifies via X-Hub-Signature-256 against the installation's
webhook_secret, then fans the delivery out to every Stream whose
Config.Repo matches repository.full_name and whose Config.Events
whitelist contains the X-GitHub-Event header value.
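GitHub's documented scheme for `X-Hub-Signature-256` is `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The sketch below checks that in constant time and then applies the repo-plus-event fan-out. `GitHubStream` and `MatchingStreams` are hypothetical names standing in for each stream's `Config.Repo` and `Config.Events`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// sign computes the header value GitHub would send for this body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// VerifySignature compares in constant time to avoid timing leaks.
func VerifySignature(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(header), []byte(sign(secret, body)))
}

// GitHubStream is a hypothetical view of a stream's routing config.
type GitHubStream struct {
	ID     string
	Repo   string   // e.g. "owner/name", compared to repository.full_name
	Events []string // whitelist compared to the X-GitHub-Event header
}

// MatchingStreams returns every stream the delivery should fan out to.
func MatchingStreams(streams []GitHubStream, repo, event string) []string {
	var ids []string
	for _, s := range streams {
		if s.Repo != repo {
			continue
		}
		for _, e := range s.Events {
			if e == event {
				ids = append(ids, s.ID)
				break
			}
		}
	}
	return ids
}
```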

Inbound only — acting on a repo (label, comment, review, open PR) is
the Worker's job via gh in its Environment. publish on a github stream
returns a loud error rather than silently no-op'ing.

The Message envelope is mapped from the upstream payload verbatim:
Subject = issue/PR title, Body = body, ThreadID = "#<number>",
MessageID = X-GitHub-Delivery, From = sender.login, Extra = the full
payload with one synthetic top-level "event" key injected from the
X-GitHub-Event header so Workers can branch on event type from Extra
alone.
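A sketch of the envelope mapping, assuming the payload carries either an `issue` or a `pull_request` object (issue checked first) and ignoring comment and review bodies. The names are hypothetical; the `"#<number>"` thread ID and the injected `"event"` key follow the description above.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// Message is a hypothetical subset of the envelope.
type Message struct {
	From, Subject, Body, ThreadID, MessageID string
	Extra                                    map[string]any
}

type ghItem struct {
	Number int    `json:"number"`
	Title  string `json:"title"`
	Body   string `json:"body"`
}

type ghPayload struct {
	Sender struct {
		Login string `json:"login"`
	} `json:"sender"`
	Issue       *ghItem `json:"issue"`
	PullRequest *ghItem `json:"pull_request"`
}

// MessageFromGitHub maps a delivery onto the envelope. Extra keeps the full
// payload plus a synthetic top-level "event" key from the header.
func MessageFromGitHub(eventHeader, deliveryID string, raw []byte) (Message, error) {
	var p ghPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return Message{}, err
	}
	extra := map[string]any{}
	if err := json.Unmarshal(raw, &extra); err != nil {
		return Message{}, err
	}
	extra["event"] = eventHeader

	m := Message{From: p.Sender.Login, MessageID: deliveryID, Extra: extra}
	item := p.Issue
	if item == nil {
		item = p.PullRequest
	}
	if item != nil {
		m.Subject = item.Title
		m.Body = item.Body
		m.ThreadID = "#" + strconv.Itoa(item.Number)
	}
	return m, nil
}
```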

Per-stream config is just routing identity (repo, events). Provider
credentials (token, webhook_secret) live in server-level config under
transport.github with both fields registered as Secrets so config get
redacts them. Regression tests pin both names against silent leaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough demo of the doc-engineer role: spin up a real cloudflared
tunnel, register the webhook, hire the Worker, then exercise the
issues + pull_request + pull_request_review + issue_comment paths
against a live GitHub repo. README narrates each step; demo.cast is
the asciinema recording.

Design doc covers the identity model (no machine user; gh auth token
gives the engineer the operator's own identity for now), the inbound-
only decision, the message envelope mapping, and the operational
config / setup-via-chat flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move Role.Content and Worker.IdentityContent from disk-based markdown files
(role.md, identity.md) into the SQLite domain, enabling future evolution to
remote workspaces and eliminating hardcoded filename coupling.

## Key changes

- Domain: Worker interface now exposes an IdentityContent() string method; both
  HumanWorker and AIWorker carry an immutable identity field. Constructor signatures
  updated to accept identity content at hire time.
- Store: Added Update(ctx, worker) method to Workers interface, implemented via
  GORM with identity_content column in worker table.
- Tools:
  - update_role: Simplified to single DB write (removed 50-line fanOut loop).
  - update_identity: New tool, mirrors update_role's shape.
  - hire_worker: Creates DB records only; no env files at hire time.
  - spawner: Added projectEnv() function that lazily writes role.md, identity.md,
    agent.md to env at activation time, reading from DB.
- Bootstrap: Seed owner Worker with starter identity text; grant UpdateIdentityName.
- UI: Added /ui/org org-chart master-detail view. handleOrgIdentitySet() now
  calls Workers.Update() instead of WriteFile(). Removed disk path tracking.
- Tests: Updated 12+ call sites with identity parameter; rewrote
  TestUpdateRoleFanOut as TestProjectEnvWritesCanonicalState to verify
  lazy-projection contract.

## Why

Hardcoded filenames across hire_worker, tools, spawner, and UI meant the system
could not evolve to support remote workspaces or other workspace configurations.
Making the DB the source of truth and performing projection at activation time
(not at hire time) lets future work extend to remote/ephemeral environments
without changing tool or bootstrap logic.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ahead

Add MCP prompts — server-defined slash commands gated by tool grants:

- New prompts package: Prompt interface, Registry (mirrors tools.Registry),
  and builtins (Role and Help).
- /help: Self-introspecting command that walks the registry at render time
  and produces a markdown list of every other prompt. Adding a new prompt
  automatically lights it up in /help without touching this file.
- /role: Drafts a new Role from a title hint, expands to full interview
  template, saves via create_role, then offers edits or chains to hire_worker.
- Server-side expansion in chat bridge: SendHandler intercepts inputs
  starting with `/`, expands them from the template before sending to claude.
  User sees original input in their bubble.
- Chat typeahead: CommandsHandler (POST /ui/chat/commands) renders
  matching prompts as HTML buttons on every keyup. Clicking fills the
  textarea and focuses it.
- Enum schema constraints: WorkerKind and TransportKind now surface as
  enums in JSON Schema so MCP clients see valid values in tool input
  autocomplete.
- Self-documenting validation: WorkerKind.Validate() formats errors as
  'unknown worker kind "foo" (valid: "human", "ai")' so clients can
  self-correct without reading source.
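A sketch of a grant-gated prompt registry with a self-introspecting help listing and slash-command expansion. `Prompt`, `Registry`, `Help` and `Expand` are hypothetical shapes chosen for brevity, and passing the text after the command as a single `hint` argument is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Prompt is a hypothetical slash command, visible only when the caller
// has been granted RequiredTool.
type Prompt struct {
	Name, Description string
	RequiredTool      string
	Render            func(args map[string]string) string
}

// Registry holds prompts in registration order.
type Registry struct{ prompts []Prompt }

func (r *Registry) Register(p Prompt) { r.prompts = append(r.prompts, p) }

// Visible filters by grants and sorts by name so listings are stable.
func (r *Registry) Visible(grants map[string]bool) []Prompt {
	var out []Prompt
	for _, p := range r.prompts {
		if p.RequiredTool == "" || grants[p.RequiredTool] {
			out = append(out, p)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// Help walks the registry at call time, so a newly registered prompt
// shows up without editing this function.
func (r *Registry) Help(grants map[string]bool) string {
	var b strings.Builder
	for _, p := range r.Visible(grants) {
		fmt.Fprintf(&b, "- /%s: %s\n", p.Name, p.Description)
	}
	return b.String()
}

// Expand turns "/name rest" into the rendered template, passing the rest
// as a "hint" argument. Unknown or hidden commands are returned unchanged.
func (r *Registry) Expand(input string, grants map[string]bool) (string, bool) {
	if !strings.HasPrefix(input, "/") {
		return input, false
	}
	name, rest, _ := strings.Cut(strings.TrimPrefix(input, "/"), " ")
	for _, p := range r.Visible(grants) {
		if p.Name == name {
			return p.Render(map[string]string{"hint": strings.TrimSpace(rest)}), true
		}
	}
	return input, false
}
```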

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… polling, and tool visibility

Major changes:

- **Prevent cascading AI-worker activations**: Added SourceKind classifier (human/ai) to Trigger;
  workers now deprioritize or skip AI-origin events per agent.md discipline rules. Dispatcher
  skips self-reactivation on publish. Tests pin self-skip and source_kind behavior.

- **Fix SSE newline rendering**: Split markdown fragments across multiple `data:` lines (SSE
  spec compliant) instead of collapsing newlines. Browser's EventSource rejoins with \n,
  preserving fenced code blocks and list formatting.

- **Add markdown rendering**: Integrated goldmark for safe HTML rendering of Role/Activity text.
  Added .md CSS class for styling (lists, code, links, headers, blockquotes). Goldmark runs
  in safe mode; raw HTML is omitted (not escaped). Tests verify bold/lists/code/headings render
  and <script> tags are dropped.

- **Real-time polling UI**: Added htmx polling (every 5s) to org chart, streams list, and
  events feed. Fixed htmx attribute inheritance breaking child click handlers by adding
  hx-disinherit="*" on poll parents. Implemented unified all-streams firehose when no stream
  selected.

- **Tool grant visibility**: Org detail now shows each Worker's granted tools as alphabetically
  sorted chip badges. Schema exposes MCP tool names; UI surfaces them without requiring a
  separate tools query.

- **System prompt templates**: Moved agent.md and owner_role.md to embedded templates so
  content can be edited via /ui/org and doesn't require code changes. agent.md tells AI
  workers that human constraints don't apply and that they should default to action. The owner
  role teaches delegation, the polling pattern, and stream subscription during hiring.

- **Hiring playbook refinement**: Updated role template to instruct on stream provisioning:
  list_streams → create if missing → subscribe. Emphasized "Worker without streams is
  half-hired."

- **Title selection priority**: Sessions now track separate ai-title events and prefer them
  over user input for recents display (custom > ai-generated > fallback).

- **Model/effort defaults**: Changed claude.model default to "sonnet" for cost predictability;
  added claude.effort default "low" to minimize extended-thinking budget. Both configurable
  via registry.
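The SSE newline fix in the list above can be sketched as follows: each line of a fragment gets its own `data:` field, and a conforming client rejoins the fields with `\n`. `sseEvent` and `decodeData` are hypothetical helpers; `decodeData` mirrors only the part of EventSource parsing that matters here.

```go
package main

import "strings"

// sseEvent frames a payload as one SSE event. Writing one "data:" field per
// line keeps embedded blank lines from terminating the event early.
func sseEvent(name, payload string) string {
	var b strings.Builder
	if name != "" {
		b.WriteString("event: " + name + "\n")
	}
	for _, line := range strings.Split(payload, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n") // blank line dispatches the event
	return b.String()
}

// decodeData rebuilds the payload the way EventSource does: collect every
// data field (dropping one optional leading space) and join with "\n".
func decodeData(frame string) string {
	var data []string
	for _, line := range strings.Split(strings.TrimSuffix(frame, "\n\n"), "\n") {
		if strings.HasPrefix(line, "data:") {
			v := strings.TrimPrefix(line, "data:")
			data = append(data, strings.TrimPrefix(v, " "))
		}
	}
	return strings.Join(data, "\n")
}
```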

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… docs

- Update 'make run' to automatically invoke 'helix-org serve' with sensible defaults (./envs, ./helix-org.db, :8080) rather than bare 'go run'
- Enhance 'make clean' to kill running servers and remove local state (DB, envs) in one command
- Improve CLAUDE.md to document these defaults and explain when/why to use each target
- Clarify that ad-hoc 'go' commands should be avoided in favor of make targets to ensure consistent build/test environment

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The dispatcher now coalesces events that arrive while an activation is
running, passing them to the Spawner as a single batched []Trigger
instead of spawning N separate claude processes. This collapses webhook
cascades (e.g. five GitHub events from a worker's own action against a
shared auth token) into one follow-up activation.

Implementation:
- Spawner signature: trigger -> []Trigger
- Dispatcher: per-worker queue (pending slice + running flag) replaces
  per-worker mutex. enqueue() appends and starts runner if needed;
  run() drains queue in a loop until empty, calling spawner once per
  drain with the accumulated batch.
- buildPrompt() renders multiple triggers as [1/N], [2/N], etc. when
  there's more than one, so agents see them as a numbered list.
- New test proves coalescing: block first activation, publish 3 more
  events, release -> expect [e-1] then [e-2, e-3, e-4], not 4 separate ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The github-engineer demo includes:
- Full README with prerequisites, setup steps, and teardown instructions
- Runnable end-to-end example of a software engineer worker on GitHub
- Role documentation for handling task lifecycle, review feedback, and board state

Updates to prerequisites:
- Document required gh token scopes (project, read:project)
- Document port availability requirement for helix-org server
- Add instructions for creating and linking a GitHub Project v2 board

Updates to software-engineer role:
- Add dm tool to MCP surface (was: subscribe, read_events)
- Add constraint: escalate setup-level problems to owner via DM instead of failing silently
  (covers: gh auth issues, missing board, repo unreachable, missing tools, discovery failure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ession reuse

End-to-end working chat + dispatcher → Helix zed_external desktops with
the org-graph MCP attached. Each Worker (human or AI) gets its own
project + agent app + git repo at hire time; new activations reuse the
same long-lived chat session so follow-ups complete in seconds instead
of paying a 3-minute cold-start every turn.

Key fixes that came out of debugging against app.helix.ml:

- HelixProjectApplier creates a Helix-internal git repo, seeds it with
  a README so `main` exists, creates the `helix-specs` branch, and
  pushes role/identity to `workers/<id>/.context/` on that branch.
  The desktop's startup script then materialises the helix-specs
  worktree at `~/work/helix-specs/` automatically.
- Project-apply does NOT auto-create a repo; without one the desktop's
  startup script bails with "No repositories were cloned successfully"
  and Zed never launches.
- StartChatRequest now sends `app_id` so `session.ParentApp` is set —
  Helix's external MCP proxy bails with "session has no associated
  agent" otherwise, and Zed never sees the helix MCP.
- StartChatRequest sends `organization_id` (Helix doesn't auto-populate
  it from project_id; without it desktop quota falls back to the
  personal-org limit of 2).
- Streaming-aware StartChatWithStatus: reads the SSE response, returns
  the session ID + a flag indicating whether the WS-not-ready race
  fired. Detached upstream context so the request survives past the
  caller's request ctx closing.
- warmupAndRetry (chat bridge) and warmupSession (spawner) re-POST the
  same prompt every 8–20s until the dispatch lands. Helix's
  waitForExternalAgentReady checks connections globally, so the wait
  passes immediately when other users have desktops up; the per-session
  sendCommand then fails fast and Helix marks the interaction error
  (auto-wake won't recover state=error). The retry pattern absorbs
  the race client-side.
- Spawner reuses worker.HelixSessionID() across activations. Each
  fresh session spawns a fresh container; reuse keeps it warm.
- Owner-role hiring playbook updated: hire_worker MUST include
  `grants` matching the Role's Tools section. The MCP tool list is
  frozen by Helix's external-MCP-proxy cache for the lifetime of the
  first session, so granting later means the Worker can't see the
  tools until session restart.
- Runtime switched from claude_code → zed_agent. claude_code talks
  directly to Anthropic and needs an API key wired into the container
  (which we don't); zed_agent routes inference back through Helix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n agent

Live role-edit (update_role) now propagates to running Workers without
requiring a session restart:

- HelixProjectApplier.Ensure no longer early-returns on the fast path
  before pushing files. The expensive ApplyProject / CreateGitRepo /
  AttachRepo steps still skip when the project exists, but
  agent.md / role.md / identity.md are re-pushed to the helix-specs
  branch on every Ensure call. CreateBranch and PutFile are idempotent
  and cheap, so the cost is two HTTP calls per activation.
- Spawner activation prompt (helixSpecsMandate) now ALWAYS runs
  `git pull --ff-only origin helix-specs` at the start of every
  activation (fall-through to `git worktree add` only when the worktree
  is missing). Without this, the agent reads the worktree's stale
  on-disk copy and the new role text never takes effect.
- Activation prompt now also reads `.context/agent.md` first as the
  org-wide entrypoint, then role.md, then identity.md.
- AgentMD threaded through HelixSpawnerConfig and HelixProjectApplier
  so the spawner+chat-backend both seed the org policy on apply.

Validated end-to-end via demos/getting-started:
  publish hello → echo: hello (initial role)
  update_role r-echo → "loud: <BODY UPPERCASED>"
  publish hello → loud: HELLO ← live-edit takes effect

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d channel discipline

- Add **On anything else. Stay quiet** block (required in every Role) to establish
  default behavior: don't post unless a trigger above matches and output is something
  a human asked for.
- Require explicit output channel per trigger (`Post to s-{channel}` or "no post").
- Add constraint requiring workers to name the trigger before acting, enabling
  audit-log inspection and forcing commitment to a frame.
- Clarify drafting instructions so LLM-generated Roles include these elements.

This addresses the "chatty colleague" failure mode at the template level: models now
have explicit permission boundaries and must name their reasoning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch from 2386a28 to 284d86b on May 4, 2026 09:43
philwinder and others added 23 commits May 6, 2026 17:22
- Move Helix-specific Worker fields off domain.Worker into a sidecar
  WorkerRuntimeState store keyed on (workerID, backend, key). Drops
  six methods from the domain interface and isolates per-runtime
  pointers behind typed helpers in agent/helix/state.go.
- Move the runtime layer out of tools/: new agent/, agent/claude/,
  agent/helix/ packages plus helix/helixclient/ (was tools/helixclient/).
  tools/ now holds only org-graph MCP tools and Deps.
- Rename SpecsPublisher -> agent.WorkspaceSync. Logical-name contract
  ("role.md", "identity.md"); each backend translates to its own
  layout (claude: <envsDir>/<wid>/<name>; helix:
  workers/<wid>/.context/<name>). Fixes the prior path mismatch where
  update_role wrote job/* but the activation mandate read .context/*.
- Move agent.md from tools/templates/ to agent/policy.md and embed as
  agent.Policy so both runtimes share one source.
- Unify session shape: helix.Runtime ("zed_agent") and helix.AgentType
  ("zed_external") are non-configurable constants used by every
  project apply and every /sessions/chat post. Drops chat.agent_type
  config key and the SpawnerConfig.Runtime / ProjectApplier.Runtime
  fields so the spawner and chat backend can no longer drift to
  claude_code.
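A sketch of the logical-name contract, assuming a single hypothetical `Location` method; the real `agent.WorkspaceSync` also writes the file, which is omitted here. Only the two layouts come from the text above.

```go
package main

import "path"

// WorkspaceSync is a hypothetical reduction of the contract: callers speak
// in logical names ("role.md", "identity.md") and each backend picks a path.
type WorkspaceSync interface {
	Location(workerID, name string) string
}

// claudeSync projects into the local envs directory.
type claudeSync struct{ envsDir string }

func (c claudeSync) Location(workerID, name string) string {
	return path.Join(c.envsDir, workerID, name)
}

// helixSync targets the layout on the helix-specs branch.
type helixSync struct{}

func (helixSync) Location(workerID, name string) string {
	return path.Join("workers", workerID, ".context", name)
}
```

Keeping the translation inside each backend is what lets `update_role` and the activation mandate agree on a location without either hard-coding `job/*` or `.context/*`.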

Verified end-to-end against app.helix.ml: getting-started demo (hire
echo, publish hello, echo: hello, live update_role, loud: HELLO).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New demo: operator raises NCR on shop floor → agent fans out to
supervisor (Slack), customers (SMS), supplier (email held) → supervisor
approves containment in one DM → agent confirms and kills/sends supplier
email based on approval text. Shows the hold pattern and the split
between agent (glue) and human (decisions).

Verified end-to-end against app.helix.ml with comms-demo container.
Three channels (email/slack/sms), two activations, ~90 seconds on stage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent was creating s-ncr-raised with the default transport
(local) because the hire prompt said "no config" — leaving it
ambiguous whether the transport itself was needed. Symptom on stage:
POST /webhooks/s-ncr-raised → 404 "is not a webhook stream".

Three changes:
- Hire prompt now spells out the create_stream JSON for every
  stream and explicitly says do not omit the transport field.
- Adds a smoke-test curl after hire that fails fast if any stream
  is misconfigured.
- Adds the local-transport failure mode to the Recovery table with
  the verbatim fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent kept guessing wrong on transport.kind ("webhook",
"incoming-webhook", and {"kind":"webhook","direction":"in"}) because
the JSON schema exposed kind as a plain string with no enum and no
description. We already had a TransportKind enum surfacer wired up
in tools/schema.go — but createStreamTransport.Kind was typed as
string, not domain.TransportKind, so the enrichment never applied to
this schema.

- Retype createStreamTransport.Kind to domain.TransportKind so the
  existing enum-and-description enrichment kicks in.
- Beef up the tool's Description with the valid kinds and a webhook
  example for clients that don't render enum constraints.

Verified: schema now exposes
  enum: ["local", "webhook", "email", "github"]
and bad kinds are rejected with the existing self-documenting error
("valid: \"local\", \"webhook\", \"email\", \"github\"").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_stream's schema now surfaces transport.kind as
enum: ["local","webhook","email","github"] with a description, so
the hire prompt no longer has to defend against the agent guessing
"incoming-webhook" or omitting the transport entirely.

- Trim the "do not omit transport" guardrail and the post-hire
  get_stream verification step — both were workarounds for the
  schema gap, now closed.
- Add a note to always pass `chat --new` after rebuilding the
  binary; chat-driving claude caches MCP tool schemas at session
  start and won't see new enum constraints without a fresh session.
- Soften (don't remove) the local-default Recovery row: stale chat
  sessions on a fresh binary can still hit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema now exposes the valid transport kinds, so the prompt no
longer needs literal JSON arguments — describing the streams in
words is enough for the agent to call create_stream correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller chat models reliably collapse the canonical
{"transport":{"kind":"webhook"}} object to its discriminator string
{"transport":"webhook"} once they've seen the kind enum on the
schema, then watch the call fail with a JSON-unmarshal error and
loop. Both shapes are unambiguous and mean the same thing — accept
both.

- Custom UnmarshalJSON on createStreamTransport handles either form.
- Schema declares transport as a oneOf [enum-string, object] so
  strict-validating MCP clients accept the shorthand too.
- Tests cover both input forms and the schema shape.

Verified live: create_stream with transport:"webhook" produces a
stream with transportKind:"webhook"; the object form still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat backend (chat.backend=helix) runs the chat-driving agent
inside a Helix sandbox that does NOT have this repo checked out.
Telling it to "read ./demos/manufacturing/roles/quality-bot.md" is
a dead instruction — the file isn't there. The Zed agent then
spirals through every other tool it has trying to find context:
kodit_repositories, kodit_wiki, kodit_grep, curl on localhost:9876,
ls on the helix-specs branch, etc.

Fix: paste the entire role markdown inline in the hire prompt so
the agent has zero reason to fetch anything from the filesystem.
Add explicit "Use ONLY the helix-org MCP tools, do NOT read files,
do NOT use kodit, do NOT curl URLs" steering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pointer schema arrived as Types:["object","null"]; setting Type
without clearing Types produced an invalid jsonschema (both Type and
Types non-zero is a marshal error), which broke MCP tools/list at
session start and starved Claude of every helix-org tool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare `helix` pattern matched any directory named `helix` at any
depth, which was silently swallowing helix-org/helix/ and
helix-org/agent/helix/ — entire packages (helixclient, spawner,
project applier, runtime state, workspace) sitting in the working tree
but never reaching git. The original intent was to ignore the `helix`
binary at known cmd paths; anchor it there so the helix-org subtree
becomes trackable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vider

Adopts #2375 (the durable session-scoped message queue)
and #2399 (cold-start dev-container wake) so helix-org no
longer needs to fight the framework with a client-side warmup loop.

## helix client

- New `SendSessionMessage(ctx, sid, content, opts)` posts to
  /api/v1/sessions/{id}/messages — Helix persists the interaction and
  pickupWaitingInteraction delivers it once the agent's WS is reachable.
  Returns 200 even when no agent is connected.
- New `ListProviders` and `ListModelsForProvider`, plus a
  `ValidateProviderModel` helper that checks chat.provider /
  chat.model against the live Helix instance. We hit /v1/models with
  the provider query string (the bare aggregate endpoint excludes
  Anthropic and is unreliable).

## Spawner refactor (agent/helix/spawner.go)

- Follow-up activations queue via `SendSessionMessage` — no StartChat
  round-trip. 290ms instead of 7s+ on a warm session.
- First activations still use `StartChat` to create the session; on the
  cold-start `hadWSError` race we re-queue the same prompt via the
  durable endpoint instead of polling for up to 5 minutes.
- Drops `warmupSession` (~40 lines).
- New tests: `TestSpawnerFollowUpUsesSendSessionMessage` (asserts no
  StartChat on follow-up) and `TestSpawnerColdStartReQueues` (asserts
  the hadWSError → queue handoff).

## Chat-bridge refactor (server/chat/helix_bridge.go)

- Same two-path treatment: follow-ups via `SendSessionMessage`, fresh
  sessions via `StartChat` with cold-start fallback to the queue.
- Drops `warmupAndRetry` and the 5-minute background goroutine
  (~70 lines).
- Existing test updated to assert follow-ups go through the queue.

## Provider/model validation

- `bootstrap helix-runtime` now runs the validator after WhoAmI and
  prints the actual providers/models on failure.
- `serve` refuses to start with bad chat.provider / chat.model and
  points operators at the exact config commands to fix it.

Without this, a typo in chat.provider surfaces as a 422 from
/sessions/{id}/zed-config three minutes later when the desktop tries
to fetch its Zed config — with no obvious link back to the bad key.
The validator turns that into a fail-fast at startup.

## Verified end-to-end against meta.helix.ml

Final smoke session: ses_01kr9bcpcm9gnpr7k5y4fgjmdk
- First send → StartChat (~31s for Zed cold boot) → "pong"
- Follow-up → SendSessionMessage (347ms to queue) → response within ~10s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CheckDesktopQuota helper that hits /api/v1/config and refuses
when max_concurrent_desktops would be exceeded by spinning up one
more session. Wired into both code paths that open *new* zed_external
sessions:

- agent/helix/spawner.go::ensureSession (AI Worker activations)
- server/chat/helix_bridge.go::send       (owner chat first turn)

Follow-ups skip the check — they reuse the warm container and don't
allocate a new desktop slot.

Without this, a quota-full Helix would let helix-org spin up the per-
Worker project plumbing (apply secrets, attach MCP, create agent app)
and only fail at the StartDesktop step with a generic 500 several
seconds later. The new error message names the actual count and
points operators at the fix:

  desktop quota reached on Helix (3/2 active) — stop one of the
  existing sessions before opening a new one

The check is soft (no atomic reserve) — a parallel caller could still
race for the last slot, in which case Helix's own quota error wins.
That's acceptable; the goal is operator clarity in the common single-
user case.
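The soft check comes down to one comparison. The sketch assumes a `max_concurrent_desktops` of zero or less means no limit and that the active count has already been read from `/api/v1/config`. `DesktopConfig` and `openOrReuse` are hypothetical, and the error text paraphrases the message quoted above.

```go
package main

import "fmt"

// DesktopConfig is a hypothetical view of the quota fields.
type DesktopConfig struct {
	MaxConcurrentDesktops int // assumed: <= 0 means unlimited
	ActiveDesktops        int
}

// CheckDesktopQuota refuses when one more session would exceed the limit.
// It does not reserve a slot, so a parallel caller can still race for it.
func CheckDesktopQuota(cfg DesktopConfig) error {
	if cfg.MaxConcurrentDesktops <= 0 {
		return nil
	}
	if cfg.ActiveDesktops+1 > cfg.MaxConcurrentDesktops {
		return fmt.Errorf("desktop quota reached on Helix (%d/%d active); stop one of the existing sessions before opening a new one",
			cfg.ActiveDesktops, cfg.MaxConcurrentDesktops)
	}
	return nil
}

// openOrReuse skips the check for follow-ups, which reuse a warm container.
func openOrReuse(sessionID string, cfg DesktopConfig) (reuse bool, err error) {
	if sessionID != "" {
		return true, nil
	}
	return false, CheckDesktopQuota(cfg)
}
```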

Verified end-to-end against meta.helix.ml: with active=3/max=2, send
returned 500 + actionable message in 289ms; after stopping two
sessions (active=1), the same request opened a session in 7s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…way and embedded Spawner

Mounts helix-org as a per-user, alpha-gated feature inside Helix. Adds
an `alpha_features` column on users, a `requireFeature` middleware, a
sidebar entry for flagged users, an agent picker at /ui/alpha-agents,
and the helix-org htmx surface at /ui/. helix-org's MCP server is
exposed through Helix's MCP gateway at
`/api/v1/mcp/helix-org/workers/{id}/mcp`; the gateway extracts the worker id and forwards to the
in-process helix-org handler, so picked agents authenticate via the
calling user's api_key (baked into the agent config's MCP headers)
rather than reaching a separately auth-gated endpoint.
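One way the gateway could pull the worker id out of the forwarded path, as a sketch. `workerIDFromPath` is a hypothetical helper; the only thing taken from the commit is the URL shape `/api/v1/mcp/helix-org/workers/{id}/mcp`.

```go
package main

import (
	"fmt"
	"strings"
)

const gatewayPrefix = "/api/v1/mcp/helix-org/workers/"

// workerIDFromPath extracts {id} from /api/v1/mcp/helix-org/workers/{id}/mcp
// and reports whether the path had that exact shape.
func workerIDFromPath(path string) (string, bool) {
	rest, ok := strings.CutPrefix(path, gatewayPrefix)
	if !ok {
		return "", false
	}
	id, ok := strings.CutSuffix(rest, "/mcp")
	// Reject empty ids and anything with extra path segments.
	if !ok || id == "" || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	fmt.Println(workerIDFromPath("/api/v1/mcp/helix-org/workers/w-echo/mcp"))
}
```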

The new embedded Spawner activates AI Workers by opening a fresh
helix_agent chat session against a lazily-provisioned per-Worker clone
of the picked owner agent, with its MCP entry rewritten to scope at
/workers/<id>/mcp. Worker prompts include role.md + identity.md + the
agent policy; transcripts publish to s-activations-<workerID>. No Zed
sandbox per Worker — every activation is one LLM call's worth of
latency.

Chat bridge fixes that fell out of e2e testing: /ui/chat/send now runs
b.send in a detached 10-min context so htmx doesn't 500 on long
agent runs (the WS subscriber pushes the transcript regardless); the
follow-up path uses /sessions/chat with SessionID set (the
/sessions/{id}/messages queue endpoint helix-org's standalone build
targets doesn't exist in this Helix); and the startChat REST call uses
a dedicated long-timeout client to survive multi-step agent runs.

hire_worker accepts `grants` as a JSON-encoded string in addition to
an inline array — Sonnet sometimes wraps nested arrays this way.

Verified e2e against the rewritten getting-started demo: stream/role/
position created, w-echo hired and activated on hire, owner publish
triggers w-echo via the dispatcher, live role edit takes effect on the
next activation (echo: hello → loud: HELLO). All 28 helix-org MCP
tools surface to the picked agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd-to-end

Replaces the in-process spawner from the previous commit with helix-org's
production helix.Spawner (per-Worker Helix project + git repo + Zed
sandbox), wired so the embedded SaaS alpha can drive it without
storing tokens at rest.

Worker activations now run as the hiring user, not the service account:
hire_worker reads `X-Helix-Org-User-Id` (forwarded by the MCP gateway
backend from the authenticated Helix user) off the request context and
persists it on WorkerRuntimeState. The Spawner's new `BearerForUser`
callback mints a fresh api_key per activation by user-id lookup —
implemented in the embedded host as `resolveUserHelixAPIKey`. Each
Worker's chat session, project apply, MCP attach and transcript
subscribe therefore happen as the user who hired the Worker (their
Claude subscription, their desktop quota, their audit trail). No
bearer tokens persisted in helix-org's domain at any point.

Workers run Claude Code on subscription credentials by default:
SpawnerConfig grows Runtime + Credentials fields, ProjectApplier
honours them (claude_code + subscription means no Provider/Model needed
on the per-Worker app). Helix's `addUserAPITokenToAgent` learns to set
CLAUDE_CODE_OAUTH_TOKEN + Anthropic-direct ANTHROPIC_BASE_URL on the
container env when the parent app uses subscription credentials. Two
session-handler bugs surfaced and were fixed along the way:
`ValidateAssistantModelConfig` and the codeAgentConfig/agentName lookup
in zed_config_handlers both only honoured spec-task-driven sessions —
they now also resolve via `session.ParentApp` so any zed_external
session opened via /sessions/chat against a code_agent-runtime app
ships the right runtime, not "zed-agent".

Live role/identity edits propagate to running Workers: the embedded
host wires `agenthelix.NewWorkspace` as `deps.Workspace`, so update_role
pushes the new role.md to the per-Worker repo on helix-specs. The
Workspace also clears the Worker's persisted SessionID on role.md /
identity.md publishes, forcing the next activation to open a fresh
Claude Code session that re-reads role.md instead of inheriting the
prior turn's cached content.
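The invalidation rule reduces to a small check on each publish. This sketch uses a trimmed, hypothetical `workerRuntimeState` with only `SessionID`. `onWorkspacePublish` is an illustrative name, and the repo path layout in the example is assumed.

```go
package main

import (
	"fmt"
	"path"
)

// workerRuntimeState is a trimmed stand-in; only SessionID matters here.
type workerRuntimeState struct {
	SessionID string
}

// onWorkspacePublish clears the cached session when role.md or identity.md
// changes, so the next activation opens a fresh session that re-reads them.
// It reports whether the session was cleared.
func onWorkspacePublish(st *workerRuntimeState, file string) bool {
	switch path.Base(file) {
	case "role.md", "identity.md":
		st.SessionID = ""
		return true
	}
	return false
}

func main() {
	st := &workerRuntimeState{SessionID: "ses_example"}
	fmt.Println(onWorkspacePublish(st, "notes/todo.md"), st.SessionID)
	fmt.Println(onWorkspacePublish(st, "workers/w-echo/role.md"), st.SessionID)
}
```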

Tool argument tolerance: hire_worker.grants, read_events.{limit,wait},
read_streams.limit, and worker_log.{limit,wait} now accept their
declared ints either as JSON numbers or as JSON strings — Claude Code
intermittently emits typed params as strings when the schema isn't in
its discovered-tool set, and we'd rather absorb the quirk than fail
the activation. The MCP gateway also extracts the worker ID from the
URL suffix so per-Worker scoping (`/api/v1/mcp/helix-org/workers/<id>/mcp`)
works end-to-end, and helix-org's MCP handler hoists the Authorization
bearer onto ctx so tools can use it.

Verified end-to-end: hire a worker → Zed sandbox boots → Claude Code
authenticates via OAuth subscription → subscribes to s-general → exits
ok. Publish "hello" → dispatcher activates worker → "echo: hello"
appears. update_role to "loud" mode → session invalidated → next
activation publishes "loud: HELLO".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chart section was wrapped in a polling div with hx-trigger="every 5s"
+ hx-swap="outerHTML" + hx-select="#org-chart-section". Each tick
fetched the entire 14KB /ui/org page, replaced the polling div with a
fresh copy, and forced htmx to re-walk every node inside the chart
SVG to re-bind hx-* attributes — hundreds of element scans on every
swap. With htmx 2 the outerHTML swap also occasionally double-fires
its replacement (timer not cleaned up across swaps), so the polling
cascaded: each replace spawned another timer and each timer triggered
another replace. Browser tabs ground to a halt, and the first click
after a fresh load showed "request never received" in DevTools while
follow-up clicks took ~20s.

Split the chart into a standalone template (org_chart.html) and a
dedicated endpoint GET /ui/org/chart that serves the chart fragment
only. The polling div now does hx-swap="innerHTML" against itself —
stable identity, single timer — and the polling interval is bumped
5s → 30s since the org graph rarely changes that often and the chart
is CPU-expensive to re-bind even on the cheap path.

Verified: page sits idle for 35s producing exactly 1 chart poll;
clicking a node fires 1 detail fetch with no cascade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The embedded build constructed the chat bridge and the org MCP server
without a prompts registry — so typing `/help` (or any other slash
command) just got forwarded to the LLM as the literal string
"/help", with no expansion. The chat bridge has expandSlashCommand
plumbing and the org MCP server has prompt support; both pick up
their content from prompts.Registry but only when one is attached.

Build the registry with prompts.RegisterBuiltins (same set the
standalone helix-org binary uses — /help, /role, /worker etc.),
attach it to the chat bridge via HelixBridge.WithPrompts, and pass
it to the org server via helixorgserver.Server.WithPrompts. Typing
"/help" now renders the auto-generated prompt body and the agent
replies with the slash-command listing.
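A sketch of bridge-side expansion against a registry. Here `registry` is a hypothetical stand-in for prompts.Registry, and these signatures are not the real `expandSlashCommand`. With a nil registry the input passes through unchanged, which matches the bug described above. The `/help` entry lists whatever is registered at render time.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// promptFunc renders a prompt body from the text after the command name.
type promptFunc func(args string) string

type registry map[string]promptFunc

// expandSlash replaces "/name args" with the rendered template when name is
// registered; anything else is returned unchanged.
func (r registry) expandSlash(input string) (string, bool) {
	if r == nil || !strings.HasPrefix(input, "/") {
		return input, false
	}
	name, args, _ := strings.Cut(strings.TrimPrefix(input, "/"), " ")
	render, ok := r[name]
	if !ok {
		return input, false
	}
	return render(strings.TrimSpace(args)), true
}

// withHelp adds an auto-generated /help that walks the registry when rendered.
func (r registry) withHelp() registry {
	r["help"] = func(string) string {
		names := make([]string, 0, len(r))
		for n := range r {
			names = append(names, "/"+n)
		}
		sort.Strings(names)
		return "Available commands: " + strings.Join(names, ", ")
	}
	return r
}

func main() {
	reg := registry{"role": func(a string) string { return "Draft a role for: " + a }}.withHelp()
	fmt.Println(reg.expandSlash("/help"))
}
```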

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the /ui/alpha-agents handler emitted raw inline HTML/CSS
from Go — quick to ship during the alpha bring-up, but visually
inconsistent with the rest of /ui/ (no sidebar, no head template,
different fonts). Move it onto the same template machinery the chat,
org, streams, and settings pages use:

- New `helix-org/server/ui/templates/alpha_agents.html` with the
  standard shell (head + sidebar) and card-soft agent list matching
  the rest of /ui/.
- New `AlphaAgentsPage` / `AlphaAgentRow` types in
  `helix-org/server/ui/pages.go`.
- New exported `RenderAlphaAgents(w, ownerWorkerID, recents, page)`
  helper so the embedded SaaS host can render through helix-org's
  tmpl pipeline without dragging shell HTML into api/pkg/server/.
- `helix_org_agent_picker.go` strips its 60 lines of inline HTML and
  hands an `AlphaAgentsPage` to `RenderAlphaAgents`. Picker logic
  (Helix /apps fetch, MCP attach) stays in api/ where it belongs;
  only the rendering moves into helix-org.
- Picker now surfaces `code_agent_runtime` on each row so the
  operator can tell `claude_code` (subscription Claude Code) apart
  from `zed_agent` (Helix-proxied LLM) before picking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngs knob

The previous architecture pivoted on a vestigial separation: the chat
bridge ran in "app-only mode" against an existing Helix agent picked
via chat.app_id, while AI Workers got their own per-Worker Helix
project + Zed sandbox via the Spawner. Two different code paths for
"run a Worker," surfaced via a /ui/alpha-agents picker that nobody
needed once the design clarified.

Right model: w-owner IS a Worker. ProjectApplier.Ensure runs the same
provisioning for the owner that it runs for any AI Worker — the chat
surface at /ui/ is a window onto w-owner's persistent zed_external
session. One default per-Worker config: `worker.runtime` (default
"claude_code"), implies subscription auth, no provider/model needed.

Changes:

- Chat bridge built with `Ensure: ProjectApplier` (not `AppIDFunc`).
  Same applier the Spawner uses, same defaults, same MCP wiring.
- Drop `chat.app_id` and `chat.session_role` config keys. The
  session-role is now hardcoded to "owner-chat" (Helix never reuses
  it in any control path). Drop `helix.org_url` — the gateway URL is
  derived from `helix.url`.
- Add `worker.runtime` config key (default "claude_code"). One knob.
- Delete /ui/alpha-agents: page handler, template, AlphaAgentsPage
  type, RenderAlphaAgents helper, sidebar entry that opened it.
  Sidebar shortcut now opens /ui/ chat directly.
- helix_org.go now builds one shared `*agenthelix.ProjectApplier` and
  hands it to both the spawner and the chat bridge — single source of
  truth for "Worker defaults" instead of duplicating Runtime/
  Credentials/MCP-attach config in two places.
- Cold-start retry path in helix_bridge.go's `b.send` previously fell
  back to `SendSessionMessage` (which targets a /sessions/{id}/messages
  queue endpoint that doesn't exist in embedded Helix); switched to
  StartChatWithStatus-with-SessionID, same pattern as the followup
  path we fixed earlier.
- Detached chat-send goroutine was stripping the per-request bearer,
  which pushed every owner-chat session onto the service api_key and
  blocked Claude subscription lookup. Now reads
  helixclient.BearerFromContext(r.Context()) up-front and rewraps it
  onto the detached ctx, so the session lands on the actual logged-in
  user and picks up their Claude subscription.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ridge session survives restarts

Two related bugs surfaced when hiring an AI worker from the owner
chat: the new worker's Helix project ended up under the service
account instead of the logged-in user, and clicking on the owner
worker in Helix's project list boots a fresh Zed sandbox instead of
attaching to the one the chat surface is using.

ProjectApplier was attaching the helix-org MCP entry on each
auto-provisioned agent app using the static `MCPAuthBearer` field,
which the embedded host filled with the service api_key. When the
owner's sandbox called `hire_worker` over that MCP, the request
authenticated as the service user; `hire_worker` then persisted the
service user as `HiringUserID`; the Spawner used that ID via
BearerForUser to mint a service-user api_key; and the resulting
worker project ended up outside the user's org. Now ProjectApplier
prefers the bearer in ctx (set by withHelixUserBearer on chat sends,
or by BearerForUser inside the Spawner) and only falls back to the
static field when ctx carries nothing — keeping the old
service-account behaviour for standalone deploys.

HelixBridge tracked its live session ID in process memory only, so
every API restart orphaned the warm Zed sandbox. Added optional
LoadSessionID/SaveSessionID callbacks on HelixConfig; the embedded
host wires them to agenthelix.LoadState/SaveSession on
WorkerRuntimeState, so the bridge picks up the same session_id the
Spawner persists. After restart, /ui/chat/send recovers the pointer
on its first call and continues the existing session instead of
opening a new one. Side benefit: anyone opening w-owner's project
page in Helix lands on the same session the chat surface is driving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every reply was showing up twice in /ui/. Two paths were converging
on the same SSE stream for zed_external sessions:

1. broadcastInteractions iterated session.Interactions returned by
   StartChatWithStatus and rendered each reply.
2. The WS subscriber attachSession kicked off translated
   message_completed frames and rendered them too.

For helix_basic (app-only) the WS path is silent — interactions come
back inline — so the synchronous render is the only source. For
zed_external the WS path IS the canonical source — Interactions
should never be populated inline because Helix's streaming handler
returns the session ID early and the agent runs async.

But the follow-up code path didn't set AgentType on its
StartChatWithStatus request, so Helix dropped to the non-streaming
handler which blocks until the agent finishes and DOES populate
Interactions inline. Both paths then rendered. First-turn was OK
because it explicitly set AgentType, but it still called
broadcastInteractions unconditionally — masked by the empty
Interactions slice the streaming handler returns.

Fix: set AgentType=zed_external on follow-ups (matching the first turn)
and gate broadcastInteractions on appOnly so zed_external never
renders synchronously, even if Interactions sneak through. Also
updated the comment to explain that the two paths are mutually
exclusive by agent_type, not merely "in practice."
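The two halves of the fix can be sketched as pure functions. `chatRequest` is a trimmed, hypothetical view of the StartChatWithStatus payload, and `followUp` and `inlineReplies` are illustrative names. Follow-ups always carry the agent type, and inline rendering happens only in app-only mode.

```go
package main

import "fmt"

const agentTypeZedExternal = "zed_external"

type chatRequest struct {
	SessionID string
	AgentType string
	Message   string
}

// followUp sets AgentType so Helix stays on the streaming handler, as on the first turn.
func followUp(sessionID, msg string) chatRequest {
	return chatRequest{SessionID: sessionID, AgentType: agentTypeZedExternal, Message: msg}
}

// inlineReplies returns what the bridge should render synchronously: only
// app-only (helix_basic) sessions render inline; zed_external replies come
// solely from the WS subscriber, so stray inline interactions are dropped.
func inlineReplies(appOnly bool, interactions []string) []string {
	if !appOnly {
		return nil
	}
	return interactions
}

func main() {
	fmt.Println(followUp("ses_demo", "hello").AgentType)
	fmt.Println(len(inlineReplies(false, []string{"echo: hello"})))
}
```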

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helix's per-project "Open Human Desktop" button matches on
session_role="exploratory". We were writing role="owner-chat" on
every chat session helix-org's bridge opened, so the button never
found those sessions, always spawned a parallel sandbox, and the
user thought the button was trashing their live desktop.

For helix-org's model the owner chat IS the project's human
session — there is no separate "exploratory" notion. Labelling
matches reality and makes the button take the operator to the
session their /ui/ chat is already driving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same class of bug we already fixed on /ui/org: three concurrent
hx-trigger="every Ns" triggers (one at 5s, two at 3s), each paired
with hx-swap="outerHTML" against the trigger node itself. htmx 2's
timer cleanup on outerHTML swap is racy, replacements stack up, and
the browser tab spends ~20s of CPU per page load processing
overlapping /ui/streams responses (each ~1KB of markup with a wide
DOM walk to re-bind handlers).

Killed all three pollers. The page now renders once; refresh manually
to see new streams or events. A proper live update belongs on SSE
(htmx-ext-sse is already on the page) rather than whole-page
polling — defer that to a later pass.

Audit of remaining hx-* usage in the templates: only org.html still
polls, at 30s with hx-swap="innerHTML" on a stable shell — the fixed
pattern. chat.html uses SSE + debounced keyup (no polling). The rest
are click/form-driven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>