feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split#2286
Draft
philwinder wants to merge 51 commits into
…ity split
Adds a complete proto-implementation of helix-org as a standalone Go project with:
- **MCP Integration**: All mutations flow through Model Context Protocol at /workers/{id}/mcp
using Streamable HTTP transport. Tool list is grant-filtered per worker.
- **Prompt-Driven CLI**: New `helix-org prompt` subcommand spawns Claude Code with inline
MCP config, enabling natural-language orchestration of the entire org graph.
- **Role vs Worker Split**: Roles are job descriptions (owner-edited markdown, fanned out
via update_role). Workers are people in positions (per-hire identities, immutable).
- **Environment Provisioning**: Each Worker gets an isolated environment directory with:
- role.md (propagated via update_role)
- identity.md (per-hire, immutable)
- agent.md (fixed stub: "Read role.md and identity.md, act on trigger")
- mcp.json (dynamically generated per activation)
- **Push-Dispatch Event Loop**: When events land on subscribed channels, the system spawns
a fresh Claude Code instance (one-shot activation) with that worker's MCP endpoint.
- **channel_members Tool**: Read-only MCP tool that lists workers subscribed to a channel,
enabling Workers to query org membership without side effects.
- **Simplified Grant Model**: Grants are now strictly (workerID, toolName) pairs. Removed
enforcement/scope entirely—a grant IS the permission, and the agent is trusted to comply.
- **Humanized Demos**: Getting-started and newsroom demos now use prompt-based CLIs with
natural-language orchestration instead of raw API calls.
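As a rough illustration of the simplified grant model, the sketch below treats a grant as a bare (workerID, toolName) pair and derives a worker's visible tool list by filtering on it. The `Grant` struct layout and the `toolsFor` helper are hypothetical names chosen for this example, not taken from the PR.

```go
package main

import "sort"

// Grant is a (workerID, toolName) pair. Field names are assumed for this sketch.
type Grant struct {
	WorkerID string
	ToolName string
}

// toolsFor returns the sorted names of the tools granted to workerID.
// There is no scope or enforcement data: holding the grant is the permission.
func toolsFor(grants []Grant, workerID string) []string {
	var names []string
	for _, g := range grants {
		if g.WorkerID == workerID {
			names = append(names, g.ToolName)
		}
	}
	sort.Strings(names)
	return names
}
```

With grants for `w-owner` on `hire_worker` and `create_role`, `toolsFor(grants, "w-owner")` yields those two names in sorted order, and a worker with no grants sees an empty list.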
Major components:
- domain/: Core types (Role, Worker, Position, Channel, Grant, Event)
- store/sqlite: GORM-driven SQLite storage with AutoMigrate
- tools/: 13 MCP tools (create_role, hire_worker, etc.) + spawner
- server/: HTTP endpoints + MCP handler + jsonapi.org serialization
- cmd/helix-org: CLI with serve, bootstrap, prompt subcommands
- broadcast/dispatch: Event bus for push-based activation
- demos/: Two runnable examples (getting-started, newsroom editorial team)
Design principles embedded:
- Prefer data/text over code (config in Role markdown, not Go)
- Keep core generic (tools define their own scope and schemas)
- No workflow in code (agents orchestrate via prompts, not implicit chains)
- Write smallest thing that works (no speculative abstractions)
All code tested end-to-end: bootstrap → role create → worker hire → event publish →
worker activation with MCP → live-edit role → behaviour change on next activation.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
A minimal three-Worker demo that produces an opinionated MLOps
newsletter with a fresh angle each issue. Shows the prompt-driven
philosophy at its tightest:
- Only files on disk are 3 short role markdown files (~25 lines each)
- A single helix-org prompt call creates the roles, positions,
channels, and hires the team
- Editor picks the angle, researcher hunts for matching news,
journalist crafts the narrative
- Re-run with a different brief and the same team produces a
completely different angle on the same broad subject
Tested end-to-end: two briefs produced two distinct angles
("platform team tax" vs "feature stores as MLOps' open secret
graveyard") with named subjects (Stitch Fix, Chime, Modal Labs,
Tecton) — proving the angle truly varies per brief.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Adds a new `helix-org tail [glob...]` CLI plus the `GET /tail` endpoint it talks to. Lets the human watch the cascade of a running team in real time without curl + jq incantations.
- Defaults to '*' (all channels). Globs use Go's path.Match: 'c-*', 'c-news?', 'c-newsletter'. Multiple globs unioned.
- Long-polls (default 30s wait, configurable via --wait).
- Pretty output: HH:MM:SS channel source body, with subsequent body lines indented under the body column. ANSI colour when stdout is a TTY; --no-color to disable.
- New broadcast.Broadcaster.SubscribeAll for wildcard wakes, so channels created mid-tail (e.g. by an editor's hire trigger) also wake the tail loop.
- New store.Events.ListSince(channelIDs, since, limit) returning oldest-first events strictly newer than the named event.
- URL surface designed to extend: bare globs are channel IDs today; future namespace prefixes (channel:c-*, activation:w-*) can be added without breaking compatibility.
Tested: store + broadcaster unit tests, server endpoint test covering glob match, since cursor, and default match. Live-tested against the running mlops-newsletter demo (history backfill, live event arrival via long-poll, multi-glob union). Newsletter README updated to use `helix-org tail` instead of curl.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
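The glob handling described for `helix-org tail` (default '*', Go's path.Match, multiple globs unioned) could look roughly like the following. The `matchAny` helper is a hypothetical name for this sketch; only `path.Match` is the real standard-library call.

```go
package main

import "path"

// matchAny reports whether id matches at least one glob.
// An empty glob list falls back to "*", mirroring the described default.
func matchAny(globs []string, id string) bool {
	if len(globs) == 0 {
		globs = []string{"*"}
	}
	for _, g := range globs {
		// A malformed pattern returns an error; this sketch treats it as a non-match.
		if ok, err := path.Match(g, id); err == nil && ok {
			return true
		}
	}
	return false
}
```

Note that `?` matches exactly one character, so `c-news?` matches `c-news1` but not `c-news`.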
Both demos previously asked the user to either tail per-Worker activation.log files or curl the channel events endpoint. Replace both with helix-org tail:
- newsroom: drop "tile seven terminals" instruction in favour of one tail window (default '*' = all channels). Recommend per-channel globs (tail c-bullpen, tail c-recruiting) for narrower focus. "What to point at during the demo" callouts now name the exact tail command to run.
- getting-started: replace tail -f activation.log + curl-and-jq round-trip check with helix-org tail. Keep activation.log as a parenthetical for debugging the worker's internal claude stream.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…h Transport extensibility
## Abstraction Simplification
- **Channel → Stream**: Unified the Channel concept into Stream, removing redundant abstraction. Streams now hold the single named pub/sub channel.
- **Stream → Subscription**: Renamed the worker-channel edge from Stream to Subscription using a composite key (worker_id, stream_id). This eliminates synthetic stream IDs and clarifies the semantic: a subscription is a worker's interest in a stream, not the stream itself.
- **Transport Field**: Added optional Transport field to Stream to support future integrations (Slack, email, webhook, RSS, tick). Defaults to "local" (in-process pub/sub). Designed to be extensible without core changes.
## Architecture Changes
### Domain Layer (domain/)
- Added `transport.go`: Transport struct with Kind (enum) and optional Config (json.RawMessage)
- Added `subscription.go`: Subscription struct with WorkerID, StreamID, CreatedAt (composite key, no synthetic ID)
- Updated `stream.go`: Renamed from Channel; now holds ID, Name, Description, CreatedBy, CreatedAt, Transport
- Updated `event.go`: Changed ChannelID field to StreamID
- Updated `id.go`: Removed ChannelID type
### Store Layer (store/sqlite/)
- Added `subscription.go`: Subscriptions repository with Create, Delete, Find, ListForWorker, ListForStream
- Updated `stream.go`: Renamed from channel.go; added TransportKind and TransportConfig columns
- Updated `event.go`: Changed column references from channel_id to stream_id; JOINs on subscriptions instead of streams
- Updated `streams_and_events_test.go`: Renamed from feed_and_channels_test.go; comprehensive test coverage for new abstractions
- Updated `store.go`: Renamed Channels → Streams; replaced Streams → Subscriptions
### Broadcast & Dispatch (broadcast/, dispatch/)
- Renamed all channelID references to streamID throughout
- Updated method signatures to use StreamID instead of ChannelID
### Tools Layer (tools/)
- Added `create_stream.go`: New tool taking optional transport argument
- Added `read_events.go`: Replaces read_feed.go; queries subscriptions then long-polls streams
- Added `read_*.go` (streams, grants, positions, roles, workers): MCP tools replacing HTTP read endpoints
- Updated `subscribe.go`, `unsubscribe.go`, `publish.go`: Use streamId and Subscriptions API
- Renamed `channel_members.go` → `stream_members.go`; calls Subscriptions.ListForStream
- Updated `spawner.go`: Trigger struct uses StreamID; updated event notification text
### Server & HTTP (server/)
- Moved all read endpoints to MCP tools; `/workers/{id}/mcp` now handles mutations only
- Updated `tail.go`: Long-poll attributes renamed to streamID; calls store.Streams.List
- Simplified `server.go`: Only MCP mutation handler and tail endpoint remain
- Deleted: bootstrap.go, channels.go, environment.go, feed.go, grants.go, positions.go, roles.go, workers.go
### Bootstrap & CLI (bootstrap/, cmd/)
- Updated default tool grants to reference new tool names
- Updated vocabulary throughout: c- prefix → s- prefix for stream IDs
### Demos (demos/)
- Updated all demo READMEs and role definitions from channel to stream vocabulary
- Added `mlops-newsletter/hire.txt`: Example hire prompt
## Benefits
1. **Clearer semantics**: Stream is what it says (a named pub/sub channel), Subscription is the worker's interest in it
2. **Extensibility**: Transport field allows future integrations without core changes
3. **Reduced complexity**: No synthetic stream IDs, no redundant Feed/Channel/Stream layers
4. **MCP-first design**: All mutations now routed through MCP, read endpoints are MCP tools
5. **Smaller server surface**: HTTP endpoints only for authentication + tail streaming
## Testing
All 57 test cases pass with race detector enabled across all packages:
- domain: Subscription and Transport validation
- store/sqlite: Subscriptions repository operations, stream queries with JOINs
- broadcast: Pub/sub with streamID
- server: Tail long-poll with stream glob matching
- tools: All 13 MCP tools with varied schemas
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `/tail` HTTP long-poll endpoint and `helix-org tail/prompt/client`
CLI subcommands are now unnecessary: all human observation and
orchestration flows through MCP via `claude` sessions directly.
**Removals:**
- Delete server/tail.go (HTTP long-poll handler)
- Delete server/jsonapi.go (only used by tail)
- Delete cmd/helix-org/tail.go (CLI client)
- Delete cmd/helix-org/prompt.go (spawner stub)
- Delete cmd/helix-org/client.go (envelope types)
- Remove mux route for GET /tail
- Remove Broadcaster.SubscribeAll/UnsubscribeAll (dead after tail removal)
- Simplify serve/bootstrap doc: "one HTTP endpoint: /workers/{id}/mcp"
**Updates:**
- demos/getting-started/README.md: replace helix-org tail with claude
watcher prompt using subscribe + read_events(wait=60)
- demos/mlops-newsletter/README.md: same pattern
- demos/newsroom/README.md: same pattern, plus add recruiter role
"On hire" trigger to handle stream race condition
- CLAUDE.md: clarify that human observation uses MCP (no /tail endpoint)
- tools/publish.go: comment fix
**Fixes:**
- cmd/helix-org/bootstrap.go: make installClaudeMCPEntry idempotent
by removing stale entry before adding (re-running bootstrap between
demo wipes no longer fails)
- demos/newsroom/roles/recruiter.md: add "On hire" subscribe + retry
guidance matching researcher/journalist (Renée was getting hired
before Maya's hire activation created s-recruiting)
All three demos tested end-to-end: bootstrap → scaffold → hire cascade
→ event publishing → role live-edit → behavior change confirmed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add helix-org chat, an interactive claude session pointed at a Worker's MCP endpoint (default w-owner). Supports --new, --resume, --worker flags, and session persistence via claude's per-cwd store with --continue.
Update all three demos to show only the interactive chat flow:
- getting-started: condensed from two-terminal to one, removed --install-claude-mcp, Bootstrap → chat → type prompts as w-owner
- mlops-newsletter: removed separate watcher terminal, team setup and brief publishing now happen inline in chat
- newsroom: removed multi-terminal watcher, all interaction happens in the bootstrap + chat session
Demos now focus on the actual user experience (typing into a chat) which mirrors a real UI-based server. Removed background concepts, multi-terminal complexity, and one-shot (-p) mode from demos.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
helix-org chat unconditionally passed --continue, so the first run in a fresh directory exited with "No conversation found to continue" before the user could type anything.
Probe ~/.claude/projects/<encoded-cwd>/ for any .jsonl session file and only pass --continue when one exists; otherwise let claude start fresh, which still seeds a session for the next run to resume.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace claude's --continue flag with --resume <sessionId>, looked up by reading the most-recently-modified .jsonl in the cwd's session store and parsing the sessionId from its first line.
--continue rejects sessions whose log ended on certain non-user events (e.g. an agent-name marker from a prior interrupted exit), failing with "No conversation found to continue" even when the session is fine to resume by ID. This blocked re-entry into chat in the demo directories whenever a previous chat had exited mid-flight.
If no prior session exists, claude is launched without a resume flag and starts fresh, matching the desired first-run behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
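A minimal sketch of the first-line parse described above, assuming each session log is a JSON Lines file whose first record carries a top-level `sessionId` string. The `sessionIDFromLog` helper is a hypothetical name, and the record layout beyond `sessionId` is an assumption; the step that finds the most recently modified .jsonl file is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
)

// sessionIDFromLog extracts sessionId from the first line of a .jsonl session log.
// The record shape is assumed for this sketch; only the sessionId key is read.
func sessionIDFromLog(data []byte) (string, error) {
	first := data
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		first = data[:i]
	}
	var head struct {
		SessionID string `json:"sessionId"`
	}
	if err := json.Unmarshal(first, &head); err != nil {
		return "", err
	}
	if head.SessionID == "" {
		return "", errors.New("first line has no sessionId")
	}
	return head.SessionID, nil
}
```

When this returns an error or no log exists, the caller would launch claude without a resume flag, as the commit describes.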
Adds two new MCP tools for worker-to-worker communication:
- dm: High-level tool bundling create_stream + invite_workers + publish into a single call. Creates per-pair streams with deterministic naming (s-dm-<sortedIDs>) so conversations reuse the same stream regardless of direction. Complements lower-level streaming tools with a high-level, autonomously-discoverable entry point.
- invite_workers: Subscribes one or more workers to a stream in a single call. Idempotent: re-inviting already-subscribed workers is a no-op. Enables batch subscription workflows without manual loop.
Both tools are granted to the owner during bootstrap and tested end-to-end (dm stream reuse across directions, idempotency, self-DM rejection, unknown worker rejection).
Updated demo: newsroom step 6 now uses dm instead of manual 4-step workflow, and updated comments in publish/subscribe to point to dm as the high-level entry point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
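The deterministic per-pair naming (s-dm-<sortedIDs>) can be sketched as below. The `dmStreamID` helper name and the `-` separator between the sorted IDs are assumptions for illustration; the commit only states that the IDs are sorted and that self-DMs are rejected.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

// dmStreamID derives one stream ID per worker pair, independent of direction.
// The "-" join between IDs is an assumed format for this sketch.
func dmStreamID(a, b string) (string, error) {
	if a == b {
		return "", errors.New("cannot DM yourself")
	}
	ids := []string{a, b}
	sort.Strings(ids)
	return "s-dm-" + strings.Join(ids, "-"), nil
}
```

Because the IDs are sorted before joining, `dmStreamID("w-bob", "w-alice")` and `dmStreamID("w-alice", "w-bob")` produce the same stream ID, which is what lets a conversation reuse one stream in both directions.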
Replaces on-disk activation.log/jsonl files with a per-Worker activation Stream. Assistant text, tool calls, tool results, and lifecycle markers are now Events on s-activations-<workerID>, the same primitive as every other read in the system.
- hire_worker creates the activation Stream at hire time and subscribes the hiring Worker. The new Worker themselves is intentionally NOT subscribed (would loop the dispatcher otherwise).
- Spawner publishes one Event per atomic message segment (assistant text, tool_use, tool_result, system init, run result), bracketed by synthetic '=== activation: <trigger> ===' and '=== exit: <err> ===' markers. Append + Notify only: the dispatcher is skipped so per-message events can't re-trigger subscribed AI Workers.
- worker_log tool bundles subscribe + read_events scoped to one Worker's activation Stream. Mirrors the dm pattern: a friendly shortcut the agent can reach for from a 'show me what w-X is doing' instruction without knowing the stream-naming convention.
Persistence between activation runs is left to the Role: if a Worker needs cross-run memory, the Role tells it to write to history.md and read it back on the next activation. No system feature added.
Demos updated to showcase the new affordances:
- getting-started: step 3 uses worker_log to confirm hire activation finished, eliminating the cross-terminal log-watching requirement.
- mlops-newsletter: step 4 adds a peek-inside tip using worker_log.
- newsroom: adds a 'Watch a Worker work' step parallel to the dm step, plus a 'What to point at' bullet for fact-checker blocks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds inbound webhook support to helix-org Streams. Each Stream can declare
transport.kind="webhook"; POST requests to /webhooks/<streamID> append the
request body as an Event, trigger the dispatcher to wake subscribed Workers,
and notify long-poll observers.
Key changes:
- domain/transport.go: add TransportWebhook kind with docstring
- server/server.go: add Dispatcher interface, update New() signature
- server/webhook.go: HTTP POST handler for /webhooks/{streamID}
- server/webhook_test.go: 9 test functions covering edge cases and concurrency
* happy path, missing stream, wrong transport, empty body
* size limits, nil broadcaster/dispatcher, UTF-8 handling
* 25 concurrent POSTs, stream isolation
* race-detector clean with -count=20
Also fixes critical :memory: SQLite concurrency bug:
- store/sqlite/sqlite.go: pin MaxOpenConns(1) for in-memory databases
- Root cause: each connection gets its own private :memory: DB
- Impact: concurrent HTTP tests now see consistent state
New demo:
- demos/webhook/README.md: 5-step specification (hire secretary, POST payload, read back)
- demos/webhook/roles/secretary.md: secretary subscribes to s-inbox, summarizes
incoming payloads, DMs summaries to owner
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the webhook transport so a Stream can be configured to POST each appended Event to an external URL. A Stream can now be inbound-only (current behaviour, no config), outbound-only (config sets outbound_url), or both at once; the dispatcher fires emit on every append regardless of origin (webhook handler, publish tool, dm tool).
Key changes:
- domain/transport.go: WebhookConfig type with OutboundURL field; Validate now parses webhook config and rejects non-http(s) URLs, relative URLs, and empty hosts before stream creation
- dispatch/dispatcher.go: emitOutbound runs on every Dispatch, looks up the Stream's transport, and if outbound_url is set fires an async POST with X-Helix-Stream and X-Helix-Event headers; bounded by 5s timeout so slow targets don't stall publishes
- domain/transport_test.go: 14 cases covering Validate happy paths and rejection paths, plus WebhookConfig parse round-trip
- dispatch/dispatcher_test.go: 12 tests covering emit happy path, inbound-only no-emit, local-no-emit, missing stream, 4xx/5xx tolerance, unreachable host, slow target timeout, 25 concurrent emits, binary payload round-trip, malformed stored config, store lookup errors, and content-type/path preservation
- server/webhook_test.go: TestWebhookBridgesInboundToOutbound wires the real dispatcher end-to-end and proves an external POST to /webhooks/<streamID> bridges to an outbound POST when the same stream has both directions configured
Demo narrative updated: secretary now subscribes to s-inbox, DMs the owner with the summary, and publishes the summary to s-outbox which is configured with outbound_url. A 4-terminal flow with a local nc catcher shows the full inbound -> summarise -> outbound bridge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
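The outbound_url rules named in this commit (reject non-http(s) schemes, relative URLs, and empty hosts) might be checked along these lines. `validateOutboundURL` is a hypothetical stand-in for the relevant part of Validate, and the error wording is made up for the sketch.

```go
package main

import (
	"errors"
	"net/url"
)

// validateOutboundURL accepts only absolute http(s) URLs with a non-empty host.
// A relative URL has an empty scheme, so the scheme check rejects it as well.
func validateOutboundURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("outbound_url must use http or https")
	}
	if u.Hostname() == "" {
		return errors.New("outbound_url must include a host")
	}
	return nil
}
```

Running the check before stream creation means a bad URL fails at configuration time rather than on the first emit.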
Adds domain.Message — a transport-agnostic envelope (From, To, Subject,
Body, ThreadID, InReplyTo, MessageID, Attachments, Extra) — and migrates
every event-producing path to encode it as JSON in Event.Body. There is
one storage shape going forward; future transports (email, Slack,
queues, feeds) translate at their boundary, Workers see the same
structure regardless of source.
Identity convention: From/To carry transport-native identifiers
verbatim (WorkerIDs when known, alice@x.com / U0123 / +15551234 / etc.
otherwise — no prefixes). Empty From means "no human originator" for
data feeds and triggers.
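To make the envelope concrete, here is a reduced sketch of a transport-agnostic message encoded as JSON for storage in Event.Body. The field list follows the commit, but the JSON key names are assumptions, Attachments and Extra are omitted for brevity, and these free functions only approximate the helpers in domain/message.go.

```go
package main

import "encoding/json"

// Message is a cut-down version of the envelope described above.
// JSON key names are assumed for this sketch.
type Message struct {
	From      string   `json:"from,omitempty"` // empty means "no human originator"
	To        []string `json:"to,omitempty"`
	Subject   string   `json:"subject,omitempty"`
	Body      string   `json:"body"`
	ThreadID  string   `json:"thread_id,omitempty"`
	InReplyTo string   `json:"in_reply_to,omitempty"`
	MessageID string   `json:"message_id,omitempty"`
}

// Encode serialises the envelope to the string stored in Event.Body.
func Encode(m Message) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// Decode parses a stored Event.Body back into an envelope.
func Decode(body string) (Message, error) {
	var m Message
	err := json.Unmarshal([]byte(body), &m)
	return m, err
}
```

From and To hold transport-native identifiers verbatim, so a WorkerID and an email address can sit side by side without prefixes.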
Code changes:
- domain/message.go: Message + Attachment types, Encode/Decode helpers,
Event.Message() parser, NewMessageEvent constructor
- tools/dm.go: produces Message{From: caller, To: [recipient], Body}
- tools/publish.go: accepts optional to/subject/threadId/inReplyTo/
messageId/bodyContentType/attachments args; defaults From=caller
- server/webhook.go: wraps inbound POST bodies into Message{Body: raw}
- tools/spawner.go: activation log entries wrapped as Message{From:
workerID, Body: line}; Trigger gains a Message field
- dispatch/dispatcher.go: parses Event.Body once, passes parsed
Message and visible Body text to the spawner
- tools/read_events.go: surfaces Message.Body as `body` (visible text)
and the full envelope as `message` — Roles needing structure read
the latter; existing role prompts that read `.body` continue to work
Tests updated to use Event.Message() instead of comparing raw Body
strings; full make check passes (lint clean, race detector clean).
Demos verified end-to-end after the refactor:
- getting-started: hire echo worker, publish "hello", echo replies,
live-edit role, "loud: HELLO" — all four steps green
- webhook: secretary summarises inbound POST, DMs owner, publishes to
s-outbox, outbound emitter POSTs Message JSON to nc:9000 catcher
(catcher now sees structured envelope, not raw text — README
updated to describe this)
- mlops-newsletter: full editor → researcher → journalist → editor
cascade produces a complete newsletter on s-newsletter
- newsroom: 7 roles, 2 positions, 2 hires (Maya + Renée), all
activations clean — message machinery validated without running
the real-PR cascade
Design doc at design/messages.md captures the convention, the per-
transport mapping table for future transports, and open questions
to resolve as new transports ship.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the email transport, the operational-config infrastructure
it sits on, and a runnable customer-service demo in which a Worker
(Sam) receives inbound emails and replies through the same transport.
Verified end-to-end: simulated inbound POST → +sam alias routed →
Sam's claude activation → reply published to s-support → outbound
emit POSTed to Postmark's /email API → real email delivered to
phil@winder.ai. ~22s wall-clock end-to-end on a cold activation.
Operational config (design/config.md):
- New configs table (key/value/audit), store.Configs interface,
sqlite impl. Auto-migrated alongside the rest.
- config.Registry: subsystems Register a Spec (type, default,
required, secret paths, description). Reads/writes go through it
so the CLI's view matches what consumers actually consume.
- helix-org config CLI: set/get/list/delete. Opens the SQLite file
directly (same path as bootstrap), so config writes commit and
the running server picks them up on its next read — live updates
without restart, and without an LLM ever touching the values.
Secrets redacted by default; --reveal-secrets opts in.
- Strict separation: org-graph mutations stay on MCP; operational
config (transport creds, future model selection, etc.) is
CLI-only. Same SQLite file, two access paths, two threat models.
Email transport (transports/postmark):
- domain.TransportEmail kind + EmailConfig{Alias} stream config.
Validate enforces lowercase alphanumeric/dash/underscore aliases
so they compose safely into <hash>+<alias>@... or <alias>@Domain.
- Inbound HTTP handler at /email/postmark: parses Postmark's JSON,
extracts the +alias suffix from OriginalRecipient, finds the
matching Stream by alias, builds a domain.Message envelope (From,
To, Subject, Body, MessageID, InReplyTo, ThreadID from headers,
Attachment metadata), appends the event, fires the dispatcher.
- Outbound emitter: when a Worker publishes to an email Stream, the
dispatcher invokes the transport's Emit, which composes a
Postmark /email POST (From=server-config, To from Message.To,
optional Reply-To at <hash>+<alias>@... for threading,
In-Reply-To/References headers when set).
- Server-level config (token, inbound, from, optional
disable_reply_to) lives in transport.postmark; per-stream
config is just {"alias":"sam"}. The transport joins the two at
runtime, so rotating creds is one CLI call with no restart.
- disable_reply_to flag: workaround for Postmark's pending-approval
same-domain restriction (Reply-To at inbound.postmarkapp.com is
treated as a cross-domain recipient and blocks the send). With
it on, outbound works but customer replies won't loop back into
helix until the account is approved — documented in the demo
README as the path to closing the loop.
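The inbound routing step (pull the +alias suffix out of the recipient address, then enforce the lowercase alphanumeric/dash/underscore rule) can be sketched as follows. `aliasFromRecipient` and `aliasPattern` are hypothetical names, and the sketch assumes the recipient arrives as a bare address of the form <hash>+<alias>@domain.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// aliasPattern encodes the alias rule from the commit: lowercase
// alphanumerics, dash, and underscore only.
var aliasPattern = regexp.MustCompile(`^[a-z0-9_-]+$`)

// aliasFromRecipient extracts the alias from an address like hash+alias@domain.
func aliasFromRecipient(addr string) (string, error) {
	at := strings.LastIndex(addr, "@")
	if at < 0 {
		return "", errors.New("recipient has no @")
	}
	local := addr[:at]
	plus := strings.LastIndex(local, "+")
	if plus < 0 {
		return "", errors.New("recipient has no +alias suffix")
	}
	alias := local[plus+1:]
	if !aliasPattern.MatchString(alias) {
		return "", errors.New("alias must be lowercase alphanumeric, dash, or underscore")
	}
	return alias, nil
}
```

The returned alias would then be used to look up the Stream whose per-stream config is `{"alias":"sam"}`.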
Dispatcher loop guard:
- Skip outbound emit when event.Source == "" (system-emitted, i.e.
inbound from this transport's own webhook). Without this, a
bidirectional Stream (one alias, both inbound and outbound) would
echo every inbound message straight back out to itself.
Worker-published events (Source != "") still emit normally.
- Replaced TestWebhookBridgesInboundToOutbound with
TestWebhookInboundDoesNotEcho to lock the new behaviour in.
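The loop guard reduces to a single predicate on the event's Source. The sketch below uses a hypothetical `Event` struct and `shouldEmitOutbound` helper to show the rule; the real dispatcher also consults the Stream's transport, which is modelled here as a boolean.

```go
package main

// Event carries only the field the guard inspects in this sketch.
type Event struct {
	Source string // empty for system-emitted events, e.g. inbound webhooks
	Body   string
}

// shouldEmitOutbound reports whether an appended event should be sent out.
// Inbound (system-emitted) events are skipped so a bidirectional Stream
// does not echo each inbound message straight back out.
func shouldEmitOutbound(e Event, outboundConfigured bool) bool {
	return outboundConfigured && e.Source != ""
}
```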
Server:
- Server.Handler now takes optional Routes so transports can mount
their own inbound endpoints without server.go importing them. The
email transport's /email/postmark gets mounted from cmd/helix-org/serve.go.
Demo (demos/email):
- README.md walks through the whole flow: signup → server token →
Sender Signature → inbound hash → cloudflared/ngrok tunnel →
Postmark InboundHookUrl → helix-org config set transport.postmark
→ bootstrap → hire Sam → send a real email. Includes the
pending-approval workaround and the path to closing the
customer-reply loop once approved.
- roles/customer-service.md: Sam reads inbound, drafts a 2–4
sentence reply, escalates rather than fabricates, signs off
'— Sam' on its own line.
- workers/sam.md: identity stub (real first name, no brand voice,
knows when he doesn't know).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the email demo to show two workers, customer service (Sam, alias=sam) and engineering (Lee, alias=engineer), handling a customer query that requires escalation. Every leg of the four-hop cascade goes through Postmark; both Streams are bidirectional; threading via Message-Id stitches the whole thing into one logical conversation.
Verified e2e in ~2:15 wall-clock:
- customer → Sam (Postmark inbound → s-support)
- Sam → Lee (Postmark send + inbound → s-engineer)
- Lee → Sam (Postmark send + inbound → s-support, [eng] prefix)
- Sam → customer (Postmark send → real inbox)
Three Postmark sends, all returned status=200; same ThreadID flowed through every event.
Changes:
- demos/email/roles/customer-service.md: Sam now branches on Subject. `[eng]` prefix means Lee replied → walk s-support history by ThreadID to find the customer's original query, then reply to that customer with a paraphrased version of Lee's answer. Otherwise it's a customer query → answer directly when simple, forward to <hash>+engineer@inbound.postmarkapp.com when technical. ThreadID preservation is critical for the lookup.
- demos/email/roles/engineer.md (new): Lee subscribes to s-engineer, drafts 3-6 sentence technical answers, replies to Sam at the +sam alias with `[eng] Re:` subject prefix and preserved ThreadID.
- demos/email/workers/lee.md (new): identity stub.
- demos/email/README.md: rewritten "Run the demo" section for the two-worker flow. Adds an explicit `<INBOUND_HASH>` sed substitution step (workers know each other's addresses via role text). Drops the disable_reply_to workaround now that the Postmark account is approved. New "What this shows" bullets call out workers-as-email-participants and ThreadID-as-spine.
- demos/email/demo.cast: re-recorded asciicast of the four-hop cascade. The mp4 (demos/email/demo.mp4) is regenerated locally but stays gitignored, same convention as demos/getting-started/demo.mp4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the activation prompt only carried Body. The Worker had to call read_events to learn Subject, From, ThreadID, Extra: exactly the round-trip that caused the docs-engineer to misroute issue #3 to PR #2 during the github demo's E2E run.
renderTrigger now formats every populated envelope field into the prompt, omitting empties for cleanliness. The Trigger.Body field is dropped; callers pass the full Message instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
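A rough sketch of the "format every populated field, omit empties" behaviour follows. The prompt layout (label, colon, value, one field per line) and the reduced `Message` struct are assumptions for illustration; only the function name renderTrigger comes from the commit.

```go
package main

import "strings"

// Message is a reduced envelope for this sketch; the real type has more fields.
type Message struct {
	From, Subject, ThreadID, Body string
}

// renderTrigger writes one "Label: value" line per populated field,
// then the body after a blank line. Empty fields are skipped entirely.
func renderTrigger(m Message) string {
	var b strings.Builder
	add := func(label, value string) {
		if value != "" {
			b.WriteString(label + ": " + value + "\n")
		}
	}
	add("From", m.From)
	add("Subject", m.Subject)
	add("Thread", m.ThreadID)
	if m.Body != "" {
		b.WriteString("\n" + m.Body + "\n")
	}
	return b.String()
}
```

With the thread identifier in the prompt itself, a Worker can tell issue #3 from PR #2 without an extra read_events call.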
GitHub POSTs to a single /github/webhook endpoint; the transport HMAC-verifies via X-Hub-Signature-256 against the installation's webhook_secret, then fans the delivery out to every Stream whose Config.Repo matches repository.full_name and whose Config.Events whitelist contains the X-GitHub-Event header value.
Inbound only: acting on a repo (label, comment, review, open PR) is the Worker's job via gh in its Environment. publish on a github stream returns a loud error rather than silently no-op'ing.
The Message envelope is mapped from the upstream payload verbatim: Subject = issue/PR title, Body = body, ThreadID = "#<number>", MessageID = X-GitHub-Delivery, From = sender.login, Extra = the full payload with one synthetic top-level "event" key injected from the X-GitHub-Event header so Workers can branch on event type from Extra alone.
Per-stream config is just routing identity (repo, events). Provider credentials (token, webhook_secret) live in server-level config under transport.github with both fields registered as Secrets so config get redacts them. Regression tests pin both names against silent leaks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
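The signature check could be sketched as below. GitHub sends X-Hub-Signature-256 as `sha256=` followed by the hex-encoded HMAC-SHA256 of the raw request body; `verifySignature` and `sign` are hypothetical helper names, not the transport's actual functions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// verifySignature checks an X-Hub-Signature-256 header value against the
// HMAC-SHA256 of body keyed with the webhook secret.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid leaking timing information.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces a header value in the same format, useful for tests.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}
```

Verification has to run on the raw bytes as received, before any JSON decoding, since re-serialising the payload would change the digest.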
Walkthrough demo of the doc-engineer role: spin up a real cloudflared tunnel, register the webhook, hire the Worker, then exercise the issues + pull_request + pull_request_review + issue_comment paths against a live GitHub repo. README narrates each step; demo.cast is the asciinema recording.
Design doc covers the identity model (no machine user; gh auth token gives the engineer the operator's own identity for now), the inbound-only decision, the message envelope mapping, and the operational config / setup-via-chat flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move Role.Content and Worker.IdentityContent from disk-based markdown files
(role.md, identity.md) into the SQLite domain, enabling future evolution to
remote workspaces and eliminating hardcoded filename coupling.
## Key changes
- Domain: Worker interface now exposes IdentityContent() string method; both
HumanWorker and AIWorker carry immutable identity field. Constructor signatures
updated to accept identity content at hire time.
- Store: Added Update(ctx, worker) method to Workers interface, implemented via
GORM with identity_content column in worker table.
- Tools:
- update_role: Simplified to single DB write (removed 50-line fanOut loop).
- update_identity: New tool, mirrors update_role's shape.
- hire_worker: Creates DB records only; no env files at hire time.
- spawner: Added projectEnv() function that lazily writes role.md, identity.md,
agent.md to env at activation time, reading from DB.
- Bootstrap: Seed owner Worker with starter identity text; grant UpdateIdentityName.
- UI: Added /ui/org org-chart master-detail view. handleOrgIdentitySet() now
calls Workers.Update() instead of WriteFile(). Removed disk path tracking.
- Tests: Updated 12+ call sites with identity parameter; rewrote
TestUpdateRoleFanOut as TestProjectEnvWritesCanonicalState to verify
lazy-projection contract.
## Why
Hardcoded filenames across hire_worker, tools, spawner, and UI meant the system
could not evolve to support remote workspaces or other workspace configurations.
Making the DB the source of truth and performing projection at activation time
(not at hire time) lets future work extend to remote/ephemeral environments
without changing tool or bootstrap logic.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ahead
Add MCP prompts (server-defined slash commands gated by tool grants):
- New prompts package: Prompt interface, Registry (mirrors tools.Registry), and builtins (Role and Help).
- /help: Self-introspecting command that walks the registry at render time and produces a markdown list of every other prompt. Adding a new prompt automatically lights it up in /help without touching this file.
- /role: Drafts a new Role from a title hint, expands to full interview template, saves via create_role, then offers edits or chains to hire_worker.
- Server-side expansion in chat bridge: SendHandler intercepts inputs starting with /, expands them from template before sending to claude. User sees original input in their bubble.
- Chat typeahead: CommandsHandler (POST /ui/chat/commands) renders matching prompts as HTML buttons on every keyup. Clicking fills the textarea and focuses it.
- Enum schema constraints: WorkerKind and TransportKind now surface as enums in JSON Schema so MCP clients see valid values in tool input autocomplete.
- Self-documenting validation: WorkerKind.Validate() formats errors as 'unknown worker kind "foo" (valid: "human", "ai")' so clients can self-correct without reading source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
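The self-documenting error format quoted above can be reproduced with a small enum type. This is a sketch under the assumption that WorkerKind is a string type with the two values named in the error message; the constant names and the `validWorkerKinds` slice are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerKind is assumed to be a string-backed enum for this sketch.
type WorkerKind string

const (
	WorkerHuman WorkerKind = "human"
	WorkerAI    WorkerKind = "ai"
)

// validWorkerKinds is the single list both validation and error text draw from.
var validWorkerKinds = []WorkerKind{WorkerHuman, WorkerAI}

// Validate returns nil for a known kind, or an error that lists every valid value.
func (k WorkerKind) Validate() error {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, v := range validWorkerKinds {
		if k == v {
			return nil
		}
		quoted = append(quoted, fmt.Sprintf("%q", string(v)))
	}
	return fmt.Errorf("unknown worker kind %q (valid: %s)", string(k), strings.Join(quoted, ", "))
}
```

Driving the message from the same slice that defines validity keeps the error text from drifting when a new kind is added.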
… polling, and tool visibility
Major changes:
- **Prevent cascading AI-worker activations**: Added SourceKind classifier (human/ai) to Trigger; workers now deprioritize or skip AI-origin events per agent.md discipline rules. Dispatcher skips self-reactivation on publish. Tests pin self-skip and source_kind behavior.
- **Fix SSE newline rendering**: Split markdown fragments across multiple `data:` lines (SSE spec compliant) instead of collapsing newlines. Browser's EventSource rejoins with \n, preserving fenced code blocks and list formatting.
- **Add markdown rendering**: Integrated goldmark for safe HTML rendering of Role/Activity text. Added .md CSS class for styling (lists, code, links, headers, blockquotes). Goldmark runs in safe mode; raw HTML is omitted (not escaped). Tests verify bold/lists/code/headings render and <script> tags are dropped.
- **Real-time polling UI**: Added htmx polling (every 5s) to org chart, streams list, and events feed. Fixed htmx attribute inheritance breaking child click handlers by adding hx-disinherit="*" on poll parents. Implemented unified all-streams firehose when no stream selected.
- **Tool grant visibility**: Org detail now shows each Worker's granted tools as alphabetically sorted chip badges. Schema exposes MCP tool names; UI surfaces them without requiring a separate tools query.
- **System prompt templates**: Moved agent.md and owner_role.md to embedded templates so content can be edited via /ui/org and doesn't require code changes. Agent.md teaches AI workers that human constraints don't apply and defaults to action. Owner role teaches delegation, polling pattern, and stream subscription during hiring.
- **Hiring playbook refinement**: Updated role template to instruct on stream provisioning: list_streams → create if missing → subscribe. Emphasized "Worker without streams is half-hired."
- **Title selection priority**: Sessions now track separate ai-title events and prefer them over user input for recents display (custom > ai-generated > fallback).
- **Model/effort defaults**: Changed claude.model default to "sonnet" for cost predictability; added claude.effort default "low" to minimize extended-thinking budget. Both configurable via registry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
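As an illustration of the SSE newline fix, a multi-line payload can be framed as one `data:` line per source line, which the browser's EventSource then rejoins with `\n`. This is a minimal sketch; `formatSSE` is a hypothetical name, not the function used in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSSE renders a single Server-Sent Event. Each line of the payload
// becomes its own "data:" field; a blank line terminates the event.
func formatSSE(event, payload string) string {
	var b strings.Builder
	if event != "" {
		fmt.Fprintf(&b, "event: %s\n", event)
	}
	for _, line := range strings.Split(payload, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n")
	return b.String()
}

func main() {
	// A two-item markdown list survives the round trip because the
	// newline between items is carried by the frame boundary.
	fmt.Print(formatSSE("message", "- one\n- two"))
}
```

Writing the raw payload after a single `data:` prefix would instead end the field at the first newline, which is the kind of breakage the commit describes for fenced code blocks and lists.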
… docs
- Update 'make run' to automatically invoke 'helix-org serve' with sensible defaults (./envs, ./helix-org.db, :8080) rather than bare 'go run'
- Enhance 'make clean' to kill running servers and remove local state (DB, envs) in one command
- Improve CLAUDE.md to document these defaults and explain when/why to use each target
- Clarify that ad-hoc 'go' commands should be avoided in favor of make targets to ensure consistent build/test environment
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The dispatcher now coalesces events that arrive while an activation is running, passing them to the Spawner as a single batched []Trigger instead of spawning N separate claude processes. This collapses webhook cascades (e.g. five GitHub events from a worker's own action against a shared auth token) into one follow-up activation.
Implementation:
- Spawner signature: trigger -> []Trigger
- Dispatcher: per-worker queue (pending slice + running flag) replaces per-worker mutex. enqueue() appends and starts runner if needed; run() drains queue in a loop until empty, calling spawner once per drain with the accumulated batch.
- buildPrompt() renders multiple triggers as [1/N], [2/N], etc. when there's more than one, so agents see them as a numbered list.
- New test proves coalescing: block first activation, publish 3 more events, release -> expect [e-1] then [e-2, e-3, e-4], not 4 separate activations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
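The per-worker queue described above (pending slice plus running flag) can be sketched as follows. The type and method names are illustrative and simplified; the real Dispatcher and Spawner carry more state (contexts, errors, stream metadata) that is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// Trigger is reduced to an event ID for illustration.
type Trigger struct{ EventID string }

// Spawner receives every trigger accumulated since the last activation.
type Spawner func(workerID string, batch []Trigger)

type workerQueue struct {
	pending []Trigger
	running bool
}

type Dispatcher struct {
	mu     sync.Mutex
	queues map[string]*workerQueue
	spawn  Spawner
	wg     sync.WaitGroup
}

func NewDispatcher(spawn Spawner) *Dispatcher {
	return &Dispatcher{queues: make(map[string]*workerQueue), spawn: spawn}
}

// Enqueue appends the trigger and starts a runner only if none is active.
func (d *Dispatcher) Enqueue(workerID string, t Trigger) {
	d.mu.Lock()
	defer d.mu.Unlock()
	q, ok := d.queues[workerID]
	if !ok {
		q = &workerQueue{}
		d.queues[workerID] = q
	}
	q.pending = append(q.pending, t)
	if !q.running {
		q.running = true
		d.wg.Add(1)
		go d.run(workerID, q)
	}
}

// run drains the queue in a loop, calling the spawner once per drain.
func (d *Dispatcher) run(workerID string, q *workerQueue) {
	defer d.wg.Done()
	for {
		d.mu.Lock()
		if len(q.pending) == 0 {
			q.running = false
			d.mu.Unlock()
			return
		}
		batch := q.pending
		q.pending = nil
		d.mu.Unlock()
		d.spawn(workerID, batch) // events arriving now land in the next batch
	}
}

// Wait blocks until every runner has drained its queue.
func (d *Dispatcher) Wait() { d.wg.Wait() }

func main() {
	d := NewDispatcher(func(workerID string, batch []Trigger) {
		fmt.Printf("activate %s with %d trigger(s)\n", workerID, len(batch))
	})
	for _, id := range []string{"e-1", "e-2", "e-3"} {
		d.Enqueue("w-echo", Trigger{EventID: id})
	}
	d.Wait()
}
```

Because the running flag is cleared under the same lock that guards the pending slice, an event can never be appended to a queue whose runner has already decided to exit.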
The github-engineer demo includes:
- Full README with prerequisites, setup steps, and teardown instructions
- Runnable end-to-end example of a software engineer worker on GitHub
- Role documentation for handling task lifecycle, review feedback, and board state
Updates to prerequisites:
- Document required gh token scopes (project, read:project)
- Document port availability requirement for helix-org server
- Add instructions for creating and linking a GitHub Project v2 board
Updates to software-engineer role:
- Add dm tool to MCP surface (was: subscribe, read_events)
- Add constraint: escalate setup-level problems to owner via DM instead of failing silently (covers: gh auth issues, missing board, repo unreachable, missing tools, discovery failure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ession reuse
End-to-end working chat + dispatcher → Helix zed_external desktops with the org-graph MCP attached. Each Worker (human or AI) gets its own project + agent app + git repo at hire time; new activations reuse the same long-lived chat session so follow-ups complete in seconds instead of paying a 3-minute cold-start every turn.
Key fixes that came out of debugging against app.helix.ml:
- HelixProjectApplier creates a Helix-internal git repo, seeds it with a README so `main` exists, creates the `helix-specs` branch, and pushes role/identity to `workers/<id>/.context/` on that branch. The desktop's startup script then materialises the helix-specs worktree at `~/work/helix-specs/` automatically.
- Project-apply does NOT auto-create a repo; without one the desktop's startup script bails with "No repositories were cloned successfully" and Zed never launches.
- StartChatRequest now sends `app_id` so `session.ParentApp` is set; Helix's external MCP proxy bails with "session has no associated agent" otherwise, and Zed never sees the helix MCP.
- StartChatRequest sends `organization_id` (Helix doesn't auto-populate it from project_id; without it desktop quota falls back to the personal-org limit of 2).
- Streaming-aware StartChatWithStatus: reads the SSE response, returns the session ID + a flag indicating whether the WS-not-ready race fired. Detached upstream context so the request survives past the caller's request ctx closing.
- warmupAndRetry (chat bridge) and warmupSession (spawner) re-POST the same prompt every 8–20s until the dispatch lands. Helix's waitForExternalAgentReady checks connections globally, so the wait passes immediately when other users have desktops up; the per-session sendCommand then fails fast and Helix marks the interaction error (auto-wake won't recover state=error). The retry pattern absorbs the race client-side.
- Spawner reuses worker.HelixSessionID() across activations. Each fresh session spawns a fresh container; reuse keeps it warm.
- Owner-role hiring playbook updated: hire_worker MUST include `grants` matching the Role's Tools section. The MCP tool list is frozen by Helix's external-MCP-proxy cache for the lifetime of the first session, so granting later means the Worker can't see the tools until session restart.
- Runtime switched from claude_code → zed_agent. claude_code talks directly to Anthropic and needs an API key wired into the container (which we don't have); zed_agent routes inference back through Helix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n agent
Live role-edit (update_role) now propagates to running Workers without requiring a session restart:
- HelixProjectApplier.Ensure no longer early-returns on the fast path before pushing files. The expensive ApplyProject / CreateGitRepo / AttachRepo steps still skip when the project exists, but agent.md / role.md / identity.md are re-pushed to the helix-specs branch on every Ensure call. CreateBranch and PutFile are idempotent and cheap, so the cost is two HTTP calls per activation.
- Spawner activation prompt (helixSpecsMandate) now ALWAYS runs `git pull --ff-only origin helix-specs` at the start of every activation (fall-through to `git worktree add` only when the worktree is missing). Without this, the agent reads the worktree's stale on-disk copy and the new role text never takes effect.
- Activation prompt now also reads `.context/agent.md` first as the org-wide entrypoint, then role.md, then identity.md.
- AgentMD threaded through HelixSpawnerConfig and HelixProjectApplier so the spawner+chat-backend both seed the org policy on apply.
Validated end-to-end via demos/getting-started:
publish hello → echo: hello (initial role)
update_role r-echo → "loud: <BODY UPPERCASED>"
publish hello → loud: HELLO ← live-edit takes effect
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d channel discipline
- Add **On anything else. Stay quiet** block (required in every Role) to establish
default behavior: don't post unless a trigger above matches and output is something
a human asked for.
- Require explicit output channel per trigger (`Post to s-{channel}` or "no post").
- Add constraint requiring workers to name the trigger before acting, enabling
audit-log inspection and forcing commitment to a frame.
- Clarify drafting instructions so LLM-generated Roles include these elements.
This addresses the "chatty colleague" failure mode at the template level: models now
have explicit permission boundaries and must name their reasoning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move Helix-specific Worker fields off domain.Worker into a sidecar
WorkerRuntimeState store keyed on (workerID, backend, key). Drops
six methods from the domain interface and isolates per-runtime
pointers behind typed helpers in agent/helix/state.go.
- Move the runtime layer out of tools/: new agent/, agent/claude/,
agent/helix/ packages plus helix/helixclient/ (was tools/helixclient/).
tools/ now holds only org-graph MCP tools and Deps.
- Rename SpecsPublisher -> agent.WorkspaceSync. Logical-name contract
("role.md", "identity.md"); each backend translates to its own
layout (claude: <envsDir>/<wid>/<name>; helix:
workers/<wid>/.context/<name>). Fixes the prior path mismatch where
update_role wrote job/* but the activation mandate read .context/*.
- Move agent.md from tools/templates/ to agent/policy.md and embed as
agent.Policy so both runtimes share one source.
- Unify session shape: helix.Runtime ("zed_agent") and helix.AgentType
("zed_external") are non-configurable constants used by every
project apply and every /sessions/chat post. Drops chat.agent_type
config key and the SpawnerConfig.Runtime / ProjectApplier.Runtime
fields so the spawner and chat backend can no longer drift to
claude_code.
Verified end-to-end against app.helix.ml: getting-started demo (hire
echo, publish hello, echo: hello, live update_role, loud: HELLO).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New demo: operator raises NCR on shop floor → agent fans out to supervisor (Slack), customers (SMS), supplier (email held) → supervisor approves containment in one DM → agent confirms and kills/sends supplier email based on approval text. Shows the hold pattern and the split between agent (glue) and human (decisions). Verified end-to-end against app.helix.ml with comms-demo container. Three channels (email/slack/sms), two activations, ~90 seconds on stage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent was creating s-ncr-raised with the default transport (local) because the hire prompt said "no config", leaving it ambiguous whether the transport itself was needed. Symptom on stage: POST /webhooks/s-ncr-raised → 404 "is not a webhook stream".
Three changes:
- Hire prompt now spells out the create_stream JSON for every stream and explicitly says do not omit the transport field.
- Adds a smoke-test curl after hire that fails fast if any stream is misconfigured.
- Adds the local-transport failure mode to the Recovery table with the verbatim fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat agent kept guessing wrong on transport.kind ("webhook",
"incoming-webhook", and {"kind":"webhook","direction":"in"}) because
the JSON schema exposed kind as a plain string with no enum and no
description. We already had a TransportKind enum surfacer wired up
in tools/schema.go — but createStreamTransport.Kind was typed as
string, not domain.TransportKind, so the enrichment never applied to
this schema.
- Retype createStreamTransport.Kind to domain.TransportKind so the
existing enum-and-description enrichment kicks in.
- Beef up the tool's Description with the valid kinds and a webhook
example for clients that don't render enum constraints.
Verified: schema now exposes
enum: ["local", "webhook", "email", "github"]
and bad kinds are rejected with the existing self-documenting error
("valid: \"local\", \"webhook\", \"email\", \"github\"").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
create_stream's schema now surfaces transport.kind as enum: ["local","webhook","email","github"] with a description, so the hire prompt no longer has to defend against the agent guessing "incoming-webhook" or omitting the transport entirely.
- Trim the "do not omit transport" guardrail and the post-hire get_stream verification step; both were workarounds for the schema gap, now closed.
- Add a note to always pass `chat --new` after rebuilding the binary; chat-driving claude caches MCP tool schemas at session start and won't see new enum constraints without a fresh session.
- Soften (don't remove) the local-default Recovery row: stale chat sessions on a fresh binary can still hit it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema now exposes the valid transport kinds, so the prompt no longer needs literal JSON arguments — describing the streams in words is enough for the agent to call create_stream correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller chat models reliably collapse the canonical
{"transport":{"kind":"webhook"}} object to its discriminator string
{"transport":"webhook"} once they've seen the kind enum on the
schema, then watch the call fail with a JSON-unmarshal error and
loop. Both shapes are unambiguous and mean the same thing — accept
both.
- Custom UnmarshalJSON on createStreamTransport handles either form.
- Schema declares transport as a oneOf [enum-string, object] so
strict-validating MCP clients accept the shorthand too.
- Tests cover both input forms and the schema shape.
Verified live: create_stream with transport:"webhook" produces a
stream with transportKind:"webhook"; the object form still works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chat backend (chat.backend=helix) runs the chat-driving agent inside a Helix sandbox that does NOT have this repo checked out. Telling it to "read ./demos/manufacturing/roles/quality-bot.md" is a dead instruction — the file isn't there. The Zed agent then spirals through every other tool it has trying to find context: kodit_repositories, kodit_wiki, kodit_grep, curl on localhost:9876, ls on the helix-specs branch, etc. Fix: paste the entire role markdown inline in the hire prompt so the agent has zero reason to fetch anything from the filesystem. Add explicit "Use ONLY the helix-org MCP tools, do NOT read files, do NOT use kodit, do NOT curl URLs" steering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pointer schema arrived as Types:["object","null"]; setting Type without clearing Types produced an invalid jsonschema (both Type and Types non-zero is a marshal error), which broke MCP tools/list at session start and starved Claude of every helix-org tool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare `helix` pattern matched any directory named `helix` at any depth, which was silently swallowing helix-org/helix/ and helix-org/agent/helix/ — entire packages (helixclient, spawner, project applier, runtime state, workspace) sitting in the working tree but never reaching git. The original intent was to ignore the `helix` binary at known cmd paths; anchor it there so the helix-org subtree becomes trackable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vider
Adopts #2375 (the durable session-scoped message queue) and #2399 (cold-start dev-container wake) so helix-org no longer needs to fight the framework with a client-side warmup loop.
## helix client
- New `SendSessionMessage(ctx, sid, content, opts)` posts to /api/v1/sessions/{id}/messages; Helix persists the interaction and pickupWaitingInteraction delivers it once the agent's WS is reachable. Returns 200 even when no agent is connected.
- New `ListProviders` and `ListModelsForProvider`, plus a `ValidateProviderModel` helper that checks chat.provider / chat.model against the live Helix instance. We hit /v1/models with the provider query string (the bare aggregate endpoint excludes Anthropic and is unreliable).
## Spawner refactor (agent/helix/spawner.go)
- Follow-up activations queue via `SendSessionMessage`, with no StartChat round-trip. 290ms instead of 7s+ on a warm session.
- First activations still use `StartChat` to create the session; on the cold-start `hadWSError` race we re-queue the same prompt via the durable endpoint instead of polling for up to 5 minutes.
- Drops `warmupSession` (~40 lines).
- New tests: `TestSpawnerFollowUpUsesSendSessionMessage` (asserts no StartChat on follow-up) and `TestSpawnerColdStartReQueues` (asserts the hadWSError → queue handoff).
## Chat-bridge refactor (server/chat/helix_bridge.go)
- Same two-path treatment: follow-ups via `SendSessionMessage`, fresh sessions via `StartChat` with cold-start fallback to the queue.
- Drops `warmupAndRetry` and the 5-minute background goroutine (~70 lines).
- Existing test updated to assert follow-ups go through the queue.
## Provider/model validation
- `bootstrap helix-runtime` now runs the validator after WhoAmI and prints the actual providers/models on failure.
- `serve` refuses to start with bad chat.provider / chat.model and points operators at the exact config commands to fix it.
Without this, a typo in chat.provider surfaces as a 422 from /sessions/{id}/zed-config three minutes later when the desktop tries to fetch its Zed config, with no obvious link back to the bad key. The validator turns that into a fail-fast at startup.
## Verified end-to-end against meta.helix.ml
Final smoke session: ses_01kr9bcpcm9gnpr7k5y4fgjmdk
- First send → StartChat (~31s for Zed cold boot) → "pong"
- Follow-up → SendSessionMessage (347ms to queue) → response within ~10s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CheckDesktopQuota helper that hits /api/v1/config and refuses when max_concurrent_desktops would be exceeded by spinning up one more session. Wired into both code paths that open *new* zed_external sessions:
- agent/helix/spawner.go::ensureSession (AI Worker activations)
- server/chat/helix_bridge.go::send (owner chat first turn)
Follow-ups skip the check: they reuse the warm container and don't allocate a new desktop slot.
Without this, a quota-full Helix would let helix-org spin up the per-Worker project plumbing (apply secrets, attach MCP, create agent app) and only fail at the StartDesktop step with a generic 500 several seconds later. The new error message names the actual count and points operators at the fix:
    desktop quota reached on Helix (3/2 active) — stop one of the existing sessions before opening a new one
The check is soft (no atomic reserve): a parallel caller could still race for the last slot, in which case Helix's own quota error wins. That's acceptable; the goal is operator clarity in the common single-user case.
Verified end-to-end against meta.helix.ml: with active=3/max=2, send returned 500 + actionable message in 289ms; after stopping two sessions (active=1), the same request opened a session in 7s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
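The decision at the heart of the quota check is small enough to sketch in isolation. The function below is hypothetical: it takes the active and maximum counts as plain integers, whereas the real CheckDesktopQuota fetches them from /api/v1/config, and the error text is shortened from the message quoted above.

```go
package main

import "fmt"

// checkDesktopQuota refuses when opening one more desktop would exceed
// the limit, i.e. when active+1 > max. How the two counts are obtained
// is left out of this sketch.
func checkDesktopQuota(active, max int) error {
	if active >= max {
		return fmt.Errorf("desktop quota reached on Helix (%d/%d active)", active, max)
	}
	return nil
}

func main() {
	fmt.Println(checkDesktopQuota(3, 2)) // quota full: returns an error
	fmt.Println(checkDesktopQuota(1, 2)) // one slot free: returns nil
}
```

As the commit notes, a check like this is advisory only; two callers can both observe a free slot, and the server-side quota remains the authority.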
…way and embedded Spawner
Mounts helix-org as a per-user, alpha-gated feature inside Helix. Adds
an `alpha_features` column on users, a `requireFeature` middleware, a
sidebar entry for flagged users, an agent picker at /ui/alpha-agents,
and the helix-org htmx surface at /ui/. helix-org's MCP server is
exposed through Helix's MCP gateway at /api/v1/mcp/helix-org/workers/
{id}/mcp; the gateway extracts the worker id and forwards to the
in-process helix-org handler, so picked agents authenticate via the
calling user's api_key (baked into the agent config's MCP headers)
rather than reaching a separately auth-gated endpoint.
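Extracting the worker ID from the gateway path could be done along these lines. The prefix and the `/mcp` suffix come from the URL quoted above; `workerIDFromPath` is a hypothetical name, and the real gateway may well route through a mux rather than string handling.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayPrefix is the mount point quoted in the commit message.
const gatewayPrefix = "/api/v1/mcp/helix-org/workers/"

// workerIDFromPath returns the {id} segment of
// /api/v1/mcp/helix-org/workers/{id}/mcp, or false if the path does not
// have exactly that shape.
func workerIDFromPath(p string) (string, bool) {
	rest, ok := strings.CutPrefix(p, gatewayPrefix)
	if !ok {
		return "", false
	}
	id, ok := strings.CutSuffix(rest, "/mcp")
	if !ok || id == "" || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	id, ok := workerIDFromPath("/api/v1/mcp/helix-org/workers/w-echo/mcp")
	fmt.Println(id, ok)
	// prints: w-echo true
}
```

Rejecting IDs that contain a slash keeps a crafted path from smuggling extra segments through to the in-process handler.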
The new embedded Spawner activates AI Workers by opening a fresh
helix_agent chat session against a lazily-provisioned per-Worker clone
of the picked owner agent, with its MCP entry rewritten to scope at
/workers/<id>/mcp. Worker prompts include role.md + identity.md + the
agent policy; transcripts publish to s-activations-<workerID>. No Zed
sandbox per Worker — every activation is one LLM call's worth of
latency.
Chat bridge fixes that fell out of e2e testing: /ui/chat/send now runs
b.send in a detached 10-min context so htmx doesn't 500 on long
agent runs (the WS subscriber pushes the transcript regardless); the
follow-up path uses /sessions/chat with SessionID set (the
/sessions/{id}/messages queue endpoint helix-org's standalone build
targets doesn't exist in this Helix); and the startChat REST call uses
a dedicated long-timeout client to survive multi-step agent runs.
hire_worker accepts `grants` as a JSON-encoded string in addition to
an inline array — Sonnet sometimes wraps nested arrays this way.
Verified e2e against the rewritten getting-started demo: stream/role/
position created, w-echo hired and activated on hire, owner publish
triggers w-echo via the dispatcher, live role edit takes effect on the
next activation (echo: hello → loud: HELLO). All 28 helix-org MCP
tools surface to the picked agent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd-to-end
Replaces the in-process spawner from the previous commit with helix-org's
production helix.Spawner (per-Worker Helix project + git repo + Zed
sandbox), wired so the embedded SaaS alpha can drive it without
storing tokens at rest.
Worker activations now run as the hiring user, not the service account:
hire_worker reads `X-Helix-Org-User-Id` (forwarded by the MCP gateway
backend from the authenticated Helix user) off the request context and
persists it on WorkerRuntimeState. The Spawner's new `BearerForUser`
callback mints a fresh api_key per activation by user-id lookup —
implemented in the embedded host as `resolveUserHelixAPIKey`. Each
Worker's chat session, project apply, MCP attach and transcript
subscribe therefore happen as the user who hired the Worker (their
Claude subscription, their desktop quota, their audit trail). No
bearer tokens persisted in helix-org's domain at any point.
Workers run Claude Code on subscription credentials by default:
SpawnerConfig grows Runtime + Credentials fields, ProjectApplier
honours them (claude_code + subscription means no Provider/Model needed
on the per-Worker app). Helix's `addUserAPITokenToAgent` learns to set
CLAUDE_CODE_OAUTH_TOKEN + Anthropic-direct ANTHROPIC_BASE_URL on the
container env when the parent app uses subscription credentials. Two
session-handler bugs surfaced and were fixed along the way:
`ValidateAssistantModelConfig` and the codeAgentConfig/agentName lookup
in zed_config_handlers both only honoured spec-task-driven sessions —
they now also resolve via `session.ParentApp` so any zed_external
session opened via /sessions/chat against a code_agent-runtime app
ships the right runtime, not "zed-agent".
Live role/identity edits propagate to running Workers: the embedded
host wires `agenthelix.NewWorkspace` as `deps.Workspace`, so update_role
pushes the new role.md to the per-Worker repo on helix-specs. The
Workspace also clears the Worker's persisted SessionID on role.md /
identity.md publishes, forcing the next activation to open a fresh
Claude Code session that re-reads role.md instead of inheriting the
prior turn's cached content.
Tool argument tolerance: hire_worker.grants, read_events.{limit,wait},
read_streams.limit, and worker_log.{limit,wait} now accept their
declared ints either as JSON numbers or as JSON strings — Claude Code
intermittently emits typed params as strings when the schema isn't in
its discovered-tool set, and we'd rather absorb the quirk than fail
the activation. The MCP gateway also extracts the worker ID from the
URL suffix so per-Worker scoping (`/api/v1/mcp/helix-org/workers/<id>/mcp`)
works end-to-end, and helix-org's MCP handler hoists the Authorization
bearer onto ctx so tools can use it.
Verified end-to-end: hire a worker → Zed sandbox boots → Claude Code
authenticates via OAuth subscription → subscribes to s-general → exits
ok. Publish "hello" → dispatcher activates worker → "echo: hello"
appears. update_role to "loud" mode → session invalidated → next
activation publishes "loud: HELLO".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chart section was wrapped in a polling div with hx-trigger="every 5s" + hx-swap="outerHTML" + hx-select="#org-chart-section". Each tick fetched the entire 14KB /ui/org page, replaced the polling div with a fresh copy, and forced htmx to re-walk every node inside the chart SVG to re-bind hx-* attributes — hundreds of element scans on every swap. With htmx 2 the outerHTML swap also occasionally double-fires its replacement (timer not cleaned up across swaps), so the polling cascaded: each replace spawned another timer, each timer triggered another replace, browser tabs ground to a halt and the first click after a fresh load showed "request never received" in DevTools while follow-up clicks took ~20s. Split the chart into a standalone template (org_chart.html) and a dedicated endpoint GET /ui/org/chart that serves the chart fragment only. The polling div now does hx-swap="innerHTML" against itself — stable identity, single timer — and the polling interval is bumped 5s → 30s since the org graph rarely changes that often and the chart is CPU-expensive to re-bind even on the cheap path. Verified: page sits idle for 35s producing exactly 1 chart poll; clicking a node fires 1 detail fetch with no cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The embedded build constructed the chat bridge and the org MCP server without a prompts registry — so typing `/help` (or any other slash command) just got forwarded to the LLM as the literal string "/help", with no expansion. The chat bridge has expandSlashCommand plumbing and the org MCP server has prompt support; both pick up their content from prompts.Registry but only when one is attached. Build the registry with prompts.RegisterBuiltins (same set the standalone helix-org binary uses — /help, /role, /worker etc.), attach it to the chat bridge via HelixBridge.WithPrompts, and pass it to the org server via helixorgserver.Server.WithPrompts. Typing "/help" now renders the auto-generated prompt body and the agent replies with the slash-command listing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the /ui/alpha-agents handler emitted raw inline HTML/CSS from Go: quick to ship during the alpha bring-up, but visually inconsistent with the rest of /ui/ (no sidebar, no head template, different fonts). Move it onto the same template machinery the chat, org, streams, and settings pages use:
- New `helix-org/server/ui/templates/alpha_agents.html` with the standard shell (head + sidebar) and card-soft agent list matching the rest of /ui/.
- New `AlphaAgentsPage` / `AlphaAgentRow` types in `helix-org/server/ui/pages.go`.
- New exported `RenderAlphaAgents(w, ownerWorkerID, recents, page)` helper so the embedded SaaS host can render through helix-org's tmpl pipeline without dragging shell HTML into api/pkg/server/.
- `helix_org_agent_picker.go` strips its 60 lines of inline HTML and hands an `AlphaAgentsPage` to `RenderAlphaAgents`. Picker logic (Helix /apps fetch, MCP attach) stays in api/ where it belongs; only the rendering moves into helix-org.
- Picker now surfaces `code_agent_runtime` on each row so the operator can tell `claude_code` (subscription Claude Code) apart from `zed_agent` (Helix-proxied LLM) before picking.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngs knob
The previous architecture pivoted on a vestigial separation: the chat
bridge ran in "app-only mode" against an existing Helix agent picked
via chat.app_id, while AI Workers got their own per-Worker Helix
project + Zed sandbox via the Spawner. Two different code paths for
"run a Worker," surfaced via a /ui/alpha-agents picker that nobody
needed once the design clarified.
Right model: w-owner IS a Worker. ProjectApplier.Ensure runs the same
provisioning for the owner that it runs for any AI Worker — the chat
surface at /ui/ is a window onto w-owner's persistent zed_external
session. One default per-Worker config: `worker.runtime` (default
"claude_code"), implies subscription auth, no provider/model needed.
Changes:
- Chat bridge built with `Ensure: ProjectApplier` (not `AppIDFunc`).
Same applier the Spawner uses, same defaults, same MCP wiring.
- Drop `chat.app_id` and `chat.session_role` config keys. The
session-role is now hardcoded to "owner-chat" (Helix never reuses
it in any control path). Drop `helix.org_url` — the gateway URL is
derived from `helix.url`.
- Add `worker.runtime` config key (default "claude_code"). One knob.
- Delete /ui/alpha-agents: page handler, template, AlphaAgentsPage
type, RenderAlphaAgents helper, sidebar entry that opened it.
Sidebar shortcut now opens /ui/ chat directly.
- helix_org.go now builds one shared `*agenthelix.ProjectApplier` and
hands it to both the spawner and the chat bridge — single source of
truth for "Worker defaults" instead of duplicating Runtime/
Credentials/MCP-attach config in two places.
- Cold-start retry path in helix_bridge.go's `b.send` previously fell
back to `SendSessionMessage` (which targets a /sessions/{id}/messages
queue endpoint that doesn't exist in embedded Helix); switched to
StartChatWithStatus-with-SessionID, the same pattern as the follow-up
path we fixed earlier.
- Detached chat-send goroutine was stripping the per-request bearer,
which pushed every owner-chat session onto the service api_key and
blocked Claude subscription lookup. Now reads
helixclient.BearerFromContext(r.Context()) up-front and rewraps it
onto the detached ctx, so the session lands on the actual logged-in
user and picks up their Claude subscription.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ridge session survives restarts
Two related bugs surfaced when hiring an AI worker from the owner chat: the new worker's Helix project ended up under the service account instead of the logged-in user, and clicking on the owner worker in Helix's project list boots a fresh Zed sandbox instead of attaching to the one the chat surface is using.
ProjectApplier was attaching the helix-org MCP entry on each auto-provisioned agent app using the static `MCPAuthBearer` field, which the embedded host filled with the service api_key. When the owner's sandbox called `hire_worker` over that MCP, the request authenticated as the service user; `hire_worker` then persisted the service user as `HiringUserID`; the Spawner used that ID via BearerForUser to mint a service-user api_key; and the resulting worker project ended up outside the user's org. Now ProjectApplier prefers the bearer in ctx (set by withHelixUserBearer on chat sends, or by BearerForUser inside the Spawner) and only falls back to the static field when ctx carries nothing, keeping the old service-account behaviour for standalone deploys.
HelixBridge tracked its live session ID in process memory only, so every API restart orphaned the warm Zed sandbox. Added optional LoadSessionID/SaveSessionID callbacks on HelixConfig; the embedded host wires them to agenthelix.LoadState/SaveSession on WorkerRuntimeState, so the bridge picks up the same session_id the Spawner persists. After restart, /ui/chat/send recovers the pointer on its first call and continues the existing session instead of opening a new one. Side benefit: anyone opening w-owner's project page in Helix lands on the same session the chat surface is driving.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every reply was showing up twice in /ui/. Two paths were converging on the same SSE stream for zed_external sessions:
1. broadcastInteractions iterated session.Interactions returned by StartChatWithStatus and rendered each reply.
2. The WS subscriber attachSession kicked off translated message_completed frames and rendered them too.
For helix_basic (app-only) the WS path is silent (interactions come back inline), so the synchronous render is the only source. For zed_external the WS path IS the canonical source: Interactions should never be populated inline because Helix's streaming handler returns the session ID early and the agent runs async. But the follow-up code path didn't set AgentType on its StartChatWithStatus request, so Helix dropped to the non-streaming handler, which blocks until the agent finishes and DOES populate Interactions inline. Both paths then rendered.
First-turn was OK because it explicitly set AgentType, but it still called broadcastInteractions unconditionally, masked by the empty Interactions slice the streaming handler returns.
Fix: set AgentType=zed_external on follow-ups (matches first-turn) and gate broadcastInteractions on appOnly so zed_external never synchronous-renders even if Interactions sneak through. Comment update explaining why the two paths are mutually exclusive by agent_type, not just by "in practice."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
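The appOnly gate from this fix might look roughly like the sketch below. The `Interaction` type and `renderInline` function are hypothetical simplifications made up for illustration; only the agent type strings `helix_basic` and `zed_external` come from the commit message.

```go
package main

import "fmt"

// Agent type strings as named in the commit message.
const (
	agentHelixBasic  = "helix_basic"
	agentZedExternal = "zed_external"
)

// Interaction is a hypothetical stand-in for one reply in a chat session.
type Interaction struct {
	Reply string
}

// renderInline returns the replies to render synchronously. Only app-only
// (helix_basic) sessions render inline; zed_external sessions return nothing
// here because the WebSocket subscriber is their single source of replies.
func renderInline(agentType string, interactions []Interaction) []string {
	appOnly := agentType == agentHelixBasic
	if !appOnly {
		return nil
	}
	out := make([]string, 0, len(interactions))
	for _, in := range interactions {
		out = append(out, in.Reply)
	}
	return out
}

func main() {
	replies := []Interaction{{Reply: "hello"}}
	fmt.Println(len(renderInline(agentHelixBasic, replies)))  // 1
	fmt.Println(len(renderInline(agentZedExternal, replies))) // 0
}
```

Gating on the agent type rather than on whether the slice happens to be empty is what makes the two render paths mutually exclusive by construction.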
Helix's per-project "Open Human Desktop" button matches on session_role="exploratory". We were writing role="owner-chat" on every chat session helix-org's bridge opened, so the button never found those sessions, always spawned a parallel sandbox, and the user thought the button was trashing their live desktop. For helix-org's model the owner chat IS the project's human session — there is no separate "exploratory" notion. Labelling matches reality and makes the button take the operator to the session their /ui/ chat is already driving. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same class of bug we already fixed on /ui/org: three concurrent hx-trigger="every Ns" triggers (one at 5s, two at 3s), each paired with hx-swap="outerHTML" against the trigger node itself. htmx 2's timer cleanup on outerHTML swap is racy, replacements stack up, and the browser tab spends ~20s of CPU per page load processing overlapping /ui/streams responses (each ~1KB of markup with a wide DOM walk to re-bind handlers).
Killed all three pollers. The page renders once now; manual refresh to see new streams or events. A proper live update belongs on SSE (htmx-ext-sse is already on the page) rather than whole-page polling; defer that to a later pass.
Audit of remaining hx-* usage in the templates: only org.html still polls, at 30s with hx-swap="innerHTML" on a stable shell (the fixed pattern). chat.html uses SSE + debounced keyup (no polling). The rest are click/form-driven.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Introduces helix-org, a standalone Go prototype for a hybrid human/AI organization system. This PR is a WIP/Draft collecting the core infrastructure, three transport implementations, MCP prompts (slash commands), and a set of runnable demos.
Core platform
- **Model Context Protocol (MCP) Integration**: All mutations flow through MCP endpoints at `/workers/{id}/mcp` using Streamable HTTP transport. Tool visibility is grant-filtered per worker.
- **MCP Prompts (Slash Commands)**: Server-defined prompts registered in the MCP surface alongside tools. Each prompt has a name, title, description, arguments, and a render method that produces seed messages. Grant-gated (a prompt requires a tool to be visible). Auto-generated `/help` command that walks the registry at render time, so new prompts automatically appear without manual updates. `/role` command drafts a new Role from a title hint, expands to full interview template, saves via create_role, then offers edits or chains to hire_worker.
- **Chat Typeahead**: UI dropdown showing available slash commands on every keyup in the chat textarea. Server-side expansion in the chat bridge: SendHandler intercepts `/name` inputs, expands them from template before sending to claude. User sees original input in their bubble; claude gets the expanded text. Enables interactive discovery and reduces friction.
- **Enum Schema Hints**: WorkerKind and TransportKind surface as enums in the JSON Schema that MCP clients see, enabling better autocomplete. Validation errors are self-documenting: `unknown worker kind "foo" (valid: "human", "ai")` so clients can self-correct.
- **Prompt-Driven CLI**: New `helix-org prompt` subcommand spawns Claude Code with inline MCP configuration, enabling natural-language orchestration of the entire organization graph (Roles, Workers, Positions, Streams, Grants).
- **Role vs Worker Split**: Separates the job (Role: owner-edited markdown, fanned out via `update_role`) from the person (Worker: per-hire identity, immutable). Allows live edits to job descriptions without touching identities.
- **Environment Provisioning & Push Dispatch**: Each Worker gets an isolated environment directory. When events land on subscribed Streams, the system spawns a fresh Claude Code activation (one-shot) with that worker's MCP endpoint. Role and identity are stamped into the environment; the agent reads them and acts on the event trigger.
- **Canonical Message envelope**: Every `Event.Body` is a `domain.Message` JSON (From / To / Subject / Body / ThreadID / InReplyTo / MessageID / Extra). The spawner renders every populated field into the activation prompt so Workers branch on transport-shaped metadata directly, without a separate `read_events` round-trip.
- **Simplified Grant Model**: Grants are strictly `(WorkerID, ToolName)` pairs with no enforcement/scope logic. A grant is the permission; the agent is trusted to comply.
Transports
Streams own their I/O. Three transport kinds, each behind its own package:
`/github/webhook` endpoint, HMAC-verified via `X-Hub-Signature-256`, fans out to every Stream whose `repo` + `events` whitelist matches. Acting on a repo (label, comment, review, open PR) is the Worker's job via `gh` in its Environment; `publish` on a github stream returns a loud error. Demos: doc-engineer reviews docs PRs and tags docs issues; github-engineer implements features on a GitHub Project v2 board.
Operational config
`transport.<kind>` keys with explicit `Secrets: []string` declarations. `helix-org config get` redacts every declared secret; regression tests pin the spec for both `transport.postmark` and `transport.github` so a future refactor can't silently drop a redaction entry.
Design Philosophy
What's Inside
`serve`, `bootstrap`, `chat`, `config` subcommands
Testing
All code is tested end-to-end:
`make check` passes: 0 lint issues, race detector clean
Next Steps (Post-WIP)
WIP because: the core prototype is complete and tested, but we're still validating the design with the broader team before finalizing the API surface and documentation.
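For reviewers checking the GitHub transport described under Transports, the `X-Hub-Signature-256` check can be sketched as below. This is a minimal illustration, not the code in this PR: `sign` and `verifySignature` are hypothetical names, and the secret and payload are made-up values. The header format (`sha256=` followed by the hex-encoded HMAC-SHA256 of the raw request body) follows GitHub's webhook convention.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign computes the header value GitHub sends for a delivery:
// "sha256=" followed by the hex-encoded HMAC-SHA256 of the raw body.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature reports whether header is a valid X-Hub-Signature-256
// value for body under secret. hmac.Equal compares in constant time.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // placeholder, not a real secret
	body := []byte(`{"action":"opened"}`)
	header := sign(secret, body)

	fmt.Println(verifySignature(secret, body, header))               // true
	fmt.Println(verifySignature(secret, []byte("tampered"), header)) // false
}
```

The check has to run over the raw request bytes before any JSON decoding, since re-serializing the payload can change the bytes and invalidate the signature.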
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update — domain/runtime split + unified Helix session shape
- `domain.Worker` to a sidecar `WorkerRuntimeState` keyed on `(workerID, backend, key)`. Six methods dropped from the domain interface.
- `tools/`: new `agent/`, `agent/claude/`, `agent/helix/` packages plus `helix/helixclient/`. `tools/` now holds only org-graph MCP tools.
- `SpecsPublisher` -> `agent.WorkspaceSync`. Logical-name contract (`role.md`, `identity.md`); each backend translates to its own layout. Fixes the prior path mismatch where `update_role` wrote `job/*` but the activation mandate read `.context/*`.
- `agent.md` moved from `tools/templates/` to `agent/policy.md` and embedded as `agent.Policy` so both runtimes share one source.
- `helix.Runtime` (`zed_agent`) and `helix.AgentType` (`zed_external`) are non-configurable constants used by every project apply and every `/sessions/chat` post. Drops `chat.agent_type` config key and the `Runtime` fields on the spawner/applier so the spawner and chat backend can no longer drift to `claude_code`.
Verified end-to-end against app.helix.ml (getting-started demo).
Demos
The PR now includes seven runnable end-to-end demos:
`gh` CLI.
Notes for reviewers
Manufacturing demo is the newest and was verified end-to-end against app.helix.ml:
claude).
All demos pass `make ci` (formatting, lint, race tests).