Summary
Evolve Loomkin's agent orchestration from reactive event-driven coordination to self-organizing swarms that surpass the traditional sequential-phase model used by tools like Claude Code teams.
Foundation from PR #141 & #144
Recent work laid critical infrastructure that several epics build on:
- Bootstrap agents (Concierge + Orienter): sessions now spawn two always-on agents. Concierge orchestrates; Orienter scans project context silently
- Concierge routing: user messages route through Concierge first (maybe_route_to_concierge/2), making it the natural cross-team dispatch point for 6.7
- Deferred bootstrap: agents spawn on first send_message, not session creation, aligning with 6.10's context-aware spawning philosophy
- Agent attribution: the from: field on broadcasts provides identity for cross-team messages
- team_id on sessions: persistence link between session and team hierarchy
- Context keeper rehydration: ContextKeeper.rehydrate_from_db/1 restores shared knowledge across restarts
The Problem with Traditional AI Agent Teams
Traditional AI code agent teams (Claude Code, Cursor, etc.) use rigid sequential phases:
Phase 1: Design agent (solo) → wait for completion
Phase 2: Component agents (parallel) → wait for ALL to complete
Phase 3: Integration agent (solo) → done
All orchestration intelligence lives in the leader's upfront plan. Agents can't adapt, can't communicate laterally, and can't start downstream work until an entire phase completes.
What Loomkin Already Does Better
| Capability | Traditional Teams | Loomkin Today |
|---|---|---|
| Agent spawning | Pre-planned by leader | Dynamic via TeamSpawn (LLM decides) |
| Session bootstrap | Cold start, no context | Concierge + Orienter warm start with project scan |
| Task assignment | Leader assigns explicitly | smart_assign by capability + load |
| Conflict handling | Avoided by design (non-overlapping files) | Real-time detection (file, approach, decision) |
| Message priority | None (sequential) | 4-tier routing with urgent interrupts |
| Stuck recovery | None (wait forever) | Rebalancer detects, nudges, escalates |
| Knowledge sharing | None between agents | RelevanceScorer + cross-team propagation |
| Consensus | Leader decides | Weighted voting (expertise × capability × confidence) |
| User routing | Single agent | Concierge dispatches to specialists |
What's Missing
Despite the above, sub-teams are completely isolated from each other. Communication is one-way upward only (insights/blockers propagate to parent via comms.ex). Sibling sub-teams cannot:
- Send direct messages to each other
- Ask questions across team boundaries
- Share tasks or coordinate work laterally
- Even discover each other's existence
Additionally, tasks run to completion with no mid-execution steering, no partial results, and no speculative parallelism.
Sub-tasks
6.3: Interruptible Checkpoints (CRITICAL)
Insert yield points in AgentLoop after LLM response and after each tool execution. Allow the user (or system) to pause, inspect, redirect, or cancel an agent mid-loop rather than waiting for full task completion.
- Add checkpoint callbacks in agent_loop.ex after the do_loop LLM call and after execute_single_tool
- New agent status :paused with resume capability
- User-facing "pause/steer" control in activity feed
- Must handle auto-initiated loops (Orienter's handle_continue(:auto_orient)), not just user-triggered ones
- Foundation for all other epics — without this, agents are black boxes during execution
Existing foundation: Permission system (check_permission callback, :waiting_permission status, PermissionComponent modal) provides the primitive for single-tool blocking. Extend to full checkpoint protocol.
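A minimal sketch of the yield-point idea, assuming a simplified loop in which each step stands in for one LLM call or tool execution. The module name CheckpointSketch, the callback return values (:continue, :pause, {:redirect, new_steps}), and the {:paused, acc, rest} tuple are hypothetical and are not taken from agent_loop.ex.

```elixir
defmodule CheckpointSketch do
  @moduledoc "Hypothetical illustration only; not Loomkin's AgentLoop."

  # Runs `steps` (a list of 1-arity functions over the accumulator) and
  # consults `checkpoint` after every step. The callback decides whether
  # the loop continues, pauses, or switches to a new list of steps.
  def run(steps, acc, checkpoint) when is_function(checkpoint, 1) do
    do_run(steps, acc, checkpoint)
  end

  # Resumes a paused run with the steps that had not yet executed.
  def resume({:paused, acc, rest}, checkpoint), do: do_run(rest, acc, checkpoint)

  defp do_run([], acc, _checkpoint), do: {:done, acc}

  defp do_run([step | rest], acc, checkpoint) do
    acc = step.(acc)

    case checkpoint.(acc) do
      :continue -> do_run(rest, acc, checkpoint)
      :pause -> {:paused, acc, rest}
      {:redirect, new_steps} -> do_run(new_steps, acc, checkpoint)
    end
  end
end
```

Returning the remaining steps with the paused state is one way to make resume cheap: nothing has to be recomputed, and a redirect simply replaces the remaining work.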
6.4: Dynamic Task Dependencies (HIGH)
Extend TeamTaskDep beyond simple :blocks to support content-aware coupling. Tasks should depend on milestones ("schema_ready", "API defined"), not just full task completion.
- New dependency type :requires_output, which injects the predecessor's output into the dependent task's context
- Milestone signaling: agents emit named checkpoints that unblock specific dependents
- Dynamic dependency creation: agents discover mid-work that a new dependency should exist
- Priority inheritance: urgent downstream tasks make their blockers urgent
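The milestone and priority-inheritance bullets above can be sketched as pure functions. This is an illustration under stated assumptions: the dependency shape (%{task: ..., needs: ...}), the {:milestone, blocker, name} and {:completed, blocker} tuples, and numeric priorities where a larger number is more urgent are all hypothetical and do not reflect the TeamTaskDep schema.

```elixir
defmodule MilestoneDepsSketch do
  @moduledoc "Hypothetical illustration only; not the TeamTaskDep schema."

  # deps: list of %{task: id, needs: {:milestone, blocker, name} | {:completed, blocker}}
  # reached: MapSet of {blocker, milestone_name}; completed: MapSet of task ids.
  # Returns the sorted ids of tasks whose requirements are all satisfied.
  def unblocked(deps, reached, completed) do
    deps
    |> Enum.group_by(& &1.task, & &1.needs)
    |> Enum.filter(fn {_task, needs} ->
      Enum.all?(needs, &satisfied?(&1, reached, completed))
    end)
    |> Enum.map(fn {task, _needs} -> task end)
    |> Enum.sort()
  end

  # Priority inheritance: a blocker takes at least the priority of each task
  # waiting on it. Single pass in list order, so it is not fully transitive.
  def inherit_priority(priorities, deps) do
    Enum.reduce(deps, priorities, fn %{task: task, needs: needs}, acc ->
      blocker = elem(needs, 1)
      waiting = Map.fetch!(acc, task)
      Map.update(acc, blocker, waiting, &max(&1, waiting))
    end)
  end

  defp satisfied?({:milestone, blocker, name}, reached, _completed),
    do: MapSet.member?(reached, {blocker, name})

  defp satisfied?({:completed, blocker}, _reached, completed),
    do: MapSet.member?(completed, blocker)
end
```

With a "schema_ready" milestone, an API task can be unblocked while the database task is still running, which a plain :blocks edge cannot express.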
6.5: Speculative Execution (HIGH)
Allow agents to begin work optimistically before all dependencies resolve. Mark outputs as "tentative" and merge or discard when assumptions are confirmed.
- Tentative task state with assumption tracking
- Merge protocol when speculative work validates against actual results
- Discard + replay when assumptions are wrong
- Extends ConflictDetector to handle intentional overlap (speculative vs accidental)
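One possible shape for the tentative state and the merge-or-discard decision, as a sketch only. The map fields (:status, :output, :assumptions) and the return tuples are assumptions made for illustration; the real merge protocol would also need to reconcile file changes, which is omitted here.

```elixir
defmodule SpeculativeSketch do
  @moduledoc "Hypothetical illustration only."

  # A tentative result records the assumptions it was computed under.
  def tentative(output, assumptions) when is_map(assumptions) do
    %{status: :tentative, output: output, assumptions: assumptions}
  end

  # Once the real dependency results are known, compare them with the
  # recorded assumptions. All matching: confirm and merge. Any mismatch:
  # report which assumptions failed so the task can be replayed.
  def resolve(%{status: :tentative, assumptions: assumed} = result, actual) do
    wrong = for {key, value} <- assumed, Map.get(actual, key) != value, do: key

    case wrong do
      [] -> {:merge, %{result | status: :confirmed}}
      keys -> {:discard_and_replay, Enum.sort(keys)}
    end
  end
end
```

Recording assumptions explicitly is what lets the system distinguish intentional speculative overlap from an accidental conflict.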
6.6: Readiness Signaling (MEDIUM)
Add fine-grained task states beyond pending → in_progress → completed. Agents can signal readiness for integration, request review, or indicate partial availability.
- New states: :ready_for_review, :paused, :blocked, :partially_complete
- Agents emit :agent_ready events that other agents can subscribe to
- Rendezvous points: named synchronization barriers where multiple agents signal readiness and a coordinator action triggers when all arrive
Existing prototype: Orienter→Concierge handshake via peer_message after handle_continue(:auto_orient) is a primitive readiness signal. Generalize this pattern.
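A rendezvous point can be modeled as a set of expected participants plus the set that has arrived. The sketch below is a pure-data illustration; RendezvousSketch and its return tuples are hypothetical, and a real version would presumably live in a process and react to :agent_ready events.

```elixir
defmodule RendezvousSketch do
  @moduledoc "Hypothetical illustration only."

  # A barrier tracks which agents are expected and which have arrived.
  def new(expected), do: %{expected: MapSet.new(expected), arrived: MapSet.new()}

  # Records an arrival. Agents that are not expected are ignored.
  # Returns {:released, barrier} once every expected agent has signaled,
  # otherwise {:waiting, barrier}.
  def signal_ready(%{expected: expected, arrived: arrived} = barrier, agent) do
    arrived =
      if MapSet.member?(expected, agent), do: MapSet.put(arrived, agent), else: arrived

    barrier = %{barrier | arrived: arrived}

    if MapSet.equal?(arrived, expected) do
      {:released, barrier}
    else
      {:waiting, barrier}
    end
  end
end
```

The coordinator action would fire on the first {:released, _} result; the Orienter→Concierge handshake corresponds to a barrier with a single expected participant.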
6.7: Cross-Team Communication (HIGH PRIORITY)
This is a critical gap. Currently sub-teams have completely isolated PubSub topic spaces. All cross-team knowledge flows one-way upward (only :insight and :blocker types via maybe_propagate_to_parent/2).
What needs to change:
- Sub-team ↔ sub-team peer messaging — sibling teams coordinate without routing through the lead
- Parent ↔ sub-team bidirectional queries — parent can ask sub-team agents questions, not just receive insights
- Cross-team task visibility — agents can see and reference tasks across team boundaries
- Team discovery — agents can discover sibling teams and their members (scope: sibling-to-sibling discovery; Concierge already knows its spawned agents)
- PubSub bridge or shared topic layer for cross-team routing
- parent_team_id bug in agent_loop.ex:309 (FIXED): now resolves the actual parent via Manager.get_parent_team/1
Existing foundation: Concierge routing provides user→agent dispatch. The gap is specifically lateral (sibling↔sibling) and downward (parent→child queries).
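The shared-topic idea can be sketched with Elixir's standard Registry in duplicate-key mode, used here purely as a stand-in for whatever PubSub layer Loomkin uses. The topic naming scheme, the CrossTeamBridgeSketch module, and the {:peer_message, from, payload} message shape are assumptions for illustration.

```elixir
defmodule CrossTeamBridgeSketch do
  @moduledoc "Hypothetical illustration only; Registry stands in for PubSub."

  @registry __MODULE__.Registry

  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # One shared topic per parent team (hypothetical naming scheme).
  def sibling_topic(parent_team_id), do: "team:#{parent_team_id}:siblings"

  # Called from an agent process: joins the shared topic and, as a side
  # effect, makes the agent's team discoverable by its siblings.
  def join(parent_team_id, team_id) do
    {:ok, _owner} = Registry.register(@registry, sibling_topic(parent_team_id), team_id)
    :ok
  end

  # Team discovery: every sibling team registered under the parent.
  def siblings(parent_team_id) do
    @registry
    |> Registry.lookup(sibling_topic(parent_team_id))
    |> Enum.map(fn {_pid, team_id} -> team_id end)
    |> Enum.uniq()
    |> Enum.sort()
  end

  # Lateral delivery: reaches the agents of one sibling team directly,
  # without routing through the parent or the lead.
  def send_to_team(parent_team_id, to_team, from, payload) do
    Registry.dispatch(@registry, sibling_topic(parent_team_id), fn entries ->
      for {pid, ^to_team} <- entries, do: send(pid, {:peer_message, from, payload})
    end)
  end
end
```

Keeping the from value on every message mirrors the agent attribution already present on broadcasts, so a receiving team can reply to the sender without a separate lookup.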
6.8: Agent Negotiation (MEDIUM)
Allow agents to counter-propose task assignments rather than silently accepting or being force-assigned.
- Agent can respond to :task_assigned with a counter-proposal (suggest better-suited agent, request clarification, flag conflicts)
- Negotiation protocol with timeout fallback to current behavior
- Reduces wasted cycles from misassigned tasks
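The timeout fallback can be illustrated with plain message passing. In this sketch the message shapes ({:task_assigned, from, ref, task}, {ref, :accept}, {ref, {:counter, proposal}}) and the module name are hypothetical; only the :task_assigned event name comes from the text above.

```elixir
defmodule NegotiationSketch do
  @moduledoc "Hypothetical illustration only."

  # Offers `task` to `agent_pid` and waits up to `timeout` milliseconds.
  # An explicit accept or no reply at all both leave the assignment in
  # place, which matches the current force-assign behavior.
  def offer(agent_pid, task, timeout \\ 1_000) do
    ref = make_ref()
    send(agent_pid, {:task_assigned, self(), ref, task})

    receive do
      {^ref, :accept} -> {:assigned, task}
      {^ref, {:counter, proposal}} -> {:counter_proposal, proposal}
    after
      timeout -> {:assigned, task}
    end
  end
end
```

Tagging each offer with a fresh reference keeps a late reply to an earlier offer from being mistaken for an answer to the current one.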
6.9: Partial Task Results (MEDIUM)
Tasks currently succeed or fail atomically. Enable intermediate outputs that downstream dependents can consume immediately.
- Streaming task results: agents emit partial outputs as they work
- Dependent tasks can start with partial predecessor data
- Pipelined workflows: task A produces data → task B consumes incrementally → task C integrates
- Complements 6.4 (milestone dependencies) with actual data flow
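The A → B → C pipeline above maps naturally onto lazy streams, where each partial output flows downstream before the next one is produced. This is a sketch only: the three function names and the {:partial, value} tuple are hypothetical, and real tasks would run in separate agent processes and not in one stream.

```elixir
defmodule PartialResultsSketch do
  @moduledoc "Hypothetical illustration only."

  # Task A: lazily emits one partial output per table.
  def produce(tables) do
    Stream.map(tables, fn table -> {:partial, "schema for #{table}"} end)
  end

  # Task B: consumes each partial output as it arrives, without waiting
  # for task A to finish the whole list.
  def consume(partials) do
    Stream.map(partials, fn {:partial, schema} -> "endpoint using " <> schema end)
  end

  # Task C: integrates everything task B has produced.
  def integrate(results), do: Enum.join(results, "; ")
end
```

Because both stages are lazy, taking only the first result forces task A to produce only its first partial output.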
6.10: Adaptive Team Spawning (LOW)
Automatically scale team composition based on workload rather than requiring explicit team_spawn calls.
- Auto-scaler monitors ratio of unblocked tasks to idle agents
- Spawns specialist agents when backlog exceeds threshold
- Work stealing: idle agents claim tasks from overloaded agents' queues
- Extends Rebalancer (currently only detects stuck agents) to handle capacity
Note: Concierge already does manual adaptive spawning via team_spawn tool with LLM-driven decisions. This epic adds automatic scaling on top.
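A rough sketch of the scaling decision and a single work-stealing step, under stated assumptions: the threshold means "unblocked tasks one idle agent is expected to absorb", queues are a plain map of agent to task list, and none of these names come from the Rebalancer.

```elixir
defmodule AutoScalerSketch do
  @moduledoc "Hypothetical illustration only; not Loomkin's Rebalancer."

  # Compares the backlog with what the idle agents can absorb and returns
  # :hold or {:spawn, count} for the agents still needed.
  def decide(unblocked, idle, threshold \\ 2)
      when unblocked >= 0 and idle >= 0 and threshold > 0 do
    capacity = idle * threshold

    if unblocked > capacity do
      {:spawn, div(unblocked - capacity + threshold - 1, threshold)}
    else
      :hold
    end
  end

  # One work-stealing step: if some agent is idle and the busiest agent
  # has at least two queued tasks, move the last queued task across.
  # Expects a non-empty map of agent => [task].
  def steal(queues) when map_size(queues) > 0 do
    {busiest, tasks} = Enum.max_by(queues, fn {_agent, queue} -> length(queue) end)
    idle = Enum.find(queues, fn {_agent, queue} -> queue == [] end)

    case {idle, tasks} do
      {{idle_agent, []}, [_, _ | _]} ->
        {stolen, kept} = List.pop_at(tasks, -1)
        queues |> Map.put(busiest, kept) |> Map.put(idle_agent, [stolen])

      _ ->
        queues
    end
  end
end
```

Stealing is tried before spawning in most designs because it costs nothing when an idle agent already exists; the sketch leaves that ordering to the caller.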
The Vision
From: Leader plans everything → agents execute in waves → leader integrates
To: Leader seeds goals → agents discover work → system scales → lateral coordination → emergent completion
Recommended Priority Order
- 6.3 Checkpoints — foundation for user steering (everything else builds on this)
- 6.7 Cross-Team Comms — removes the biggest structural bottleneck
- 6.4 Dynamic Dependencies — enables content-aware task coupling
- 6.5 Speculative Execution — unlocks true parallelism
- 6.6 Readiness Signaling — fine-grained observation + control (Orienter pattern as prototype)
- 6.9 Partial Results — pipelined workflows
- 6.8 Agent Negotiation — agent autonomy
- 6.10 Adaptive Spawning — dynamic team composition
Key Files
lib/loomkin/agent_loop.ex — LLM loop, tool execution, checkpoints target
lib/loomkin/teams/agent.ex — GenServer, async execution, priority dispatch
lib/loomkin/teams/manager.ex — team/agent lifecycle, sub-team creation
lib/loomkin/teams/comms.ex — PubSub broadcasting, cross-team propagation
lib/loomkin/teams/priority_router.ex — message classification
lib/loomkin/teams/rebalancer.ex — stuck agent detection
lib/loomkin/teams/conflict_detector.ex — conflict detection
lib/loomkin/teams/capabilities.ex — capability tracking, smart_assign
lib/loomkin/teams/collective_decision.ex — weighted voting
lib/loomkin/teams/query_router.ex — question routing (same-team only today)
lib/loomkin/schemas/team_task_dep.ex — task dependency schema
lib/loomkin/teams/role.ex — Concierge/Orienter/specialist role definitions
lib/loomkin/session/session.ex — Concierge routing, session lifecycle
References
- /memory/orchestration-synthesis.md
- Task.Supervisor.async_nolink pattern