Epic 6.3–6.10: Self-Organizing Agent Orchestration #129

@bleuropa

Summary

Evolve Loomkin's agent orchestration from reactive event-driven coordination to self-organizing swarms that surpass the traditional sequential-phase model used by tools like Claude Code teams.

Foundation from PRs #141 and #144

Recent work laid critical infrastructure that several epics build on:

  • Bootstrap agents (Concierge + Orienter) — sessions now spawn two always-on agents. Concierge orchestrates; Orienter silently scans project context
  • Concierge routing — user messages route through Concierge first (maybe_route_to_concierge/2), making it the natural cross-team dispatch point for 6.7
  • Deferred bootstrap — agents spawn on first send_message, not session creation, aligning with 6.10's context-aware spawning philosophy
  • Agent attribution — from: field on broadcasts provides identity for cross-team messages
  • team_id on sessions — persistence link between session and team hierarchy
  • Context keeper rehydration — ContextKeeper.rehydrate_from_db/1 restores shared knowledge across restarts

The Problem with Traditional AI Agent Teams

Traditional AI code agent teams (Claude Code, Cursor, etc.) use rigid sequential phases:

Phase 1: Design agent (solo)         → wait for completion
Phase 2: Component agents (parallel) → wait for ALL to complete
Phase 3: Integration agent (solo)    → done

All orchestration intelligence lives in the leader's upfront plan. Agents can't adapt, can't communicate laterally, and can't start downstream work until an entire phase completes.

What Loomkin Already Does Better

Capability        | Traditional Teams                         | Loomkin Today
------------------|-------------------------------------------|-----------------------------------------------------
Agent spawning    | Pre-planned by leader                     | Dynamic via TeamSpawn — LLM decides
Session bootstrap | Cold start, no context                    | Concierge + Orienter warm start with project scan
Task assignment   | Leader assigns explicitly                 | smart_assign by capability + load
Conflict handling | Avoided by design (non-overlapping files) | Real-time detection (file, approach, decision)
Message priority  | None (sequential)                         | 4-tier routing with urgent interrupts
Stuck recovery    | None (wait forever)                       | Rebalancer detects, nudges, escalates
Knowledge sharing | None between agents                       | RelevanceScorer + cross-team propagation
Consensus         | Leader decides                            | Weighted voting (expertise × capability × confidence)
User routing      | Single agent                              | Concierge dispatches to specialists
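
For readers unfamiliar with the Consensus row, here is a minimal sketch of what "expertise × capability × confidence" weighting could mean in practice. The module name, the vote shape, and the choice to sum weights per option are all assumptions for illustration; this is not the code in collective_decision.ex.

```elixir
defmodule WeightedVoteSketch do
  @moduledoc """
  Hypothetical illustration of weighted voting. Each vote is a map with
  :choice, :expertise, :capability and :confidence (floats in 0.0..1.0).
  """

  # Returns {winning_choice, total_weight}. Assumes `votes` is non-empty;
  # Enum.max_by/2 raises on an empty list.
  def tally(votes) do
    votes
    # Group by choice, keeping only each vote's weight (the product).
    |> Enum.group_by(& &1.choice, &(&1.expertise * &1.capability * &1.confidence))
    # Sum the weights behind each choice.
    |> Enum.map(fn {choice, weights} -> {choice, Enum.sum(weights)} end)
    # The choice with the largest total wins.
    |> Enum.max_by(fn {_choice, total} -> total end)
  end
end
```

Under this sketch, one confident expert can outweigh two lukewarm generalists, which is the behaviour the row contrasts with "Leader decides".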

What's Missing

Despite the above, sub-teams are completely isolated from each other. Communication is one-way upward only (insights/blockers propagate to parent via comms.ex). Sibling sub-teams cannot:

  • Send direct messages to each other
  • Ask questions across team boundaries
  • Share tasks or coordinate work laterally
  • Even discover each other's existence

Additionally, tasks run to completion with no mid-execution steering, no partial results, and no speculative parallelism.

Sub-tasks

6.3: Interruptible Checkpoints (CRITICAL)

Insert yield points in AgentLoop after LLM response and after each tool execution. Allow the user (or system) to pause, inspect, redirect, or cancel an agent mid-loop rather than waiting for full task completion.

  • Add checkpoint callbacks in agent_loop.ex after do_loop LLM call and after execute_single_tool
  • New agent status :paused with resume capability
  • User-facing "pause/steer" control in activity feed
  • Must handle auto-initiated loops (Orienter's handle_continue(:auto_orient)), not just user-triggered ones
  • Foundation for all other epics — without this, agents are black boxes during execution

Existing foundation: Permission system (check_permission callback, :waiting_permission status, PermissionComponent modal) provides the primitive for single-tool blocking. Extend to full checkpoint protocol.
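
As a starting point for discussion, the checkpoint protocol described above might be shaped roughly like this. It is a sketch under stated assumptions: `steps` stands in for LLM calls and tool executions, and the module, the decision atoms, and the return tuples are hypothetical names, not anything in agent_loop.ex today.

```elixir
defmodule CheckpointSketch do
  @moduledoc """
  Hypothetical checkpointed loop. After every step the `checkpoint`
  callback decides whether the loop continues, pauses, redirects or stops.
  """

  @type decision :: :continue | :pause | :cancel | {:redirect, [term()]}

  @spec run([term()], (term(), [term()] -> decision())) ::
          {:done, [term()]} | {:paused, [term()], [term()]} | {:cancelled, [term()]}
  def run(steps, checkpoint), do: loop(steps, [], checkpoint)

  # No steps left: the loop finished on its own.
  defp loop([], done, _checkpoint), do: {:done, Enum.reverse(done)}

  defp loop([step | rest], done, checkpoint) do
    # The step counts as executed before the checkpoint is consulted.
    done = [step | done]

    case checkpoint.(step, Enum.reverse(done)) do
      :continue -> loop(rest, done, checkpoint)
      # Pausing hands back the remaining steps so the caller can resume.
      :pause -> {:paused, Enum.reverse(done), rest}
      :cancel -> {:cancelled, Enum.reverse(done)}
      # Redirecting replaces the remaining plan with a new one.
      {:redirect, new_steps} -> loop(new_steps, done, checkpoint)
    end
  end
end
```

Resuming a paused run is then just calling `run/2` again with the returned remaining steps, which is one way the :paused status could keep its resume capability without extra state.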

6.4: Dynamic Task Dependencies (HIGH)

Extend TeamTaskDep beyond simple :blocks to support content-aware coupling. Tasks should depend on milestones ("schema_ready", "API defined"), not just full task completion.

  • New dependency type :requires_output — inject predecessor's output into dependent task context
  • Milestone signaling: agents emit named checkpoints that unblock specific dependents
  • Dynamic dependency creation: agents discover mid-work that a new dependency should exist
  • Priority inheritance: urgent downstream tasks make their blockers urgent
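
To make milestone signaling and priority inheritance concrete, here is a small sketch over plain maps. The data shapes are assumptions chosen for illustration and do not reflect the TeamTaskDep schema; higher integers are assumed to mean more urgent.

```elixir
defmodule MilestoneDepsSketch do
  @moduledoc """
  Hypothetical milestone-based dependencies.
  `deps` maps a task id to the milestones it waits on, each written as
  {producer_task_id, milestone_name}.
  """

  # Tasks whose required milestones have all been emitted.
  # `emitted` is a MapSet of {producer_task_id, milestone_name} tuples.
  def runnable(deps, emitted) do
    deps
    |> Enum.filter(fn {_task, needs} -> Enum.all?(needs, &MapSet.member?(emitted, &1)) end)
    |> Enum.map(fn {task, _needs} -> task end)
    |> Enum.sort()
  end

  # Priority inheritance: a producer is raised to the priority of the most
  # urgent task waiting on it. This is a single pass; chains of blockers
  # would need the function applied repeatedly until nothing changes.
  def inherit_priority(priorities, deps) do
    Enum.reduce(deps, priorities, fn {task, needs}, acc ->
      waiting = Map.get(priorities, task, 0)

      Enum.reduce(needs, acc, fn {producer, _milestone}, acc2 ->
        Map.update(acc2, producer, waiting, &max(&1, waiting))
      end)
    end)
  end
end
```

Note that a task can become runnable while its producer is still in progress, which is the difference from a plain :blocks dependency.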

6.5: Speculative Execution (HIGH)

Allow agents to begin work optimistically before all dependencies resolve. Mark outputs as "tentative" and merge or discard when assumptions are confirmed.

  • Tentative task state with assumption tracking
  • Merge protocol when speculative work validates against actual results
  • Discard + replay when assumptions are wrong
  • Extends ConflictDetector to handle intentional overlap (speculative vs accidental)
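
One possible shape for the tentative state and the merge/discard decision is sketched below. The struct, its fields, and the rule "every assumption must equal the actual value" are assumptions for illustration; a real merge protocol would likely be less strict.

```elixir
defmodule SpeculativeTaskSketch do
  @moduledoc """
  Hypothetical tentative task result with tracked assumptions.
  """

  defstruct [:id, :output, assumptions: %{}, state: :tentative]

  # Records work done before its dependencies resolved, plus what it assumed.
  def speculate(id, output, assumptions) do
    %__MODULE__{id: id, output: output, assumptions: assumptions}
  end

  # Compares each assumption with what the dependency actually produced.
  # All match: the work is confirmed and can be merged. Otherwise the
  # output is dropped and the mismatched keys are returned so the task
  # can be replayed with real inputs.
  def resolve(%__MODULE__{state: :tentative} = task, actuals) do
    wrong =
      task.assumptions
      |> Enum.reject(fn {key, assumed} -> Map.fetch(actuals, key) == {:ok, assumed} end)
      |> Enum.map(fn {key, _assumed} -> key end)
      |> Enum.sort()

    case wrong do
      [] -> {:merge, %{task | state: :confirmed}}
      keys -> {:discard, %{task | state: :discarded, output: nil}, keys}
    end
  end
end
```

Keeping the assumptions on the task itself also gives ConflictDetector a way to tell intentional overlap from accidental overlap: speculative work always carries a non-empty assumption map.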

6.6: Readiness Signaling (MEDIUM)

Add fine-grained task states beyond pending → in_progress → completed. Agents can signal readiness for integration, request review, or indicate partial availability.

  • New states: :ready_for_review, :paused, :blocked, :partially_complete
  • Agents emit :agent_ready events that other agents can subscribe to
  • Rendezvous points: named synchronization barriers where multiple agents signal readiness and a coordinator action triggers when all arrive

Existing prototype: Orienter→Concierge handshake via peer_message after handle_continue(:auto_orient) is a primitive readiness signal. Generalize this pattern.
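
A rendezvous point can be modelled as plain data before deciding where it lives (GenServer, PubSub, or the task table). The sketch below is hypothetical: the struct and return tuples are invented for illustration, and the coordinator action is left to whoever receives `{:release, barrier}`.

```elixir
defmodule RendezvousSketch do
  @moduledoc """
  Hypothetical named synchronization barrier as pure data.
  """

  defstruct [:name, :expected, arrived: MapSet.new()]

  def new(name, expected_agents) do
    %__MODULE__{name: name, expected: MapSet.new(expected_agents)}
  end

  # Records that `agent` is ready. Returns {:waiting, barrier, missing}
  # until every expected agent has arrived, then {:release, barrier}.
  # Arrivals from agents that are not expected are ignored.
  def arrive(%__MODULE__{} = barrier, agent) do
    arrived =
      if MapSet.member?(barrier.expected, agent),
        do: MapSet.put(barrier.arrived, agent),
        else: barrier.arrived

    barrier = %{barrier | arrived: arrived}
    missing = MapSet.difference(barrier.expected, arrived)

    if MapSet.size(missing) == 0 do
      {:release, barrier}
    else
      {:waiting, barrier, missing |> MapSet.to_list() |> Enum.sort()}
    end
  end
end
```

The Orienter→Concierge handshake is the degenerate case of this: a barrier with one expected agent.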

6.7: Cross-Team Communication (HIGH PRIORITY)

This is a critical gap. Currently sub-teams have completely isolated PubSub topic spaces. All cross-team knowledge flows one-way upward (only :insight and :blocker types via maybe_propagate_to_parent/2).

What needs to change:

  • Sub-team ↔ sub-team peer messaging — sibling teams coordinate without routing through the lead
  • Parent ↔ sub-team bidirectional queries — parent can ask sub-team agents questions, not just receive insights
  • Cross-team task visibility — agents can see and reference tasks across team boundaries
  • Team discovery — agents can discover sibling teams and their members (scope: sibling-to-sibling discovery; Concierge already knows its spawned agents)
  • PubSub bridge or shared topic layer for cross-team routing
  • Fix parent_team_id bug in agent_loop.ex:309 (FIXED: now resolves the actual parent via Manager.get_parent_team/1)

Existing foundation: Concierge routing provides user→agent dispatch. The gap is specifically lateral (sibling↔sibling) and downward (parent→child queries).
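
To pin down what "lateral" and "downward" mean for routing, here is a sketch of sibling discovery and route classification over a team tree held as a plain map. Everything in it is an assumption for illustration (module name, map shape, route atoms); it does not use or describe the Manager or Comms APIs.

```elixir
defmodule TeamDirectorySketch do
  @moduledoc """
  Hypothetical team directory.
  `teams` maps team_id => %{parent: parent_id | nil, members: [agent_name]}.
  """

  # Siblings are the other teams that share this team's parent.
  # Root teams and unknown teams have no siblings.
  def siblings(teams, team_id) do
    case teams do
      %{^team_id => %{parent: parent}} when not is_nil(parent) ->
        Enum.sort(for {id, %{parent: ^parent}} <- teams, id != team_id, do: id)

      _ ->
        []
    end
  end

  # Classifies a message so a bridge could pick the right topic.
  def route(teams, from_team, to_team) do
    cond do
      from_team == to_team -> :same_team
      to_team in siblings(teams, from_team) -> :lateral
      get_in(teams, [to_team, :parent]) == from_team -> :downward
      get_in(teams, [from_team, :parent]) == to_team -> :upward
      true -> :unroutable
    end
  end
end
```

In these terms, only `:upward` exists today; `:lateral` and `:downward` are the two routes this sub-task adds.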

6.8: Agent Negotiation (MEDIUM)

Allow agents to counter-propose task assignments rather than silently accepting or being force-assigned.

  • Agent can respond to :task_assigned with a counter-proposal (suggest better-suited agent, request clarification, flag conflicts)
  • Negotiation protocol with timeout fallback to current behavior
  • Reduces wasted cycles from misassigned tasks
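
The timeout fallback is the part most worth getting right, so here is a minimal sketch using plain processes and `receive ... after`. The message shapes (`{:accept, ref}`, `{:counter, ref, proposal}`) and the module name are hypothetical; only the `:task_assigned` event name comes from the description above.

```elixir
defmodule NegotiationSketch do
  @moduledoc """
  Hypothetical assignment offer with a timeout fallback.
  """

  # Offers `task` to the `agent` process and waits for a reply.
  # No reply within `timeout_ms` falls back to the current behaviour:
  # the task is treated as accepted.
  def offer(agent, task, timeout_ms \\ 1_000) do
    # A unique reference ties replies to this particular offer.
    ref = make_ref()
    send(agent, {:task_assigned, self(), ref, task})

    receive do
      {:accept, ^ref} -> {:assigned, task}
      {:counter, ^ref, proposal} -> {:counter_proposal, proposal}
    after
      timeout_ms -> {:assigned, task}
    end
  end
end
```

Because silence means acceptance, agents that never learn the protocol keep working exactly as they do now.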

6.9: Partial Task Results (MEDIUM)

Tasks currently succeed or fail atomically. Enable intermediate outputs that downstream dependents can consume immediately.

  • Streaming task results: agents emit partial outputs as they work
  • Dependent tasks can start with partial predecessor data
  • Pipelined workflows: task A produces data → task B consumes incrementally → task C integrates
  • Complements 6.4 (milestone dependencies) with actual data flow
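
The A → B → C pipeline above can be illustrated with lazy streams, where each partial output flows downstream before the next one is produced. This is a sketch only: the module, the `trace` callback, and the use of `String.upcase/1` as a stand-in for real work are assumptions, and real agents would stream across processes rather than within one.

```elixir
defmodule PartialResultsSketch do
  @moduledoc """
  Hypothetical pipelined workflow built on lazy streams.
  `trace` is a one-argument function used only to observe ordering.
  """

  # Task A: emits partial outputs one at a time instead of one final result.
  def produce(items, trace) do
    Stream.map(items, fn item ->
      trace.({:a_emitted, item})
      item
    end)
  end

  # Task B: consumes each partial output as soon as it is available.
  def consume(partials, trace) do
    Stream.map(partials, fn item ->
      trace.({:b_consumed, item})
      String.upcase(item)
    end)
  end

  # Task C: integrates everything B produced. Running this is what
  # actually pulls items through the two lazy stages above.
  def integrate(results), do: Enum.join(results, ", ")
end
```

Tracing a run shows B consuming the first item before A emits the second, which is the difference from atomic task completion.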

6.10: Adaptive Team Spawning (LOW)

Automatically scale team composition based on workload rather than requiring explicit team_spawn calls.

  • Auto-scaler monitors ratio of unblocked tasks to idle agents
  • Spawns specialist agents when backlog exceeds threshold
  • Work stealing: idle agents claim tasks from overloaded agents' queues
  • Extends Rebalancer (currently only detects stuck agents) to handle capacity

Note: Concierge already does manual adaptive spawning via team_spawn tool with LLM-driven decisions. This epic adds automatic scaling on top.
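
The scaling decision and work stealing can both be written as pure functions, which keeps them easy to test before wiring them into Rebalancer. The sketch below is hypothetical throughout: the threshold default, the "spawn one agent per task over the threshold" policy, and stealing from the tail of the longest queue are illustrative assumptions, not decisions made in this epic.

```elixir
defmodule AutoScalerSketch do
  @moduledoc """
  Hypothetical scaling decision and work stealing over plain data.
  """

  # Backlog is the number of unblocked tasks that idle agents cannot absorb.
  # Spawns only when the backlog exceeds `threshold`.
  def decide(unblocked_tasks, idle_agents, threshold \\ 3) do
    backlog = max(unblocked_tasks - idle_agents, 0)
    if backlog > threshold, do: {:spawn, backlog - threshold}, else: :hold
  end

  # `queues` maps agent => list of queued tasks and is assumed non-empty.
  # The thief takes the last task from the longest queue, but only if
  # that leaves the victim with at least one task.
  def steal(queues, thief) do
    {victim, queue} = Enum.max_by(queues, fn {_agent, q} -> length(q) end)

    if victim != thief and length(queue) > 1 do
      {stolen, remaining} = List.pop_at(queue, -1)

      queues =
        queues
        |> Map.put(victim, remaining)
        |> Map.update(thief, [stolen], &(&1 ++ [stolen]))

      {:stolen, stolen, queues}
    else
      :nothing_to_steal
    end
  end
end
```

Stealing before spawning is the cheaper option, so a scaler built this way would probably try `steal/2` for every idle agent first and call `decide/3` only on what is left.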

The Vision

From: Leader plans everything → agents execute in waves → leader integrates

To: Leader seeds goals → agents discover work → system scales → lateral coordination → emergent completion

Recommended Priority Order

  1. 6.3 Checkpoints — foundation for user steering (everything else builds on this)
  2. 6.7 Cross-Team Comms — removes the biggest structural bottleneck
  3. 6.4 Dynamic Dependencies — enables content-aware task coupling
  4. 6.5 Speculative Execution — unlocks true parallelism
  5. 6.6 Readiness Signaling — fine-grained observation + control (Orienter pattern as prototype)
  6. 6.9 Partial Results — pipelined workflows
  7. 6.8 Agent Negotiation — agent autonomy
  8. 6.10 Adaptive Spawning — dynamic team composition

Key Files

  • lib/loomkin/agent_loop.ex — LLM loop, tool execution, checkpoint target
  • lib/loomkin/teams/agent.ex — GenServer, async execution, priority dispatch
  • lib/loomkin/teams/manager.ex — team/agent lifecycle, sub-team creation
  • lib/loomkin/teams/comms.ex — PubSub broadcasting, cross-team propagation
  • lib/loomkin/teams/priority_router.ex — message classification
  • lib/loomkin/teams/rebalancer.ex — stuck agent detection
  • lib/loomkin/teams/conflict_detector.ex — conflict detection
  • lib/loomkin/teams/capabilities.ex — capability tracking, smart_assign
  • lib/loomkin/teams/collective_decision.ex — weighted voting
  • lib/loomkin/teams/query_router.ex — question routing (same-team only today)
  • lib/loomkin/schemas/team_task_dep.ex — task dependency schema
  • lib/loomkin/teams/role.ex — Concierge/Orienter/specialist role definitions
  • lib/loomkin/session/session.ex — Concierge routing, session lifecycle


Labels: enhancement (New feature or improvement)
