Self-Learning Prompt Evolution: extract what worked, augment role prompts #116
## Summary
Build a self-learning feedback loop where agent performance data flows back into role prompts. The mechanical recording layer already exists (Learning.record_task_result + Capabilities.record_completion — both wired and active). What's missing is closing the loop: extracting what worked from the decision graph and capabilities data, storing it as prompt augmentations, and injecting those augmentations into future sessions.
David's insight: Role Prompt = Fixed Instructions (always present) + Learned Augmentation (evolves)
## Current State

| Component | Status | Notes |
|---|---|---|
| `Learning.ex` | ✅ Active | Records task outcomes to Postgres (`agent_metrics` table). Has `recommend_model/1`, `recommend_team/1`, `top_performers/1`. |
| `Capabilities.ex` | ✅ Active | ETS-based per-agent performance tracking. `best_agent_for/2`, `record_completion/4`. Ephemeral (dies with the team session). |
| `AgentMetric` schema | ✅ Active | Fields: `team_id`, `agent_name`, `role`, `model`, `task_type`, `success`, `cost_usd`, `tokens_used`, `duration_ms` |
| Prompt augmentation | ❌ Missing | No per-agent, per-session, or per-task-type prompt customization exists |
| Decision graph analysis | ❌ Missing | Decision nodes (abandoned, superseded, revisited) carry "what didn't work" signal but aren't mined |
## The Self-Learning Loop

```
Task completes
  → Learning.record_task_result/1     (already wired)
  → Capabilities.record_completion/4  (already wired)
  → NEW: extract "what worked" from the decision graph
  → NEW: store as a prompt augmentation for this role/task-type combo
  → Next session: inject augmentation into the role prompt
```
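The two NEW steps could hang off a single post-completion hook. A minimal sketch — `handle_task_complete/4`, the `result` and `task` shapes, `LearningExtractor.extract/1`, and `PromptAugmentation.upsert/1` are all hypothetical names for the proposed work; only the two `record_*` calls exist today:

```elixir
# Hypothetical hook run after each task completes. The two record_* calls
# are already wired (see Current State); everything below them is the NEW
# work this issue proposes. Argument shapes are assumptions.
def handle_task_complete(team, agent, task, result) do
  Learning.record_task_result(result)
  Capabilities.record_completion(agent.name, task.type, result.success, result.duration_ms)

  # NEW: mine the decision graph for structured lessons…
  lessons = LearningExtractor.extract(team.decision_graph)

  # …and persist them as role + task-type scoped prompt augmentations.
  Enum.each(lessons, &PromptAugmentation.upsert/1)
end
```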
## Sub-tasks

### 1. Decision Graph Mining
Extract learning signals from the decision graph after task/session completion:
- Abandoned nodes → "approaches that didn't work" (negative signal)
- Superseded nodes → "first approach was replaced by better one" (evolution signal)
- Revisited nodes → "had to come back to this — initial approach was incomplete"
- High-confidence outcome nodes → "this approach succeeded reliably"
- New module `Loomkin.Teams.LearningExtractor` — walks the decision graph post-session and extracts structured lessons
- Lesson format: `%{role: atom, task_type: atom, lesson: string, signal: :positive | :negative, confidence: float}`
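The node-type → signal mapping above could reduce to a small pure function. A sketch, assuming decision nodes arrive as maps with `:status`, `:role`, `:task_type`, and `:summary` keys — the real graph API in `lib/loomkin/decisions/graph.ex` may expose a different shape:

```elixir
defmodule Loomkin.Teams.LearningExtractor do
  @moduledoc "Sketch: map decision-node statuses to structured lessons (node shape assumed)."

  # Signal and baseline confidence per node status, per the bullet list above.
  # The :succeeded status and the confidence values are illustrative assumptions.
  @signals %{
    abandoned:  {:negative, 0.7},
    superseded: {:negative, 0.6},
    revisited:  {:negative, 0.5},
    succeeded:  {:positive, 0.8}
  }

  def extract(nodes) do
    for %{status: status} = node <- nodes, Map.has_key?(@signals, status) do
      {signal, confidence} = @signals[status]

      %{
        role: node.role,
        task_type: node.task_type,
        lesson: node.summary,
        signal: signal,
        confidence: confidence
      }
    end
  end
end
```

Keeping the mapping in a flat module attribute makes it easy to start with a few statuses and grow gradually, in line with the "gradual, not aggressive" principle below.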
### 2. Prompt Augmentation Storage
Persist learned augmentations that survive across sessions:
- New Ecto schema `Loomkin.Schemas.PromptAugmentation` with fields `role`, `task_type`, `augmentation_text`, `signal` (`:positive`/`:negative`), `confidence`, `source_team_id`, `times_applied`, `inserted_at`
- `LearningExtractor` writes augmentations after mining
- Augmentations are scoped by role + task_type (e.g., "coder doing debugging tasks learned X")
- Include decay/relevance — augmentations applied many times with continued success get higher weight; those that correlate with failures get pruned
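The field list above maps to a straightforward schema. A sketch with assumed column types (the table name and `:integer` foreign key are guesses to be checked against the existing migrations):

```elixir
defmodule Loomkin.Schemas.PromptAugmentation do
  use Ecto.Schema

  schema "prompt_augmentations" do
    field :role, :string
    field :task_type, :string
    field :augmentation_text, :string
    field :signal, Ecto.Enum, values: [:positive, :negative]
    field :confidence, :float
    field :source_team_id, :integer
    field :times_applied, :integer, default: 0
    timestamps(updated_at: false)  # inserted_at only, per the field list above
  end
end
```

`times_applied` plus task outcomes gives the decay signal: bump the counter on each injection, raise `confidence` when subsequent tasks succeed, and prune rows whose confidence falls below a floor.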
### 3. Prompt Injection at Session Start
When building an agent's system prompt, append relevant learned augmentations:
- Modify `Role.get/1` (or the prompt-building pipeline) to query `PromptAugmentation` for the agent's role + current task type
- Inject as a `## Learned Patterns` section after the static role prompt but before the context mesh
- Token budget: cap augmentations at ~512 tokens to avoid bloating the system prompt
- Format: "Based on prior experience: [lesson]. Confidence: [high/medium/low]"
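The rendering side of this can be a pure function over the queried augmentations. A sketch — the module name, confidence buckets, and the ~4-characters-per-token heuristic used to approximate the 512-token cap are all assumptions:

```elixir
defmodule PromptInjector do
  # ~512 tokens ≈ 2048 chars under a rough 4-chars-per-token heuristic (assumption).
  @char_budget 512 * 4

  @doc "Render top augmentations into a Learned Patterns section, within budget."
  def learned_patterns_section(augmentations) do
    lines =
      augmentations
      |> Enum.sort_by(& &1.confidence, :desc)
      |> Enum.map(&"- Based on prior experience: #{&1.augmentation_text}. Confidence: #{bucket(&1.confidence)}.")
      |> take_within_budget(@char_budget)

    Enum.join(["## Learned Patterns" | lines], "\n")
  end

  # Map float confidence to the high/medium/low buckets in the format above.
  defp bucket(c) when c >= 0.75, do: "high"
  defp bucket(c) when c >= 0.5, do: "medium"
  defp bucket(_), do: "low"

  # Greedily keep highest-confidence lines until the character budget is spent.
  defp take_within_budget(lines, budget) do
    {taken, _used} =
      Enum.reduce(lines, {[], 0}, fn line, {acc, used} ->
        cost = String.length(line)
        if used + cost <= budget, do: {[line | acc], used + cost}, else: {acc, used}
      end)

    Enum.reverse(taken)
  end
end
```

Sorting by confidence before truncating means the budget always cuts the weakest lessons first.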
### 4. Wire Capabilities to Learning (close the gap)
Currently `Capabilities.ex` (ETS, ephemeral) and `Learning.ex` (Postgres, persistent) are independent. Connect them:
- On team start, seed the `Capabilities` ETS table from `Learning`'s historical data for the project
- This gives agents a "warm start" — new teams benefit from past team performance data
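One way to seed, sketched under heavy assumptions: `top_performers/1` and `record_completion/4` exist (see Current State), but their argument and return shapes are guesses, and `warm_start/1` is a hypothetical name:

```elixir
# Hypothetical warm start at team boot: replay historical per-agent stats
# from Postgres (Learning) into the ephemeral Capabilities ETS table.
# The perf map keys and record_completion/4 argument order are assumptions.
def warm_start(project_id) do
  project_id
  |> Learning.top_performers()
  |> Enum.each(fn perf ->
    Capabilities.record_completion(perf.agent_name, perf.task_type, perf.success, perf.duration_ms)
  end)
end
```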
## Design Principles

- Single-model architecture: `Learning.recommend_model/1` exists but should NOT be used for automatic model switching. The user selects one model; all agents use it. Learning data informs prompt augmentation, not model routing.
- Augmentations are text, not weights: We're not fine-tuning the LLM. We're adding natural-language guidance to the system prompt based on empirical performance data. This works with any model.
- Gradual, not aggressive: Start with simple positive/negative lessons. Don't over-engineer the extraction — a few high-quality augmentations per session beats noisy data.
## Example Augmentation Flow

```
Session 1: Coder agent writes tests after every implementation → task succeeds
Session 1: Coder agent skips tests on 2nd task → task fails, reviewer catches bugs

LearningExtractor mines the decision graph:
- Positive: "Running tests after each file edit caught issues early" (confidence: 0.8)
- Negative: "Skipping test runs between edits led to accumulated bugs" (confidence: 0.7)

Session 2: Coder agent's prompt includes:

## Learned Patterns
- Based on prior experience: Run tests after each file edit to catch issues early. Confidence: high.
```
## References

- Current role prompts: `lib/loomkin/teams/role.ex`
- Learning module: `lib/loomkin/teams/learning.ex`
- Capabilities: `lib/loomkin/teams/capabilities.ex`
- Decision graph: `lib/loomkin/decisions/graph.ex`, `pulse.ex`
- AgentMetric schema: `lib/loomkin/schemas/agent_metric.ex`