Self-Learning Prompt Evolution: extract what worked, augment role prompts #116

@bleuropa

Description

Summary

Build a self-learning feedback loop where agent performance data flows back into role prompts. The mechanical recording layer already exists (Learning.record_task_result + Capabilities.record_completion — both wired and active). What's missing is closing the loop: extracting what worked from the decision graph and capabilities data, storing it as prompt augmentations, and injecting those augmentations into future sessions.

David's insight: Role Prompt = Fixed Instructions (always present) + Learned Augmentation (evolves)

Current State

| Component | Status | Notes |
| --- | --- | --- |
| Learning.ex | ✅ Active | Records task outcomes to Postgres (agent_metrics table). Has recommend_model/1, recommend_team/1, top_performers/1. |
| Capabilities.ex | ✅ Active | ETS-based per-agent performance tracking. best_agent_for/2, record_completion/4. Ephemeral (dies with the team session). |
| AgentMetric schema | ✅ Active | Fields: team_id, agent_name, role, model, task_type, success, cost_usd, tokens_used, duration_ms. |
| Prompt augmentation | ❌ Missing | No per-agent, per-session, or per-task-type prompt customization exists. |
| Decision graph analysis | ❌ Missing | Decision nodes (abandoned, superseded, revisited) carry "what didn't work" signal but aren't mined. |

The Self-Learning Loop

Task completes
  → Learning.record_task_result/1           (already wired)
  → Capabilities.record_completion/4        (already wired)
  → NEW: extract "what worked" from decision graph
  → NEW: store as prompt augmentation for this role/task-type combo
  → Next session: inject augmentation into role prompt
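The loop above could hang off the existing completion path roughly as follows. This is a sketch: the `handle_task_complete/4` hook, the result/task field names, and the `LearningExtractor`/`PromptAugmentations` APIs are assumptions (only `Learning.record_task_result/1` and `Capabilities.record_completion/4` exist today).

```elixir
# Hypothetical completion hook; the two "NEW" modules are proposed below.
def handle_task_complete(team, agent, task, result) do
  # Already wired today:
  Learning.record_task_result(%{
    team_id: team.id,
    agent_name: agent.name,
    role: agent.role,
    task_type: task.type,
    success: result.success?,
    cost_usd: result.cost_usd,
    tokens_used: result.tokens_used,
    duration_ms: result.duration_ms
  })

  Capabilities.record_completion(agent.name, task.type, result.success?, result.duration_ms)

  # NEW: mine the decision graph and persist lessons for future sessions.
  team.decision_graph
  |> LearningExtractor.extract_lessons()
  |> Enum.each(&PromptAugmentations.store/1)
end
```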

Sub-tasks

1. Decision Graph Mining

Extract learning signals from the decision graph after task/session completion:

  • Abandoned nodes → "approaches that didn't work" (negative signal)
  • Superseded nodes → "first approach was replaced by better one" (evolution signal)
  • Revisited nodes → "had to come back to this — initial approach was incomplete"
  • High-confidence outcome nodes → "this approach succeeded reliably"
  • New module Loomkin.Teams.LearningExtractor — walks decision graph post-session, extracts structured lessons
  • Lesson format: %{role: atom, task_type: atom, lesson: string, signal: :positive | :negative, confidence: float}
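A minimal sketch of the extractor, assuming decision nodes expose `:status`, `:role`, `:task_type`, `:summary`, and `:confidence` (the actual graph node shape in lib/loomkin/decisions/graph.ex may differ):

```elixir
defmodule Loomkin.Teams.LearningExtractor do
  @moduledoc """
  Walks the decision graph post-session and turns node outcomes into
  structured lessons in the %{role, task_type, lesson, signal, confidence}
  format. Node field names are assumptions about the graph API.
  """

  def extract_lessons(graph) do
    graph
    |> Loomkin.Decisions.Graph.nodes()
    |> Enum.flat_map(&classify/1)
  end

  # Abandoned and superseded nodes carry negative signal.
  defp classify(%{status: :abandoned} = node),
    do: [lesson(node, :negative, "Approach didn't work: #{node.summary}")]

  defp classify(%{status: :superseded} = node),
    do: [lesson(node, :negative, "First approach was replaced by a better one: #{node.summary}")]

  # High-confidence outcome nodes carry positive signal.
  defp classify(%{status: :outcome, confidence: c} = node) when c >= 0.7,
    do: [lesson(node, :positive, "Succeeded reliably: #{node.summary}")]

  defp classify(_node), do: []

  defp lesson(node, signal, text) do
    %{role: node.role, task_type: node.task_type, lesson: text,
      signal: signal, confidence: node.confidence}
  end
end
```

Keeping `classify/1` as a set of pattern-matched clauses makes it cheap to add the revisited-node case later without over-engineering the first pass.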

2. Prompt Augmentation Storage

Persist learned augmentations that survive across sessions:

  • New Ecto schema Loomkin.Schemas.PromptAugmentation with fields: role, task_type, augmentation_text, signal (:positive/:negative), confidence, source_team_id, times_applied, inserted_at
  • LearningExtractor writes augmentations after mining
  • Augmentations are scoped by role + task_type (e.g., "coder doing debugging tasks learned X")
  • Include decay/relevance — augmentations used many times with continued success get higher weight; those that correlate with failures get pruned
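The schema could look like this. The role values and table name are assumptions; the fields mirror the list above:

```elixir
defmodule Loomkin.Schemas.PromptAugmentation do
  use Ecto.Schema

  schema "prompt_augmentations" do
    field :role, Ecto.Enum, values: [:coder, :reviewer, :planner]  # assumed role set
    field :task_type, :string
    field :augmentation_text, :string
    field :signal, Ecto.Enum, values: [:positive, :negative]
    field :confidence, :float
    field :source_team_id, :binary_id
    # times_applied supports the decay/relevance weighting described above.
    field :times_applied, :integer, default: 0
    timestamps(updated_at: false)
  end
end
```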

3. Prompt Injection at Session Start

When building an agent's system prompt, append relevant learned augmentations:

  • Modify Role.get/1 (or the prompt-building pipeline) to query PromptAugmentation for the agent's role + current task type
  • Inject as a ## Learned Patterns section after the static role prompt but before context mesh
  • Token budget: cap augmentations at ~512 tokens to avoid bloating the system prompt
  • Format: "Based on prior experience: [lesson]. Confidence: [high/medium/low]"
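A sketch of the injection helper, assuming a Repo query by role + task_type and approximating the ~512-token cap as a character budget (~4 chars/token); function names here are hypothetical:

```elixir
# Hypothetical helper for the prompt-building pipeline.
def learned_patterns_section(role, task_type, char_budget \\ 2048) do
  import Ecto.Query

  lines =
    from(a in Loomkin.Schemas.PromptAugmentation,
      where: a.role == ^role and a.task_type == ^task_type,
      order_by: [desc: a.confidence],
      limit: 10
    )
    |> Loomkin.Repo.all()
    |> Enum.map(fn a ->
      "- Based on prior experience: #{a.augmentation_text}. Confidence: #{bucket(a.confidence)}."
    end)
    |> take_within_budget(char_budget)

  case lines do
    [] -> ""
    _ -> Enum.join(["## Learned Patterns" | lines], "\n")
  end
end

defp bucket(c) when c >= 0.75, do: "high"
defp bucket(c) when c >= 0.5, do: "medium"
defp bucket(_c), do: "low"

# Keep highest-confidence lines first; drop the rest once the budget is spent.
defp take_within_budget(lines, budget) do
  {kept, _used} =
    Enum.reduce(lines, {[], 0}, fn line, {acc, used} ->
      if used + String.length(line) <= budget,
        do: {[line | acc], used + String.length(line)},
        else: {acc, used}
    end)

  Enum.reverse(kept)
end
```

Ordering by confidence before applying the budget ensures the cap trims the weakest lessons, not arbitrary ones.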

4. Wire Capabilities to Learning (close the gap)

Currently Capabilities.ex (ETS, ephemeral) and Learning.ex (Postgres, persistent) are independent. Connect them:

  • On team start, seed Capabilities ETS from Learning historical data for the project
  • This gives agents a "warm start" — new teams benefit from past team performance data
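The warm start could be a single seeding pass at team boot. `Capabilities.seed/3` does not exist yet, and the exact shape returned by `Learning.top_performers/1` is an assumption:

```elixir
# Hypothetical warm-start step run when a team starts.
def warm_start(project_id) do
  project_id
  |> Learning.top_performers()  # existing: historical per-agent stats from Postgres
  |> Enum.each(fn stat ->
    # Pre-populate the ephemeral ETS table so best_agent_for/2 has signal
    # before the new team has completed any tasks of its own.
    Capabilities.seed(stat.agent_name, stat.task_type, stat.success_rate)
  end)
end
```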

Design Principles

  • Single model architecture: Learning.recommend_model/1 exists but should NOT be used for automatic model switching. The user selects one model; all agents use it. Learning data informs prompt augmentation, not model routing.
  • Augmentations are text, not weights: We're not fine-tuning the LLM. We're adding natural language guidance to the system prompt based on empirical performance data. This works with any model.
  • Gradual, not aggressive: Start with simple positive/negative lessons. Don't over-engineer the extraction — a few high-quality augmentations per session beats noisy data.

Example Augmentation Flow

Session 1: Coder agent writes tests after every implementation → task succeeds
Session 1: Coder agent skips tests on 2nd task → task fails, reviewer catches bugs

LearningExtractor mines decision graph:
  - Positive: "Running tests after each file edit caught issues early" (confidence: 0.8)
  - Negative: "Skipping test runs between edits led to accumulated bugs" (confidence: 0.7)

Session 2: Coder agent's prompt includes:
  ## Learned Patterns
  - Based on prior experience: Run tests after each file edit to catch issues early. Confidence: high.

References

  • Current role prompts: lib/loomkin/teams/role.ex
  • Learning module: lib/loomkin/teams/learning.ex
  • Capabilities: lib/loomkin/teams/capabilities.ex
  • Decision graph: lib/loomkin/decisions/graph.ex, pulse.ex
  • AgentMetric schema: lib/loomkin/schemas/agent_metric.ex
