Closed-Loop Self-Improvement System #1

@kavanaghpatrick

Description

Summary

The ralph-compounder plugin has the infrastructure for compounding knowledge (the Plan → Work → Review → Compound cycle, the docs/solutions/ knowledge base, the learnings-researcher agent), but the feedback loop is open: every compounding step requires manual intervention, and the system never learns from its own execution.

This issue tracks 5 capabilities that close the loop, turning the plugin from a workflow orchestrator into a self-improving engineering system.

CURRENT (open loop):
  Plan → Work → Review → [stop]
                              ↑ human must manually run /workflows:compound
                              ↑ no tracking of what worked
                              ↑ docs/solutions/ is empty

DESIRED (closed loop):
  Plan → Work → Review → Auto-Compound → Telemetry → Agent Tuning
    ↑                                                        ↓
    └── learnings-researcher finds past solutions ←──────────┘

Features (5 capabilities, 4 phases)

P0: Knowledge Seeding — Bootstrap the Flywheel

The entire compounding system depends on docs/solutions/ having content. Without it, learnings-researcher returns nothing and the compound loop has no data.

  • /seed-knowledge command — Scan git history for fix/revert commits, extract problem/solution pairs, generate docs/solutions/ files with valid YAML frontmatter
  • Migrate CLAUDE.md learnings — Convert existing "Key Learnings" section into proper docs/solutions/ files
  • Seed plugin patterns — Create 7+ foundational pattern docs in docs/solutions/best-practices/ (frontmatter conventions, agent prompt structure, command orchestration, hook development, skill structure, multi-agent coordination, state management)
  • Create docs/solutions/patterns/critical-patterns.md — Top 3 must-know patterns for required reading
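A seeded solution file might look like the sketch below. The frontmatter fields, file path, and content are all illustrative assumptions; the PRD, not this example, defines the actual schema:

```markdown
---
title: Fix flaky timeout in pre-commit hook        # hypothetical entry
category: best-practices
tags: [hooks, timeouts]
source: git-history                                # e.g. extracted from a fix commit
---

## Problem
Pre-commit hook intermittently timed out on large diffs.

## Solution
Batch the file checks and raise the hook timeout.
```

Keeping frontmatter valid YAML matters here because learnings-researcher and the compound loop both depend on parsing these files.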

P1: Auto-Capture — Close the Compound Loop

  • Phase 5 in /workflows:work — After shipping (PR created), automatically detect non-trivial learnings and run condensed compound (1 subagent, auto-classify, no prompts)
  • Update /lfg and /slfg chains — Insert compound step between resolve-todos and feature-video
  • Post-review finding capture — When /resolve_todo_parallel resolves P1/P2 findings, auto-document the resolution in docs/solutions/
  • Trivial sessions (< 3 tasks, no debugging detours) skip capture silently
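The trivial-session gate above could be sketched as follows; `SessionSummary` and its field names are assumptions for illustration, not a confirmed schema:

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    """Minimal session record; field names are illustrative."""
    task_count: int
    had_debug_detour: bool


def should_capture(session: SessionSummary) -> bool:
    """Capture learnings unless the session is trivial:
    fewer than 3 tasks AND no debugging detours."""
    trivial = session.task_count < 3 and not session.had_debug_detour
    return not trivial
```

Requiring both conditions means a short session that still hit a debugging detour gets captured, since the detour itself is likely the learning.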

P1: Agent Scoring — Track Agent Effectiveness

  • Review outcome tracking — After triage, append agent scores to docs/metrics/agent-scores.jsonl (findings, accepted, rejected, fix rate per agent per PR)
  • /agent-scores dashboard — Show precision and fix rate per agent, flag agents below 50% precision
  • Adaptive agent selection — After 10+ reviews, annotate low-precision agents in review synthesis; --all-agents flag to override

P2: Execution Telemetry — Measure What Matters

  • Workflow event log — Each workflow command appends structured events to docs/metrics/workflow-events.jsonl (plan_created, work_started, work_completed, review_completed, etc.)
  • Plan accuracy metric — Track planned tasks vs. actual tasks, scope creep rate
  • /velocity dashboard — Show trends for plan-to-ship time, plan accuracy, scope creep, findings per PR, rework rate
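The event log and the two plan metrics could look like this sketch; event names and field layout are assumptions drawn from the bullet list above:

```python
import json
import time


def log_event(path: str, event_type: str, **fields) -> None:
    """Append one structured workflow event (e.g. plan_created, work_completed)."""
    record = {"ts": time.time(), "event": event_type, **fields}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def plan_accuracy(planned: list[str], actual: list[str]) -> float:
    """Fraction of planned tasks that were actually done."""
    return len(set(planned) & set(actual)) / len(planned)


def scope_creep(planned: list[str], actual: list[str]) -> float:
    """Fraction of actual work that was never planned."""
    return len(set(actual) - set(planned)) / len(actual)
```

Because the log is append-only JSONL, dashboards like `/velocity` can rebuild any trend by replaying the file with `jq` or a few lines of Python.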

P3: Prompt Evolution — Agent Self-Improvement

  • Auto-generate improvement suggestions — When agent drops below 50% precision over 10+ reviews, analyze rejected findings and generate specific prompt modification suggestions stored in docs/metrics/agent-improvements/
  • /improve-agent command — Review and apply pending suggestions with diff preview, requires explicit user approval
  • Improvement validation — After 5 reviews post-improvement, compare precision before/after; mark as validated or ineffective
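The validation step might reduce to a before/after comparison like this; the minimum-delta threshold is an assumed parameter, not something the issue specifies:

```python
def validate_improvement(pre: list[float], post: list[float],
                         min_delta: float = 0.05) -> str:
    """Compare mean precision across the reviews before and after
    an applied prompt change. min_delta is an illustrative threshold."""
    before = sum(pre) / len(pre)
    after = sum(post) / len(post)
    return "validated" if after - before >= min_delta else "ineffective"
```

Waiting for five post-improvement reviews before running this check avoids judging a prompt change on a single noisy data point.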

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Metrics format | JSONL | Append-only, greppable, no dependencies, jq compatible |
| Metrics storage | Gitignored by default | Per-project, per-developer data |
| Auto-capture mode | Condensed (1 subagent) | Full compound (5 subagents) too expensive for every session |
| Agent scoring threshold | 10+ reviews | Statistical significance, learning period for new agents |
| Prompt changes | Human approval required | Never auto-apply — bad prompts can cascade |
| Trivial session detection | < 3 tasks AND no debugging detours | Avoid noise in docs/solutions/ |

New Commands

| Command | Phase | Purpose |
| --- | --- | --- |
| /seed-knowledge | P0 | Bootstrap docs/solutions/ from git history and existing learnings |
| /agent-scores | P1 | Dashboard showing agent precision and fix rate |
| /velocity | P2 | Dashboard showing workflow trends and plan accuracy |
| /improve-agent | P3 | Review and apply prompt improvement suggestions |

Modified Files

| File | Phase | Change |
| --- | --- | --- |
| commands/workflows/work.md | P1 | Add Phase 5: Capture Learnings |
| commands/lfg.md | P1 | Add compound step to chain |
| commands/slfg.md | P1 | Add compound step to chain |
| commands/workflows/review.md | P1 | Read agent scores before launching |
| commands/workflows/compound.md | P1 | Support condensed autonomous mode |

Success Metrics

| Metric | Baseline | Target (P2) | Target (P4) |
| --- | --- | --- | --- |
| docs/solutions/ file count | 0 | 15+ | 50+ |
| learnings-researcher hit rate | 0% | 30%+ | 60%+ |
| Auto-capture rate | 0% | 70%+ | 85%+ |
| Agent precision (avg) | Unknown | Measured | +15% improvement |
| Plan accuracy | Unknown | Measured | +10% improvement |

Full PRD

The complete PRD with detailed acceptance criteria, risk analysis, technical design decisions, and implementation phases is at:

specs/closed-loop-self-improvement/PRD.md


Each unit of engineering work should make subsequent units easier — not harder. This issue makes that automatic.
