Closed-Loop Self-Improvement System #1

@kavanaghpatrick

Description

Summary

The ralph-compounder plugin has the infrastructure for compounding knowledge (the Plan → Work → Review → Compound cycle, the docs/solutions/ knowledge base, the learnings-researcher agent), but the feedback loop is open: every compounding step requires manual intervention, and the system never learns from its own execution.

This issue tracks 5 capabilities that close the loop, turning the plugin from a workflow orchestrator into a self-improving engineering system.

CURRENT (open loop):
  Plan → Work → Review → [stop]
                              ↑ human must manually run /workflows:compound
                              ↑ no tracking of what worked
                              ↑ docs/solutions/ is empty

DESIRED (closed loop):
  Plan → Work → Review → Auto-Compound → Telemetry → Agent Tuning
    ↑                                                        ↓
    └── learnings-researcher finds past solutions ←──────────┘

Features (5 capabilities, 4 phases)

P0: Knowledge Seeding — Bootstrap the Flywheel

The entire compounding system depends on docs/solutions/ having content. Without it, learnings-researcher returns nothing and the compound loop has no data.

  • /seed-knowledge command — Scan git history for fix/revert commits, extract problem/solution pairs, generate docs/solutions/ files with valid YAML frontmatter
  • Migrate CLAUDE.md learnings — Convert existing "Key Learnings" section into proper docs/solutions/ files
  • Seed plugin patterns — Create 7+ foundational pattern docs in docs/solutions/best-practices/ (frontmatter conventions, agent prompt structure, command orchestration, hook development, skill structure, multi-agent coordination, state management)
  • Create docs/solutions/patterns/critical-patterns.md — Top 3 must-know patterns for required reading
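A seeded solution file might look like the sketch below. The frontmatter fields, file path, and content are all illustrative assumptions; the PRD, not this example, defines the actual schema:

```markdown
---
title: Fix flaky timeout in pre-commit hook        # hypothetical entry
category: best-practices
tags: [hooks, timeouts]
source: git-history                                # e.g. extracted from a fix commit
---

## Problem
Pre-commit hook intermittently timed out on large diffs.

## Solution
Batch the file checks and raise the hook timeout.
```

Keeping frontmatter valid YAML matters here because learnings-researcher and the compound loop both depend on parsing these files.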

P1: Auto-Capture — Close the Compound Loop

  • Phase 5 in /workflows:work — After shipping (PR created), automatically detect non-trivial learnings and run condensed compound (1 subagent, auto-classify, no prompts)
  • Update /lfg and /slfg chains — Insert compound step between resolve-todos and feature-video
  • Post-review finding capture — When /resolve_todo_parallel resolves P1/P2 findings, auto-document the resolution in docs/solutions/
  • Trivial sessions (< 3 tasks, no debugging detours) skip capture silently
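The trivial-session gate above could be sketched as follows; `SessionSummary` and its field names are assumptions for illustration, not a confirmed schema:

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    """Minimal session record; field names are illustrative."""
    task_count: int
    had_debug_detour: bool


def should_capture(session: SessionSummary) -> bool:
    """Capture learnings unless the session is trivial:
    fewer than 3 tasks AND no debugging detours."""
    trivial = session.task_count < 3 and not session.had_debug_detour
    return not trivial
```

Requiring both conditions means a short session that still hit a debugging detour gets captured, since the detour itself is likely the learning.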

P1: Agent Scoring — Track Agent Effectiveness

  • Review outcome tracking — After triage, append agent scores to docs/metrics/agent-scores.jsonl (findings, accepted, rejected, fix rate per agent per PR)
  • /agent-scores dashboard — Show precision and fix rate per agent, flag agents below 50% precision
  • Adaptive agent selection — After 10+ reviews, annotate low-precision agents in review synthesis; --all-agents flag to override

P2: Execution Telemetry — Measure What Matters

  • Workflow event log — Each workflow command appends structured events to docs/metrics/workflow-events.jsonl (plan_created, work_started, work_completed, review_completed, etc.)
  • Plan accuracy metric — Track planned tasks vs. actual tasks, scope creep rate
  • /velocity dashboard — Show trends for plan-to-ship time, plan accuracy, scope creep, findings per PR, rework rate
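The event log and the two plan metrics could look like this sketch; event names and field layout are assumptions drawn from the bullet list above:

```python
import json
import time


def log_event(path: str, event_type: str, **fields) -> None:
    """Append one structured workflow event (e.g. plan_created, work_completed)."""
    record = {"ts": time.time(), "event": event_type, **fields}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def plan_accuracy(planned: list[str], actual: list[str]) -> float:
    """Fraction of planned tasks that were actually done."""
    return len(set(planned) & set(actual)) / len(planned)


def scope_creep(planned: list[str], actual: list[str]) -> float:
    """Fraction of actual work that was never planned."""
    return len(set(actual) - set(planned)) / len(actual)
```

Because the log is append-only JSONL, dashboards like `/velocity` can rebuild any trend by replaying the file with `jq` or a few lines of Python.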

P3: Prompt Evolution — Agent Self-Improvement

  • Auto-generate improvement suggestions — When agent drops below 50% precision over 10+ reviews, analyze rejected findings and generate specific prompt modification suggestions stored in docs/metrics/agent-improvements/
  • /improve-agent command — Review and apply pending suggestions with diff preview, requires explicit user approval
  • Improvement validation — After 5 reviews post-improvement, compare precision before/after; mark as validated or ineffective
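The validation step might reduce to a before/after comparison like this; the minimum-delta threshold is an assumed parameter, not something the issue specifies:

```python
def validate_improvement(pre: list[float], post: list[float],
                         min_delta: float = 0.05) -> str:
    """Compare mean precision across the reviews before and after
    an applied prompt change. min_delta is an illustrative threshold."""
    before = sum(pre) / len(pre)
    after = sum(post) / len(post)
    return "validated" if after - before >= min_delta else "ineffective"
```

Waiting for five post-improvement reviews before running this check avoids judging a prompt change on a single noisy data point.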

Key Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Metrics format | JSONL | Append-only, greppable, no dependencies, jq compatible |
| Metrics storage | Gitignored by default | Per-project, per-developer data |
| Auto-capture mode | Condensed (1 subagent) | Full compound (5 subagents) too expensive for every session |
| Agent scoring threshold | 10+ reviews | Statistical significance, learning period for new agents |
| Prompt changes | Human approval required | Never auto-apply — bad prompts can cascade |
| Trivial session detection | < 3 tasks AND no debugging detours | Avoid noise in docs/solutions/ |

New Commands

| Command | Phase | Purpose |
| --- | --- | --- |
| /seed-knowledge | P0 | Bootstrap docs/solutions/ from git history and existing learnings |
| /agent-scores | P1 | Dashboard showing agent precision and fix rate |
| /velocity | P2 | Dashboard showing workflow trends and plan accuracy |
| /improve-agent | P3 | Review and apply prompt improvement suggestions |

Modified Files

| File | Phase | Change |
| --- | --- | --- |
| commands/workflows/work.md | P1 | Add Phase 5: Capture Learnings |
| commands/lfg.md | P1 | Add compound step to chain |
| commands/slfg.md | P1 | Add compound step to chain |
| commands/workflows/review.md | P1 | Read agent scores before launching |
| commands/workflows/compound.md | P1 | Support condensed autonomous mode |

Success Metrics

| Metric | Baseline | Target (P2) | Target (P4) |
| --- | --- | --- | --- |
| docs/solutions/ file count | 0 | 15+ | 50+ |
| learnings-researcher hit rate | 0% | 30%+ | 60%+ |
| Auto-capture rate | 0% | 70%+ | 85%+ |
| Agent precision (avg) | Unknown | Measured | +15% improvement |
| Plan accuracy | Unknown | Measured | +10% improvement |

Full PRD

The complete PRD with detailed acceptance criteria, risk analysis, technical design decisions, and implementation phases is at:

specs/closed-loop-self-improvement/PRD.md


Each unit of engineering work should make subsequent units easier — not harder. This issue makes that automatic.
