feat(rl): adaptive RL layer — EnsembleAgent, online learning, 87 tests, closed loop#7

Merged: DevanshuNEU merged 1 commit into main from feat/cli-personality on Apr 15, 2026

@DevanshuNEU

Summary

  • Three RL algorithms: UCB1 Contextual Bandit, REINFORCE with Baseline, Thompson Sampling Ensemble meta-agent
  • Closed feedback loop: RewardEngine now applies profile depth multipliers to section coverage and diversity scoring, so the reward varies with the chosen action on real DNA
  • Online learning: saar extract . --rl computes reward + updates policy after every real extraction
  • Statistical validation: bootstrap 95% CI + Welch t-test; all agents p < 0.001 vs random baseline
  • 87 new tests covering all RL modules (593 total passing, ruff clean)
  • Technical report at docs/rl_technical_report.html (print-to-PDF ready)
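
For readers unfamiliar with the first algorithm listed above, here is a minimal sketch of UCB1 action selection with a separate table per context. It is not taken from the saar code: the class name `UCB1Bandit`, the string context keys, and the exploration constant `c` are assumptions made for illustration only.

```python
import math
from collections import defaultdict


class UCB1Bandit:
    """Illustrative per-context UCB1; not the saar implementation."""

    def __init__(self, n_actions: int, c: float = math.sqrt(2)):
        self.n_actions = n_actions
        self.c = c  # exploration constant (assumed value)
        # One pull-count table and one mean-reward table per context key.
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.values = defaultdict(lambda: [0.0] * n_actions)

    def select(self, context: str) -> int:
        counts = self.counts[context]
        values = self.values[context]
        # Try every action once before applying the confidence bound.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)
        scores = [
            values[a] + self.c * math.sqrt(math.log(total) / counts[a])
            for a in range(self.n_actions)
        ]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, context: str, action: int, reward: float) -> None:
        counts = self.counts[context]
        values = self.values[context]
        counts[action] += 1
        # Incremental update of the mean observed reward.
        values[action] += (reward - values[action]) / counts[action]
```

Each extraction would call `select` with its context, run with the chosen action, and pass the observed reward to `update`. With `c = sqrt(2)` the bonus term matches the textbook UCB1 bound.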

New files

  • saar/rl/agents/ensemble.py — Thompson Sampling two-level RL hierarchy
  • tests/test_rl/test_ensemble.py, test_policy_store.py, test_simulator.py, test_action_space.py
  • docs/rl_technical_report.md + rl_technical_report.html — submission report
  • experiments/train_ucb.py, train_reinforce.py, eval_comparison.py — offline training + eval with learning curves
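
As a rough illustration of what a Thompson Sampling meta-agent over sub-agents can look like, the sketch below keeps a Beta posterior per sub-agent and follows whichever one draws the highest sample. The class name `ThompsonMetaAgent`, the sub-agent labels, and the treatment of rewards as fractional successes in [0, 1] are assumptions; the actual design in saar/rl/agents/ensemble.py may differ.

```python
import random


class ThompsonMetaAgent:
    """Illustrative meta-agent that picks which sub-agent to follow."""

    def __init__(self, agent_names, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior (uniform) on each sub-agent's success rate.
        self.alpha = {name: 1.0 for name in agent_names}
        self.beta = {name: 1.0 for name in agent_names}

    def choose(self) -> str:
        # Draw one sample per posterior and follow the largest draw.
        draws = {
            name: self.rng.betavariate(self.alpha[name], self.beta[name])
            for name in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, name: str, reward: float) -> None:
        # Clamp to [0, 1] and treat the reward as a fractional success.
        reward = min(max(reward, 0.0), 1.0)
        self.alpha[name] += reward
        self.beta[name] += 1.0 - reward
```

The second level of the hierarchy would then be the chosen sub-agent selecting the concrete action. Only the sub-agent that was followed gets its posterior updated.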

Modified

  • saar/rl/reward.py — profile-weighted _section_coverage and _diversity_score
  • saar/rl/environment.py — passes depth_multipliers to reward engine
  • saar/rl/policy_store.py — ensemble save/load/stats
  • saar/commands/extract.py — full online learning loop in _apply_rl_profile
  • saar/commands/rl_commands.py — builds ensemble after saar rl train --agent both
  • pyproject.toml — numpy in [dev] extras so CI picks it up
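
To make the profile-weighted coverage idea concrete, here is a hypothetical sketch of a coverage score whose value depends on the depth multipliers of the selected profile. The function name `section_coverage`, the input shapes, the section names, and the `target` saturation point are all assumptions for illustration; the real `_section_coverage` in saar/rl/reward.py may be computed differently.

```python
def section_coverage(section_sizes, depth_multipliers, target=5.0):
    """Hypothetical profile-weighted coverage score in [0, 1].

    section_sizes: items extracted per section (assumed input shape).
    depth_multipliers: per-section weight set by the selected profile.
    target: effective item count at which a section counts as fully covered.
    """
    if not section_sizes:
        return 0.0
    total = 0.0
    for name, size in section_sizes.items():
        # Sections the profile does not mention keep a neutral weight of 1.0.
        effective = size * depth_multipliers.get(name, 1.0)
        total += min(effective / target, 1.0)
    return total / len(section_sizes)


# Same extraction, two different profiles (made-up numbers).
sizes = {"auth": 4, "database": 2, "tests": 0}
shallow = section_coverage(sizes, {"auth": 0.5, "database": 0.5})
deep = section_coverage(sizes, {"auth": 1.5, "database": 2.0})
```

Because the multipliers enter the score, two profiles applied to the same extraction yield different rewards, which is the property a closed feedback loop needs.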

Test plan

  • ruff check saar/ tests/ — zero violations
  • pytest tests/ -q — 593 passing locally
  • saar rl train --agent both — completes in ~0.2s
  • saar extract . --rl — runs end-to-end, selects profile, updates policy online
  • python experiments/eval_comparison.py — UCB 55% / REINFORCE 47% / Ensemble 58% vs random 10%
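
For context on the statistical validation, the following standard-library sketch shows a percentile bootstrap confidence interval for a mean and the Welch t statistic with its Welch-Satterthwaite degrees of freedom. It is not the code in experiments/eval_comparison.py; the function names and the synthetic success/failure samples are assumptions, and the data below is generated for illustration, not taken from the PR's results.

```python
import math
import random
import statistics


def bootstrap_ci(samples, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_resamples)]
    hi = means[int((1.0 - tail) * n_resamples) - 1]
    return lo, hi


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Synthetic per-episode outcomes (1.0 = success), for illustration only.
rng = random.Random(42)
agent = [1.0 if rng.random() < 0.55 else 0.0 for _ in range(200)]
baseline = [1.0 if rng.random() < 0.10 else 0.0 for _ in range(200)]

ci_low, ci_high = bootstrap_ci(agent)
t_stat, dof = welch_t(agent, baseline)
```

Turning `t_stat` and `dof` into a p-value needs the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(agent, baseline, equal_var=False)` performs the full Welch test if SciPy is available.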

Made with Cursor

CI installs with pip install -e '.[dev]' — numpy was only in [rl] extras
so all test_rl/* tests failed with ModuleNotFoundError on every platform.

Made-with: Cursor
vercel Bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project   Deployment   Actions            Updated (UTC)
saar      Ready        Preview, Comment   Apr 15, 2026 1:37am

@DevanshuNEU DevanshuNEU merged commit 9538cf9 into main Apr 15, 2026
16 checks passed