feat(rl): adaptive RL layer — EnsembleAgent, online learning, 87 tests, closed loop by DevanshuNEU · Pull Request #7 · OpenCodeIntel/saar

DevanshuNEU · 2026-04-15T01:37:43Z

Summary

Three RL algorithms: UCB1 Contextual Bandit, REINFORCE with Baseline, Thompson Sampling Ensemble meta-agent
Closed feedback loop: RewardEngine now applies profile depth multipliers to section coverage and diversity scoring — reward varies with action choice on real DNA
Online learning: saar extract . --rl computes reward + updates policy after every real extraction
Statistical validation: bootstrap 95% CI + Welch t-test; all agents p < 0.001 vs random baseline
87 new tests covering all RL modules (593 total passing, ruff clean)
Technical report at docs/rl_technical_report.html (print-to-PDF ready)
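
For readers unfamiliar with the first algorithm listed above, here is a minimal sketch of UCB1 selection with separate statistics per context. It is not the code in saar/rl: the class name `UCB1Bandit`, the use of a hashable context key, and the exploration constant `c` are assumptions made for illustration.

```python
import math
from collections import defaultdict


class UCB1Bandit:
    """Hypothetical per-context UCB1 over a fixed set of actions."""

    def __init__(self, n_actions, c=math.sqrt(2)):
        self.n_actions = n_actions
        self.c = c  # exploration constant; sqrt(2) gives the classic UCB1 bonus
        # One row of pull counts and running mean rewards per context.
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.means = defaultdict(lambda: [0.0] * n_actions)

    def select(self, context):
        counts = self.counts[context]
        means = self.means[context]
        # Try every action once before trusting the confidence bounds.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)
        scores = [
            means[a] + self.c * math.sqrt(math.log(total) / counts[a])
            for a in range(self.n_actions)
        ]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, context, action, reward):
        counts = self.counts[context]
        means = self.means[context]
        counts[action] += 1
        # Incremental mean update, so no reward history is stored.
        means[action] += (reward - means[action]) / counts[action]
```

In this sketch an action would stand for an extraction profile, and `update` would be called once per extraction with the computed reward.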

New files

saar/rl/agents/ensemble.py — Thompson Sampling two-level RL hierarchy
tests/test_rl/test_ensemble.py, test_policy_store.py, test_simulator.py, test_action_space.py
docs/rl_technical_report.md + rl_technical_report.html — submission report
experiments/train_ucb.py, train_reinforce.py, eval_comparison.py — offline training + eval with learning curves
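
The ensemble file above is described as a Thompson Sampling layer over the base agents. One common way to build such a meta-agent is to keep a Beta posterior per sub-agent, sample from each, and delegate to the highest sample. The sketch below follows that pattern; the class name `ThompsonSelector`, the agent labels, and the choice to treat rewards in [0, 1] as fractional successes are assumptions, not details taken from ensemble.py.

```python
import random


class ThompsonSelector:
    """Hypothetical meta-agent: picks which sub-agent acts next."""

    def __init__(self, agent_names, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior (uniform) for every sub-agent: [alpha, beta].
        self.posteriors = {name: [1.0, 1.0] for name in agent_names}

    def choose(self):
        # Draw one sample per sub-agent and delegate to the largest draw.
        samples = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def update(self, name, reward):
        # Assumption: reward is clipped to [0, 1] and counted as a
        # fractional success, with the remainder counted as a failure.
        reward = min(1.0, max(0.0, reward))
        self.posteriors[name][0] += reward
        self.posteriors[name][1] += 1.0 - reward

    def posterior_mean(self, name):
        alpha, beta = self.posteriors[name]
        return alpha / (alpha + beta)
```

A sub-agent that keeps earning higher rewards ends up with a posterior concentrated near its mean reward, so it is sampled more often while the weaker one is still tried occasionally.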

Modified

saar/rl/reward.py — profile-weighted _section_coverage and _diversity_score
saar/rl/environment.py — passes depth_multipliers to reward engine
saar/rl/policy_store.py — ensemble save/load/stats
saar/commands/extract.py — full online learning loop in _apply_rl_profile
saar/commands/rl_commands.py — builds ensemble after saar rl train --agent both
pyproject.toml — numpy in [dev] extras so CI picks it up
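
To illustrate why passing depth multipliers into the reward makes it depend on the chosen profile, here is a small sketch of a profile-weighted coverage score. The real `_section_coverage` in saar/rl/reward.py may be computed differently; the function name, the dictionary shapes, and the clip-to-1.0 rule below are all assumptions.

```python
def weighted_section_coverage(section_scores, depth_multipliers):
    """Hypothetical coverage score in [0, 1].

    section_scores: section name -> raw coverage in [0, 1]
    depth_multipliers: section name -> multiplier set by the chosen profile
                       (sections without an entry default to 1.0)
    """
    if not section_scores:
        return 0.0
    total = 0.0
    for section, score in section_scores.items():
        multiplier = depth_multipliers.get(section, 1.0)
        # Clip so that one heavily weighted section cannot exceed full credit.
        total += min(1.0, score * multiplier)
    return total / len(section_scores)
```

With the same raw section scores, two profiles that carry different multipliers produce different coverage values, which is the property a closed feedback loop needs: the reward has to change when the action changes.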

Test plan

ruff check saar/ tests/ — zero violations
pytest tests/ -q — 593 passing locally
saar rl train --agent both — completes in ~0.2s
saar extract . --rl — runs end-to-end, selects profile, updates policy online
python experiments/eval_comparison.py — UCB 55% / REINFORCE 47% / Ensemble 58% vs random 10%
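
The statistical validation mentioned in the summary combines a bootstrap confidence interval with a Welch t-test. The sketch below shows both using only the standard library; it is not the contents of experiments/eval_comparison.py, and the function names and defaults are assumptions. It returns the Welch t statistic and degrees of freedom only. A p-value would additionally need a t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import random
import statistics


def bootstrap_mean_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    term_a, term_b = var_a / len(a), var_b / len(b)
    t_stat = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(term_a + term_b)
    dof = (term_a + term_b) ** 2 / (
        term_a ** 2 / (len(a) - 1) + term_b ** 2 / (len(b) - 1)
    )
    return t_stat, dof
```

Welch's variant is the usual choice when comparing a trained agent against a random baseline, because the two reward distributions are unlikely to share a variance.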

Made with Cursor

CI installs with pip install -e '.[dev]' — numpy was only in [rl] extras so all test_rl/* tests failed with ModuleNotFoundError on every platform. Made-with: Cursor

vercel · 2026-04-15T01:37:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
saar	Ready	Preview, Comment	Apr 15, 2026 1:37am

fix(ci): add numpy to dev dependencies so RL tests run in CI

3ef66ca

DevanshuNEU merged commit 9538cf9 into main Apr 15, 2026
16 checks passed

DevanshuNEU merged 1 commit into main from feat/cli-personality