Commit 7f1ead9
feat: confidence-based Trust Score — more usage = higher potential score
Redesigned from penalty-based to confidence-based scoring.
Old problem: agents that work MORE get lower scores (more chances for errors).
New design: agents EARN trust through proven work over time.
Three layers:
Proven Reliability (70%): Bayesian estimate (successes+5)/(total+10)
- 0 interactions → 0.5 (unknown)
- 100 interactions, 0 errors → 0.95 (proven)
- 1000 interactions, 50 errors → 0.95 (still excellent, huge sample)
- 10 interactions, 5 errors → 0.5 (concerning)
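The worked examples above follow directly from the formula; a minimal sketch (the function name is illustrative, not from the diff):

```python
def proven_reliability(successes: int, total: int) -> float:
    # Bayesian estimate with a Beta(5, 5) prior: small samples are
    # pulled toward 0.5 (unknown) until real evidence accumulates.
    return (successes + 5) / (total + 10)
```

With no data this returns exactly 0.5; 100 clean interactions give ~0.95; 1000 interactions with 50 errors still give ~0.95 because the large sample dominates the prior.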
Evidence Integrity (15%): chain ratio (continuous 0-1)
Activity Confidence (15%): log10(interactions)/3
- More data = higher confidence = higher score
- Rewards volume instead of penalizing it
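A sketch of the activity term. The commit only gives log10(interactions)/3; clamping the result to [0, 1] is an assumption:

```python
import math

def activity_confidence(interactions: int) -> float:
    # log10 scaling: 1 interaction -> 0.0, 1000+ interactions -> 1.0.
    # The clamp to [0, 1] is an assumption, not stated in the commit.
    if interactions <= 0:
        return 0.0
    return min(math.log10(interactions) / 3, 1.0)
```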
Key changes:
- New agents start at ~500 (unknown), not ~950 (artificially high)
- high_latency, hedge_rate, incomplete do NOT affect score
- Only agent's own errors count
- 1000 tasks with 95% success > 10 tasks with 100% success
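Putting the three layers together reproduces the last claim. This is a hedged sketch: the 0-1000 scaling (inferred from "starts at ~500") and taking chain_ratio as a plain input are assumptions:

```python
import math

def trust_score(successes: int, total: int, chain_ratio: float = 1.0) -> int:
    # Weighted sum of the three layers, scaled to 0-1000.
    # chain_ratio (Evidence Integrity) is passed in here, and the
    # 0-1000 scale is inferred from "new agent starts at ~500".
    proven = (successes + 5) / (total + 10)                            # 70%
    activity = 0.0 if total <= 0 else min(math.log10(total) / 3, 1.0)  # 15%
    return round(1000 * (0.70 * proven + 0.15 * chain_ratio + 0.15 * activity))
```

Under these assumptions a brand-new agent scores 500, while 1000 tasks at 95% success (~962) comfortably outscore 10 tasks at 100% success (~725).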
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
File tree (2 files changed: +84 −88):
- sdk/python/atlast_ecp
- dashboard_assets