Commit 7f1ead9

authored and SEA CAPITAL committed

feat: confidence-based Trust Score — more usage = higher potential score

Redesigned from penalty-based to confidence-based scoring.

Old problem: agents that work MORE get lower scores (more chances for errors).
New design: agents EARN trust through proven work over time.

Three layers:

Proven Reliability (70%): Bayesian estimate (successes+5)/(total+10)
- 0 interactions → 0.5 (unknown)
- 100 interactions, 0 errors → ≈0.95 (proven)
- 1000 interactions, 50 errors → ≈0.95 (still excellent, huge sample)
- 10 interactions, 5 errors → 0.5 (concerning)

Evidence Integrity (15%): chain ratio (continuous, 0-1)

Activity Confidence (15%): log10(interactions)/3
- More data = higher confidence = higher score
- Rewards volume instead of penalizing it

Key changes:
- A new agent starts at ~500 (unknown), not ~950 (fake high)
- high_latency, hedge_rate and incomplete do NOT affect the score
- Only the agent's own errors count
- 1000 tasks with 95% success > 10 tasks with 100% success

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
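The commit message gives the three weights and two of the formulas, but not how the layers combine or how the activity term behaves at the edges. Below is a minimal Python sketch of the three-layer score under stated assumptions: the final score is taken to be 1000 × (0.70·R + 0.15·E + 0.15·A), which is inferred from "a new agent starts at ~500" with a perfect chain ratio; activity confidence is assumed to be clamped to [0, 1]; and the function and constant names (`proven_reliability`, `activity_confidence`, `trust_score`, `SCALE`) are hypothetical, not the actual API of `scoring_rules.py`.

```python
import math

# Layer weights come from the commit message. The 0-1000 scale is an
# ASSUMPTION inferred from "new agent starts at ~500 (unknown)".
W_RELIABILITY = 0.70
W_INTEGRITY = 0.15
W_ACTIVITY = 0.15
SCALE = 1000


def proven_reliability(total: int, own_errors: int) -> float:
    """Bayesian estimate (successes + 5) / (total + 10).

    This is the posterior mean under a Beta(5, 5) prior, so an agent with
    no history scores 0.5. Only the agent's own errors reduce successes;
    signals such as high_latency, hedge_rate or incomplete are not inputs.
    """
    successes = total - own_errors
    return (successes + 5) / (total + 10)


def activity_confidence(total: int) -> float:
    """log10(total) / 3, clamped to [0, 1] (clamping is an ASSUMPTION)."""
    if total <= 1:
        return 0.0  # log10(0) is undefined; log10(1) is 0 anyway
    return min(1.0, math.log10(total) / 3)


def trust_score(total: int, own_errors: int, chain_ratio: float) -> int:
    """Hypothetical combined score: weighted sum of the three layers, x1000."""
    integrity = max(0.0, min(1.0, chain_ratio))  # evidence integrity, 0-1
    blended = (
        W_RELIABILITY * proven_reliability(total, own_errors)
        + W_INTEGRITY * integrity
        + W_ACTIVITY * activity_confidence(total)
    )
    return round(SCALE * blended)


if __name__ == "__main__":
    print(trust_score(0, 0, 1.0))      # new agent: 500
    print(trust_score(10, 0, 1.0))     # 10 tasks, 100% success: 725
    print(trust_score(1000, 50, 1.0))  # 1000 tasks, 95% success: 962
```

Under these assumptions the last key change holds: 1000 tasks at 95% success (962) outscore 10 tasks at 100% success (725). The +5/+10 smoothing acts like five imaginary successes and five imaginary failures; its pull toward 0.5 fades as real interactions accumulate, and the activity term saturates at 1000 interactions, so volume raises the score instead of only adding chances to lose points.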

1 parent fdb8b67, commit 7f1ead9

2 files changed (+84, -88 lines)

sdk/python/atlast_ecp
- dashboard_assets
  - index.html
- scoring_rules.py
