confidence-scoring

Here are 25 public repositories matching this topic...

Verification system that catches coding agents falsely claiming task completion. Runs 4 parallel checks (file integrity, test quality, scope narrowing, and an optional LLM judge) over the task, claim, and diff, and returns a weighted 0-100 confidence score with evidence.

  • Updated May 21, 2026
  • Python
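
As a rough illustration of the weighted 0-100 score described above, here is a minimal sketch. It is not the repository's code: the `CheckResult` type, the `WEIGHTS` values, and the `run_checks` and `combine` functions are all hypothetical, and renormalizing the weights when the optional LLM judge is skipped is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str       # which check produced this result
    score: float    # 0.0 (claim looks false) to 1.0 (claim looks true)
    evidence: str   # short human-readable justification


# Hypothetical weights; the real project may weight its checks differently.
WEIGHTS = {
    "file_integrity": 30,
    "test_quality": 30,
    "scope_narrowing": 20,
    "llm_judge": 20,  # optional check, may be absent from a run
}

Check = Callable[[str, str, str], CheckResult]


def run_checks(checks: List[Check], task: str, claim: str, diff: str) -> List[CheckResult]:
    """Run every check in parallel over the same task, claim, and diff."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(check, task, claim, diff) for check in checks]
        return [future.result() for future in futures]


def combine(results: List[CheckResult]) -> int:
    """Weighted average of the checks that ran, scaled to 0-100."""
    total_weight = sum(WEIGHTS[r.name] for r in results)
    if total_weight == 0:
        return 0
    weighted = sum(WEIGHTS[r.name] * r.score for r in results)
    return round(100 * weighted / total_weight)
```

Dividing by the weight of the checks that actually ran keeps the score on the same 0-100 scale whether or not the optional judge is enabled.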

System that aggregates outputs from multiple Large Language Models (GPT-4, Claude-3, custom models) to generate reliable, high-confidence results through consensus-based reasoning evaluation. Demonstrates sophisticated AI orchestration with a 92.7% accuracy improvement over a single model.

  • Updated Dec 22, 2025
  • Python
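
The description does not say how consensus is computed. One simple possibility is majority voting over normalized answers, with the agreement fraction used as the confidence. The sketch below assumes that approach; the `consensus` function and the model names in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def consensus(answers: Dict[str, str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of models that gave it.

    `answers` maps a model name to that model's raw text answer.
    """
    if not answers:
        raise ValueError("at least one model answer is required")
    # Normalize lightly so trivial formatting differences do not split votes.
    normalized = [text.strip().lower() for text in answers.values()]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)
```

For example, `consensus({"model_a": "Paris", "model_b": "paris ", "model_c": "Lyon"})` picks `"paris"` with two of three votes. A real system would likely need semantic comparison rather than exact string matching.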

AI-powered concierge that normalises guest messages from WhatsApp, Booking.com, Airbnb, Instagram and direct channels, drafts a reply with Claude, and routes responses through a deterministic confidence-scoring pipeline. Built with FastAPI + Claude Sonnet 4.

  • Updated May 18, 2026
  • Python

Catch AI-code hallucinations instantly: real-time sandbox validation scores suggestions and flags low-confidence snippets, so solo devs avoid wasted debugging and regain trust in assistants.

  • Updated Apr 16, 2026
  • Python
