falcon883/Release-Radar

ReleaseRadar

Full-stack Release & Incident Management System for tracking projects, releases, incidents, and environment health.

Tech Stack

  • Backend: FastAPI, Python, SQLAlchemy, PostgreSQL
  • Auth: JWT-based authentication, role-based access control (Admin/User)
  • Frontend: React, TypeScript, Vite, Tailwind CSS, Recharts
  • Infra: Docker, Docker Compose

Features

  • Projects with environment-specific releases (Dev / QA / Prod)
  • Incidents linked to releases with severity + status
  • JWT auth with login/register and protected APIs
  • Role-based access (admin vs normal user)
  • Inline editing for releases and incidents
  • Analytics:
    • Incident severity distribution
    • Incidents by status
    • Releases by environment and status
    • Environment health (Dev / QA / Prod success rate)
    • Incident trend by release
  • Full CRUD (including delete) for projects, releases, and incidents
  • AI Triage workflow with cached hypotheses + draft on-call message

AI Triage Agent

The AI Triage Agent generates grounded, ranked root-cause hypotheses when an incident is selected, and drafts an escalation-ready Slack message for the on-call responder. Results are cached per incident to avoid re-running the model on every page view, and responders can submit thumbs-up/down feedback for future tuning. The goal is to speed initial incident triage while keeping the reasoning inspectable.

Architecture (high-level)

flowchart TD
    A["Incident selected in UI"] --> B["GET /api/incidents/{id}/triage"]
    B --> C{"Fresh cached triage < 1h?"}
    C -- Yes --> D["Return triage_results row"]
    C -- No --> E["Generate triage"]
    E --> F["Fetch incident context + last 24h audit logs"]
    F --> G["LLM call with structured JSON schema"]
    G --> H["Validate citation IDs against input audit logs"]
    H --> I["Persist triage_results"]
    I --> D
    D --> J["Render hypotheses, sources, draft Slack message"]
    J --> K["POST /api/triage/{triage_id}/feedback"]
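
The cache branch in the flowchart can be sketched as follows. This is a minimal illustration, not the repository's code: the one-hour TTL comes from the diagram, while `is_fresh`, `get_or_generate`, the `created_at` key, and the `generate` callable are hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# TTL taken from the flowchart ("Fresh cached triage < 1h?").
TRIAGE_TTL = timedelta(hours=1)


def is_fresh(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a cached triage row is strictly younger than the TTL."""
    now = now or datetime.now(timezone.utc)
    return now - created_at < TRIAGE_TTL


def get_or_generate(
    cached: Optional[dict],
    generate: Callable[[], dict],
    now: Optional[datetime] = None,
) -> dict:
    """Return the cached triage when fresh, otherwise regenerate it.

    `cached` is a hypothetical dict with a `created_at` key (or None);
    `generate` is a hypothetical callable that builds and persists a new result.
    """
    if cached is not None and is_fresh(cached["created_at"], now):
        return cached
    return generate()
```

Passing `now` explicitly keeps the freshness check deterministic in tests; in a request handler it would normally be left as `None`.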

Example screenshots (placeholders)

  • docs/screenshots/triage-tab-loading.png
  • docs/screenshots/triage-hypotheses-and-citations.png
  • docs/screenshots/triage-insufficient-context.png
  • docs/screenshots/triage-feedback-submitted.png

How it works (grounding + validation)

  1. The service assembles incident context and recent audit log entries (last 24 hours for affected services).
  2. The prompt instructs the model to cite only audit log IDs present in the provided input.
  3. The model returns structured JSON (hypotheses[], draft_message) via JSON-schema response formatting.
  4. A post-processing validation step checks every cited ID against the input audit-log ID set.
  5. Any hypothesis containing invalid/nonexistent citations is dropped before persistence and response.
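
Steps 4 and 5 amount to a set-membership filter. The sketch below assumes each hypothesis is a dict with a `citations` list of audit-log IDs; that field name and the function name `drop_ungrounded` are illustrative, not taken from the repository.

```python
from typing import Iterable


def drop_ungrounded(hypotheses: list[dict], audit_log_ids: Iterable[int]) -> list[dict]:
    """Keep only hypotheses whose every cited ID exists in the input audit logs."""
    valid_ids = set(audit_log_ids)
    kept = []
    for hyp in hypotheses:
        cited = hyp.get("citations", [])  # hypothetical field name
        # Drop the hypothesis if any cited ID was not in the model's input.
        # Note: a hypothesis with no citations passes this check; whether to
        # keep it is a separate policy decision not covered here.
        if all(c in valid_ids for c in cited):
            kept.append(hyp)
    return kept
```

For example, with input log IDs `[1, 2, 3]`, a hypothesis citing `[2, 99]` would be dropped because `99` never appeared in the input.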

This keeps the output grounded in observable system history instead of free-form speculation.
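
The context-assembly step (step 1) could look roughly like this. It is a sketch under assumptions: audit-log entries are modeled as dicts with hypothetical `service` and `timestamp` keys, and the 24-hour window is the one stated above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def recent_logs(
    logs: Iterable[dict],
    affected_services: Iterable[str],
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=24),
) -> list[dict]:
    """Select audit-log entries for the affected services within the window."""
    now = now or datetime.now(timezone.utc)
    services = set(affected_services)
    cutoff = now - window
    return [
        log
        for log in logs
        # `service` and `timestamp` are assumed field names for illustration.
        if log["service"] in services and cutoff <= log["timestamp"] <= now
    ]
```

The IDs of the entries returned here are the only ones the model is allowed to cite, so the same list would feed the citation check.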

Limitations

  • The model can still be wrong even when grounded; hypotheses are suggestions, not root-cause proof.
  • Quality depends heavily on audit-log coverage, consistency, and timestamp accuracy.
  • Sparse incident metadata (missing affected services/notes) can reduce confidence or yield insufficient-context results.
  • LLM latency/timeouts may occur; the API returns error states in these cases.
  • Feedback is currently basic and should be combined with offline evaluation before production automation.

Running locally (dev)

Backend:

cd backend
python -m venv venv
venv\Scripts\activate      # Windows (cmd/PowerShell)
source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
uvicorn app.main:app --reload

Frontend:

cd frontend
npm install
npm run dev
