Your AI SRE that investigates production incidents
Long-term memory · Knowledge graph · 46 production skills · SolidAI Integration
SolidAI SRE is an open-source AI SRE agent that automatically investigates production incidents, finds root causes, and learns from every investigation. Built for Solid Solutions & SolidAI infrastructure. It combines episodic memory (remembering past incidents and what fixed them) with a Neo4j knowledge graph (understanding service dependencies and blast radius) and 46 production-ready skills for tools like Datadog, Grafana, PagerDuty, Elasticsearch, Kubernetes, and AWS. Self-hosted, provider-agnostic via LiteLLM, and licensed Apache 2.0.
| Capability | Description |
|---|---|
| Learns from every incident | Remembers past investigations: what worked, what didn't. Similar incident at 3am? It already knows the playbook. |
| Understands your infrastructure | Neo4j knowledge graph maps service dependencies, so the agent knows blast radius before it starts investigating. |
| Plugs into what you already use | 46 production skills for Datadog, Grafana, PagerDuty, Elasticsearch, Kubernetes, AWS, and more. No rip-and-replace. |
| SolidAI Integration | Built-in integration with SolidAI Gateway (localhost:18789) and Telegram bots (@AionUi_solidsolutions_bot). |
git clone https://github.com/yassin/solidai-sre.git
cd solidai-sre
cp .env.example .env
# Add your OPENROUTER_API_KEY to .env
make dev

The web console will be available at http://localhost:3000.
SolidAI Gateway (18789) -> SolidAI SRE
                                |
                          +-----+-----+
                          |     |     |
                      Memory Skills  KG
config-service <- Telegram bot integration
sre-agent is the core investigation agent. It uses LangGraph for orchestration, with a planner -> investigation subagents -> synthesizer -> writeup topology, and reaches the 46 skills through the load_skill and run_script tools.
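The planner -> investigation subagents -> synthesizer -> writeup flow can be pictured as a chain of stages that pass one state object along. The sketch below is plain Python, not LangGraph's API and not the project's actual code; the `Investigation` class, the stage functions, and the canned plan are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Investigation:
    """State handed from stage to stage (hypothetical shape)."""
    alert: str
    plan: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)
    summary: str = ""
    writeup: str = ""


def planner(state: Investigation) -> Investigation:
    # Decide which areas to look into; a real planner would call an LLM.
    state.plan = ["metrics", "logs", "deploys"]
    return state


def run_subagents(state: Investigation) -> Investigation:
    # One subagent per planned area; each records a short finding.
    for area in state.plan:
        state.findings[area] = f"checked {area} for '{state.alert}'"
    return state


def synthesizer(state: Investigation) -> Investigation:
    # Merge the per-area findings into a single summary, in plan order.
    state.summary = "; ".join(state.findings[area] for area in state.plan)
    return state


def writeup(state: Investigation) -> Investigation:
    # Produce the final human-readable report.
    state.writeup = f"Incident: {state.alert}\nFindings: {state.summary}"
    return state


def investigate(alert: str) -> Investigation:
    # Run the four stages in the order the topology describes.
    state = Investigation(alert=alert)
    for stage in (planner, run_subagents, synthesizer, writeup):
        state = stage(state)
    return state


report = investigate("checkout latency high")
print(report.writeup)
```

In a graph-based orchestrator each function would become a node and the loop in `investigate` would become the edges between them.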
web_ui is the admin console and agent entry point (Next.js, pnpm). It provides agent runs, a config editor, a knowledge base explorer, and memory pages.
config-service is the control plane. It stores hierarchical org -> team config, resolved with a deep merge, and manages tokens and audit logging.
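A deep merge combines nested config key by key instead of replacing whole sections. The sketch below shows one common way to do it, assuming that team values override org values and that non-dict values (including lists) are replaced wholesale; the `deep_merge` helper and the config keys are hypothetical, not taken from config-service.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse so sibling keys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the base.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical org-level defaults and a team-level override.
org_config = {
    "llm": {"provider": "openrouter", "temperature": 0.2},
    "alerts": {"channel": "telegram"},
}
team_config = {"llm": {"temperature": 0.0}}

effective = deep_merge(org_config, team_config)
print(effective)
```

Here the team only changes `temperature`; `provider` and the whole `alerts` section are inherited from the org level, and neither input dict is mutated.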
- Gateway Integration: Connects to SolidAI Gateway at http://localhost:18789
- Telegram Bot: Sends alerts to @AionUi_solidsolutions_bot
- Multi-Agent Support: Monitors all 8 SolidAI agents (agriculture, fintech, health, education, energy, governance, retail, logistics)
- Site Monitoring: Tracks solidsolutions.africa and solidai.africa uptime
- 46 Production Skills: Elasticsearch, Datadog, Grafana, PagerDuty, K8s, AWS, and more
- Long-term Memory: Stores investigations, surfaces past solutions for similar incidents
- Knowledge Graph: Neo4j service topology, dependency traversal, blast radius
- Multi-provider LLM: OpenRouter (tencent/hy3-preview), Claude, OpenAI, Gemini, DeepSeek, Mistral, Ollama
- Web Console: Dashboard, agent runs, memory browser
- Telegram Integration: Investigate incidents directly from Telegram
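
The blast radius mentioned under Knowledge Graph is, in essence, the set of services that transitively depend on the failing one. The sketch below illustrates that traversal with a breadth-first search over an in-memory dict standing in for the Neo4j graph; the `DEPENDENTS` topology and the `blast_radius` function are hypothetical and do not reflect the project's real schema or queries.

```python
from collections import deque

# Hypothetical topology: each key maps to the services that depend on it.
DEPENDENTS = {
    "postgres": ["config-service"],
    "config-service": ["sre-agent", "web_ui"],
    "sre-agent": ["web_ui"],
    "web_ui": [],
}


def blast_radius(service, dependents):
    """Return every service transitively downstream of `service`, sorted."""
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            # Skip already-visited nodes so cycles cannot loop forever.
            if downstream not in seen and downstream != service:
                seen.add(downstream)
                queue.append(downstream)
    return sorted(seen)


print(blast_radius("postgres", DEPENDENTS))
# → ['config-service', 'sre-agent', 'web_ui']
```

A leaf service such as `web_ui` has an empty blast radius, while a shared dependency such as `postgres` reaches everything above it.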
| Command | What it does |
|---|---|
| make dev | Start all services (Postgres, config, LiteLLM, agent, web UI) |
| make dev-telegram | Start all services + Telegram bot |
| make stop | Stop all services |
| make status | Show service health status |
| make logs | Follow all service logs |
Licensed under the Apache License 2.0. See LICENSE for details.
Built by SolidAI — Powering African Innovation
