Claude Code plugin for AI-assisted root-cause analysis of infrastructure failures and operational incidents.
RHDP RCA Plugin is a Claude Code marketplace containing specialized skills designed for Red Hat Demo Platform (RHDP) root cause analysis. This plugin suite enables AI-powered investigation of infrastructure failures, log analysis, and root cause diagnosis. These skills provide Claude with the tools to:
- Fetch and analyze logs from remote servers
- Correlate multiple data sources (Ansible, Splunk, GitHub)
- Perform automated root cause analysis
- Capture and organize user feedback
- Claude Code installed
- SSH access to remote servers (for optional log auto-fetch)
- Splunk credentials (for log correlation)
- Open Claude Code and run /plugin
- Add marketplace: redhat-et/rhdp-rca-plugin
- Install the plugin and restart Claude Code
Start with a normal RCA request:
/aiops-plugin:root-cause-analysis job 123456
The root-cause-analysis workflow runs preflight checks and setup guidance first, then runs Steps 1-4 automatically (log parsing, Splunk query, timeline correlation, GitHub config fetch), followed by the Step 5 analysis.
If preflight setup does not complete in your environment:
- Copy and update .claude/settings.example.json.
- Apply it to your local .claude/settings.local.json (project-level), including env vars and hooks.
Note: These hooks are required for MLflow tracing. The Stop hook
flushes traces and the SessionStart hook captures the session ID.
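The manual fallback above can be sketched in Python. This is only an illustration, assuming both files are plain JSON and that values already present in the local file should win over the example's; `merge_settings` and `apply_example` are hypothetical helpers, not part of the plugin:

```python
import json
from pathlib import Path


def merge_settings(example: dict, local: dict) -> dict:
    """Recursively merge two settings dicts; values already in `local` win."""
    merged = dict(example)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalars and lists (for example a hook list) are replaced, not merged.
            merged[key] = value
    return merged


def apply_example(project_dir: Path) -> dict:
    """Merge .claude/settings.example.json into .claude/settings.local.json."""
    claude_dir = project_dir / ".claude"
    example = json.loads((claude_dir / "settings.example.json").read_text())
    local_path = claude_dir / "settings.local.json"
    local = json.loads(local_path.read_text()) if local_path.exists() else {}
    merged = merge_settings(example, local)
    local_path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged
```

Any placeholder values copied from the example file still need to be edited by hand afterwards.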
For tracing:
- MLflow tracing is optional.
- If MLflow is still not configured after step 2, use MLflow Tracing Setup (Manual Fallback).
| Skill | Description | Key Features |
|---|---|---|
| template-skill | Template for creating new skills | Starter template, best practices |
| logs-fetcher | Fetch Ansible/AAP logs via SSH | Time-based filtering, job number lookup |
| root-cause-analysis | Automated RCA for failed jobs | Log correlation, Splunk + GitHub integration |
| context-fetcher | Fetch job configs and docs | GitHub and Confluence integration |
| feedback-capture | Capture user feedback | Structured storage, categorization |
Fetch Ansible/AAP logs from remote servers with flexible filtering
# Fetch logs from a specific time range
python -m scripts.fetch_logs_ssh \
--start-time "2025-12-09 08:00:00" \
--end-time "2025-12-10 17:00:00" \
--mode processed
# Fetch logs by job number
python -m scripts.fetch_logs_by_job 1234567 1234568 1234569
Use cases:
- Fetch logs from specific time windows (minute/second precision)
- Retrieve logs for specific job numbers
- Download recent processed or ignored job logs
- Investigate incidents within a known timeframe
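To illustrate the time-window filtering idea, here is a small sketch that reuses the timestamp format from the command above. It assumes inclusive bounds and records shaped as (timestamp, name) pairs; both are assumptions for the example, and the function names are hypothetical rather than taken from the skill's scripts:

```python
from datetime import datetime

# Same format as the --start-time / --end-time arguments shown above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def in_window(timestamp: str, start: str, end: str) -> bool:
    """Return True if `timestamp` lies within [start, end], bounds included."""
    moment = datetime.strptime(timestamp, TIME_FORMAT)
    return (
        datetime.strptime(start, TIME_FORMAT)
        <= moment
        <= datetime.strptime(end, TIME_FORMAT)
    )


def filter_by_window(records, start, end):
    """Keep (timestamp, name) records whose timestamp falls inside the window."""
    return [(ts, name) for ts, name in records if in_window(ts, start, end)]
```

Parsing to `datetime` before comparing gives second-level precision without relying on string ordering.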
View detailed documentation →
Investigate failed jobs by correlating Ansible/AAP logs with Splunk OCP pod logs and GitHub configuration
Step 1 [Python] Parse local job log (extract GUID, namespace, failed tasks)
Step 2 [Python] Query Splunk for correlated pod logs
Step 3 [Python] Build correlation timeline
Step 4 [Python] Fetch GitHub configs (AgnosticD/AgnosticV)
Step 5 [Claude] Analyze and summarize root cause
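As a rough illustration of Step 3, a correlation timeline can be built by merging events from each source in timestamp order. The `Event` shape and the `build_timeline` name are assumptions made for this sketch; the plugin's actual data model may differ:

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # for example "ansible" or "splunk"
    message: str


def build_timeline(*streams):
    """Merge several event streams into one list ordered by timestamp."""
    # Sort each stream first, since heapq.merge expects sorted inputs.
    ordered = [sorted(stream, key=lambda e: e.timestamp) for stream in streams]
    return list(heapq.merge(*ordered, key=lambda e: e.timestamp))
```

Interleaving the two sources this way makes it easier to see which pod log entries surround a failed task.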
Command Usage:
# By job ID (auto-fetches log from remote if not found locally)
.venv/bin/python scripts/cli.py analyze --job-id <JOB_ID> --fetch
# By explicit path (when you already have the log file)
.venv/bin/python scripts/cli.py analyze --job-log <path-to-job-log>
Use cases:
- Investigate job failures
- Analyze logs for errors and patterns
- Find root causes of infrastructure issues
- Debug failed deployments
- Troubleshoot Kubernetes/OpenShift problems
View detailed documentation →
Fetch configuration and documentation context via MCP servers
Integrates with:
- GitHub: Job configs, recent commits, CI workflows
- Confluence: Runbooks, troubleshooting guides, documentation
Use cases:
- Retrieve job configuration from repositories
- Access relevant documentation during investigations
- Review recent code changes related to failures
View detailed documentation →
Capture and store user feedback during interactions
Features:
- Ask users for feedback interactively
- Categorize feedback (Complexity, Clarity, Accuracy, etc.)
- Summarize interaction context
- Record structured feedback with timestamps
Feedback is appended to ~/feedback.txt by default with session tracking.
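A minimal sketch of what appending one structured feedback record could look like. The JSON-lines layout and the field names are assumptions for illustration only; the skill's real on-disk format is not specified here:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_feedback(category, text, session_id, path=Path.home() / "feedback.txt"):
    """Append one timestamped feedback record as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,   # hypothetical field for session tracking
        "category": category,       # for example "Clarity" or "Accuracy"
        "feedback": text,
    }
    # Mode "a" creates the file if needed and never overwrites earlier entries.
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

Writing one record per line keeps the file easy to append to and easy to parse later.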
Use cases:
- Collect feedback at the end of skill invocations
- Track user sentiment across sessions
- Categorize and store bug reports
View detailed documentation →
           ┌─────────────────────┐
           │   Claude Code UI    │
           │  (User Interface)   │
           └──────────┬──────────┘
                      │
┌─────────────────────┴─────────────────────┐
│        RHDP RCA Plugin Marketplace        │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │  Skills (SKILL.md definitions)      │  │
│  │                                     │  │
│  │  • template-skill                   │  │
│  │  • logs-fetcher ──────► SSH         │  │
│  │  • root-cause-analysis ──► Splunk   │  │
│  │                      └──► GitHub API│  │
│  │  • context-fetcher ──► MCP Servers  │  │
│  │  • feedback-capture ──► Local FS    │  │
│  └─────────────────────────────────────┘  │
└─────────────────────┬─────────────────────┘
                      │
     ┌────────────────┼────────────────┐
┌────▼────┐    ┌──────▼─────┐   ┌──────▼───────┐
│ GitHub  │    │ Confluence │   │   External   │
│   MCP   │    │    MCP     │   │   Systems    │
│         │    │            │   │ (SSH/Splunk) │
└─────────┘    └────────────┘   └──────────────┘
Integration Points:
- MCP Servers: GitHub (code search, file retrieval) and Confluence (documentation)
- Direct APIs: Splunk REST API, GitHub API
- SSH: Remote log server access
- Local: File system for logs and feedback
Each skill follows the Anthropic Agent Skills Specification with SKILL.md definitions that Claude Code loads automatically.
When investigating a failed job:
- User Query: "/root-cause-analysis job 1234567"
- Skill Selection: Claude selects root-cause-analysis
- Data Collection (Steps 1-4, automated):
  - Parse job log (local file)
  - Query Splunk for pod logs
  - Correlate timeline
  - Fetch GitHub configs via API
- AI Analysis (Step 5): Claude analyzes and identifies root cause
- Results: Summary with evidence and recommendations
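The data collection phase above can be pictured as a chain of steps that share a context, so that later steps can use what earlier ones found (for example, the Splunk query needs identifiers extracted from the job log). The sketch below uses placeholder steps and a hypothetical `run_data_collection` helper and log file name; it is not the plugin's actual CLI code:

```python
def run_data_collection(job_log, steps):
    """Run the automated steps in order, passing a shared context dict along."""
    context = {"job_log": job_log}
    for name, step in steps:
        # Each step reads earlier results from the context and adds its own.
        context[name] = step(context)
    return context


# Placeholder steps; real ones would parse the log, call Splunk, and so on.
stub_steps = [
    ("parsed", lambda ctx: {"guid": "example-guid", "failed_tasks": []}),
    ("pod_logs", lambda ctx: [f"pod logs for {ctx['parsed']['guid']}"]),
    ("timeline", lambda ctx: list(ctx["pod_logs"])),
    ("configs", lambda ctx: {}),
]

# The resulting dict is the evidence bundle handed over for Step 5 analysis.
evidence = run_data_collection("job_1234567.log", stub_steps)
```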
Simply invoke skills by describing your task:
"Analyze job 1234567 for root cause"
"Investigate why this deployment failed"
"Fetch logs from the last 2 hours"
Claude will automatically select and invoke the appropriate skill based on your request.
- Create a directory with your skill name (lowercase, hyphen-separated)
- Add a SKILL.md file:
---
name: my-skill
description: Brief description of what this skill does
allowed-tools:
- Bash
- Read
---
# My Skill
Instructions for Claude...
See template-skill for a minimal example and agent_skills_spec.md for the full specification.
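Before opening a pull request, a quick local check of the frontmatter can catch simple mistakes. The sketch below is a hypothetical helper, not an official validator: it only reads top-level `key: value` lines, and it assumes the naming rule stated above (lowercase, hyphen-separated) plus a non-empty description.

```python
import re

# Lowercase letters or digits, in groups separated by single hyphens.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def read_frontmatter(text: str) -> dict:
    """Return top-level key/value pairs between the two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip list items and indented lines; only top-level keys are read.
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed by a second '---' line")


def check_skill(text: str) -> list:
    """Return a list of human-readable problems; empty means the checks passed."""
    fields = read_frontmatter(text)
    problems = []
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name must be lowercase and hyphen-separated")
    if not fields.get("description"):
        problems.append("description is missing or empty")
    return problems
```

For anything beyond this quick check, a real YAML parser and the full specification are the better reference.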
We welcome contributions! Please ensure your skill:
- Follows the Agent Skills Spec
- Includes clear, actionable instructions
- Is focused on a specific AIOps domain
- Includes appropriate documentation and examples
See CONTRIBUTING.md for detailed contribution guidelines.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2025 Red Hat ACE Team
Individual skills may specify their own licenses in their frontmatter.
- Issues: Report issues on GitHub Issues
- Documentation: See docs/ for additional guides
Built by the Red Hat ACE Team