RHDP RCA Plugin

Claude Code plugin for AI-assisted root-cause analysis of infrastructure failures and operational incidents.

How It Works • Quick Start • Available Skills • Contributing

What is RHDP RCA Plugin?

RHDP RCA Plugin is a Claude Code plugin marketplace of skills for root cause analysis on the Red Hat Demo Platform (RHDP). The skills support AI-assisted investigation of infrastructure failures, including log analysis and root cause diagnosis. They give Claude the tools to:

Fetch and analyze logs from remote servers
Correlate multiple data sources (Ansible, Splunk, GitHub)
Perform automated root cause analysis
Capture and organize user feedback

Quick Start

Prerequisites

Claude Code installed
SSH access to remote servers (for optional log auto-fetch)
Splunk credentials (for log correlation)

1) Install the plugin

Open Claude Code and run /plugin
Add marketplace: redhat-et/rhdp-rca-plugin
Install the plugin and restart Claude Code

2) Run root cause analysis

Start with a normal RCA request:

/aiops-plugin:root-cause-analysis job 123456

The root-cause-analysis workflow first runs preflight checks and setup guidance, then runs Steps 1-4 automatically (log parsing, Splunk query, timeline correlation, GitHub config fetch), followed by the Step 5 analysis.

3) Manual fallback (only if needed)

If preflight setup does not complete in your environment:

Copy .claude/settings.example.json and fill in the values for your environment.
Apply the result to your project-level .claude/settings.local.json, including the env vars and hooks.

Note: These hooks are required for MLflow tracing. The Stop hook flushes traces and the SessionStart hook captures the session ID.
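If you prefer to script the manual fallback, the following is a minimal sketch of one way to apply the example settings. It treats both files as opaque JSON, because this README does not list the actual env var or hook keys. The helper names deep_merge and apply_example_settings are hypothetical and are not part of the plugin. Values already present in settings.local.json win over the example, and lists (such as hook arrays) are replaced rather than concatenated.

```python
import json
from pathlib import Path


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict with overlay merged into base; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the overlay replace the base value.
            merged[key] = value
    return merged


def apply_example_settings(project_dir: Path) -> dict:
    """Merge settings.example.json into settings.local.json (hypothetical helper)."""
    claude_dir = project_dir / ".claude"
    example = json.loads((claude_dir / "settings.example.json").read_text())
    local_path = claude_dir / "settings.local.json"
    local = json.loads(local_path.read_text()) if local_path.exists() else {}
    # Start from the example and let existing local values override it.
    merged = deep_merge(example, local)
    local_path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged
```

Run it from the repository root with apply_example_settings(Path(".")), then review the written file before restarting Claude Code.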

For tracing:

MLflow tracing is optional.
If MLflow is still not configured after step 2, use MLflow Tracing Setup (Manual Fallback).

Available Skills

Skill                Description                       Key Features
template-skill       Template for creating new skills  Starter template, best practices
logs-fetcher         Fetch Ansible/AAP logs via SSH    Time-based filtering, job number lookup
root-cause-analysis  Automated RCA for failed jobs     Log correlation, Splunk + GitHub integration
context-fetcher      Fetch job configs and docs        GitHub and Confluence integration
feedback-capture     Capture user feedback             Structured storage, categorization

Skill Details

🔍 logs-fetcher

Fetch Ansible/AAP logs from remote servers with flexible filtering

# Fetch logs from a specific time range
python -m scripts.fetch_logs_ssh \
  --start-time "2025-12-09 08:00:00" \
  --end-time "2025-12-10 17:00:00" \
  --mode processed

# Fetch logs by job number
python -m scripts.fetch_logs_by_job 1234567 1234568 1234569

Use cases:

Fetch logs from specific time windows (minute/second precision)
Retrieve logs for specific job numbers
Download recent processed or ignored job logs
Investigate incidents within a known timeframe
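To illustrate the time-window filtering, here is a minimal sketch that selects jobs whose timestamp falls inside a window given in the same "YYYY-MM-DD HH:MM:SS" format as the --start-time and --end-time examples. It is not the logic of scripts.fetch_logs_ssh: the function names, the job-to-timestamp mapping, and the inclusive boundaries are all assumptions made for illustration.

```python
from datetime import datetime

# Same format as the --start-time / --end-time values shown above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def in_window(timestamp: str, start_time: str, end_time: str) -> bool:
    """Return True if timestamp lies in [start_time, end_time] (inclusive, an assumption)."""
    ts = datetime.strptime(timestamp, TIME_FORMAT)
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    if start > end:
        raise ValueError("start_time must not be after end_time")
    return start <= ts <= end


def filter_jobs(jobs: dict[str, str], start_time: str, end_time: str) -> list[str]:
    """Return the sorted job numbers whose timestamp is inside the window.

    jobs maps a job number to a timestamp string (a hypothetical structure).
    """
    return sorted(
        job for job, ts in jobs.items() if in_window(ts, start_time, end_time)
    )
```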

View detailed documentation →

🔎 root-cause-analysis

Investigate failed jobs by correlating Ansible/AAP logs with Splunk OCP pod logs and GitHub configuration

Step 1   [Python]  Parse local job log (extract GUID, namespace, failed tasks)
Step 2   [Python]  Query Splunk for correlated pod logs
Step 3   [Python]  Build correlation timeline
Step 4   [Python]  Fetch GitHub configs (AgnosticD/AgnosticV)
Step 5   [Claude]  Analyze and summarize root cause
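As a rough illustration of the failed-task part of Step 1, the sketch below scans log text in the standard Ansible console format ("TASK [name] ***" headers followed by "fatal: [host]: FAILED! => ..." lines). Whether the plugin's job logs use exactly this layout is an assumption, and find_failed_tasks is a hypothetical name, not the plugin's parser. GUID and namespace extraction are left out because their format is not described here.

```python
import re

# Task header as printed by Ansible's default output, e.g. "TASK [role : name] ****".
TASK_RE = re.compile(r"^TASK \[(?P<name>.+?)\] \*+\s*$")
# Failure line, e.g. 'fatal: [host]: FAILED! => {"msg": "..."}'.
FATAL_RE = re.compile(
    r"^fatal: \[(?P<host>[^\]]+)\]: (?:FAILED|UNREACHABLE)! => (?P<detail>.*)$"
)


def find_failed_tasks(log_text: str) -> list[dict]:
    """Return one record per fatal line, tagged with the most recent task header."""
    failed = []
    current_task = None
    for line in log_text.splitlines():
        task_match = TASK_RE.match(line)
        if task_match:
            current_task = task_match.group("name")
            continue
        fatal_match = FATAL_RE.match(line)
        if fatal_match:
            failed.append(
                {
                    "task": current_task,
                    "host": fatal_match.group("host"),
                    "detail": fatal_match.group("detail"),
                }
            )
    return failed
```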

Command Usage:

# By job ID (auto-fetches log from remote if not found locally)
.venv/bin/python scripts/cli.py analyze --job-id <JOB_ID> --fetch

# By explicit path (when you already have the log file)
.venv/bin/python scripts/cli.py analyze --job-log <path-to-job-log>

Use cases:

Investigate job failures
Analyze logs for errors and patterns
Find root causes of infrastructure issues
Debug failed deployments
Troubleshoot Kubernetes/OpenShift problems

View detailed documentation →

context-fetcher

Fetch configuration and documentation context via MCP servers

Integrates with:

GitHub: Job configs, recent commits, CI workflows
Confluence: Runbooks, troubleshooting guides, documentation

Use cases:

Retrieve job configuration from repositories
Access relevant documentation during investigations
Review recent code changes related to failures

View detailed documentation →

💬 feedback-capture

Capture and store user feedback during interactions

Features:

Ask users for feedback interactively
Categorize feedback (Complexity, Clarity, Accuracy, etc.)
Summarize interaction context
Record structured feedback with timestamps

Feedback is appended to ~/feedback.txt by default with session tracking.
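A minimal sketch of what appending a structured, timestamped entry could look like is shown below. The on-disk format of ~/feedback.txt is not documented in this README, so the JSON Lines layout, the field names, and the record_feedback helper are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_feedback(
    text: str,
    category: str,
    session_id: str,
    path: Path = Path.home() / "feedback.txt",
) -> dict:
    """Append one feedback entry as a JSON line and return it (hypothetical format)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "category": category,
        "feedback": text,
    }
    # Append mode keeps earlier entries intact.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```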

Use cases:

Collect feedback at the end of skill invocations
Track user sentiment across sessions
Categorize and store bug reports

View detailed documentation →

How It Works

Architecture

                    ┌─────────────────────┐
                    │   Claude Code UI    │
                    │  (User Interface)   │
                    └──────────┬──────────┘
                               │
         ┌─────────────────────┴─────────────────────┐
         │         RHDP RCA Plugin Marketplace       │
         │                                           │
         │  ┌─────────────────────────────────────┐  │
         │  │  Skills (SKILL.md definitions)      │  │
         │  │                                     │  │
         │  │  • template-skill                   │  │
         │  │  • logs-fetcher ──────► SSH         │  │
         │  │  • root-cause-analysis ──► Splunk   │  │
         │  │                      └──► GitHub API│  │
         │  │  • context-fetcher ──► MCP Servers  │  │
         │  │  • feedback-capture ──► Local FS    │  │
         │  └─────────────────────────────────────┘  │
         └───────────────────────────────────────────┘
                               │
         ┌─────────────────────┼──────────────────┐
         │                     │                  │
    ┌────▼────┐          ┌─────▼──────┐    ┌──────▼───────┐
    │ GitHub  │          │ Confluence │    │ External     │
    │   MCP   │          │    MCP     │    │ Systems      │
    │         │          │            │    │ (SSH/Splunk) │
    └─────────┘          └────────────┘    └──────────────┘

Integration Points:

MCP Servers: GitHub (code search, file retrieval) and Confluence (documentation)
Direct APIs: Splunk REST API, GitHub API
SSH: Remote log server access
Local: File system for logs and feedback

Each skill follows the Anthropic Agent Skills Specification with SKILL.md definitions that Claude Code loads automatically.

End-to-End RCA Workflow

When investigating a failed job:

User Query: "/root-cause-analysis job 1234567"
Skill Selection: Claude selects root-cause-analysis
Data Collection (Steps 1-4, automated):
- Parse job log (local file)
- Query Splunk for pod logs
- Correlate timeline
- Fetch GitHub configs via API
AI Analysis (Step 5): Claude analyzes and identifies root cause
Results: Summary with evidence and recommendations
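The timeline correlation in the data collection phase can be pictured as merging timestamped events from both sources into one ordered list. The following is a sketch under assumptions, not the plugin's implementation: the Event structure, build_timeline, events_near, and the 60-second default window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str  # e.g. "ansible" or "splunk"
    message: str


def build_timeline(ansible_events: list[Event], splunk_events: list[Event]) -> list[Event]:
    """Merge both event lists into a single list ordered by timestamp."""
    # sorted() is stable, so Ansible events come first when timestamps tie.
    return sorted([*ansible_events, *splunk_events], key=lambda event: event.timestamp)


def events_near(
    timeline: list[Event], failure_time: datetime, window_seconds: int = 60
) -> list[Event]:
    """Return events within window_seconds before or after failure_time."""
    window = timedelta(seconds=window_seconds)
    return [event for event in timeline if abs(event.timestamp - failure_time) <= window]
```

Narrowing the timeline to events around the failed task keeps the evidence passed to the analysis step small and focused.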

Usage with Claude Code

Simply invoke skills by describing your task:

"Analyze job 1234567 for root cause"
"Investigate why this deployment failed"
"Fetch logs from the last 2 hours"

Claude will automatically select and invoke the appropriate skill based on your request.

Creating a New Skill

Create a directory with your skill name (lowercase, hyphen-separated)
Add a SKILL.md file:

---
name: my-skill
description: Brief description of what this skill does
allowed-tools:
  - Bash
  - Read
---

# My Skill

Instructions for Claude...

See template-skill for a minimal example and agent_skills_spec.md for the full specification.
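Before opening a pull request, a quick local check of the frontmatter can catch naming mistakes. The sketch below is a hypothetical helper, not a tool shipped with this repository. It reads only top-level "key: value" lines (no full YAML parsing) and checks the two rules stated above: a lowercase, hyphen-separated name and a non-empty description. Allowing digits in the name is an assumption; see agent_skills_spec.md for the authoritative rules.

```python
import re

# Lowercase words separated by single hyphens; digits allowed (an assumption).
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Return top-level key/value pairs from the leading '---' block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Skip indented lines such as the list items under allowed-tools.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed with '---'")


def validate_skill(skill_md: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    fields = read_frontmatter(skill_md)
    problems = []
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name must be lowercase and hyphen-separated")
    if not fields.get("description"):
        problems.append("description must not be empty")
    return problems
```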

Contributing

We welcome contributions! Please ensure your skill:

Follows the Agent Skills Spec
Includes clear, actionable instructions
Is focused on a specific AIOps domain
Includes appropriate documentation and examples

See CONTRIBUTING.md for detailed contribution guidelines.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Individual skills may specify their own licenses in their frontmatter.

Support

Issues: Report issues on GitHub Issues
Documentation: See docs/ for additional guides

Built by the Red Hat ACE Team
