OKD-370: Add promptfoo evals for agentic-docs plugin #477

Open
kenjpais wants to merge 15 commits into openshift-eng:main from kenjpais:add-evaluation

@kenjpais kenjpais commented May 15, 2026

Summary

Add promptfoo-based eval workflows to the agentic-docs plugin via [PR #437].

Introduces two new skills:

  • /agentic-docs:generate-evals — generate repository-specific promptfoo eval suites from templates
  • /agentic-docs:evaluate — validate provider configuration and execute eval suites with automated analysis

Features

generate-evals

  • Template-based eval generation
  • Repository-scoped eval configs

evaluate

  • Bundled eval execution workflow
  • Spawns a judge sub-agent to analyze results, session logs, and metrics
  • Pass/fail quality reporting
  • Cost and latency regression checks

These workflows add deterministic and LLM-judged validation for:

  • skill routing
  • command execution
  • output quality
  • regression detection

Assertion types used

  • skill-used / not-skill-used
  • icontains / not-icontains
  • llm-rubric
  • cost / latency

Test coverage

Test infrastructure: Both skills include their own test suites in evals/evals.json

Summary by CodeRabbit

  • New Features

    • Added agentic-docs plugin for creating and maintaining AI-optimized documentation for OpenShift.
    • Added /agentic-docs:evaluate to run comparative documentation evaluations and produce structured reports.
    • Added /agentic-docs:generate-evals to generate repository-specific evaluation suites.
    • Added /metrics:ai-docs-telemetry to analyze ai-docs usage from session logs.
  • Documentation

    • Updated plugin registry and docs index; added README/command docs and usage examples for new commands.

Prashanth684 and others added 15 commits May 15, 2026 02:56
Introduces tier-1 platform documentation skills for creating and maintaining
AI-optimized documentation in openshift/enhancements.

Skills:

/agentic-docs:platform (/platform-docs):
Creates tier-1 platform documentation with:
- AGENTS.md navigation index
- DESIGN_PHILOSOPHY.md and KNOWLEDGE_GRAPH.md
- platform/, domain/, practices/, decisions/, workflows/, references/
- Automated discovery, structure creation, template population, and validation

/agentic-docs:update-platform-docs (/update-platform-docs):
Incrementally updates tier-1 documentation with:
- Automatic gap detection (scans existing ai-docs/ for missing files)
- Targeted additions without full regeneration
- Smart navigation updates (auto-updates indexes and AGENTS.md)
- Validation of naming conventions, line counts, and link integrity

Introduces tier-2 lean component documentation skill for creating
structured component-level documentation in OpenShift repositories.

Skills:

/agentic-docs:component (/component-docs):
Creates tier-2 lean component documentation with:
- Component-specific CRDs and architecture only
- Pointers to tier-1 for generic patterns
- Component ADRs and exec-plan tracking
- AGENTS.md entry point
- DEVELOPMENT.md and TESTING.md guides
- Domain concepts and ecosystem maps

Platform documentation in openshift/enhancements/ai-docs/ already exists
and was created using this skill. Remove the /platform-docs skill that was designed
to create it from scratch - it's no longer needed.

Changes:
- Remove entire skills/platform/ directory
- Keep /update-platform-docs for incremental updates to existing platform docs
- Keep /component-docs for creating component-level documentation
- Update README to clarify platform docs "already exist"
- Simplify tier architecture description (tier-1/tier-2 → platform/component)
- Update component skill templates to reference "platform docs" consistently
- Update validation scripts to remove platform-specific checks
- Remove platform-docs from marketplace registration

This simplifies the plugin to focus on its two active use cases:
1. Creating new component documentation (/component-docs)
2. Updating existing platform documentation (/update-platform-docs)

- Fix generate-evals to use only anthropic:claude-sonnet-4-6 provider
- Rewrite evaluate skill to use 2-agent architecture:
  - Code claude sub-agent: runs promptfoo tool
  - Judge claude sub-agent: evaluates results + metrics
- Integrate metrics plugin for session telemetry
- Remove manual test spawning approach
- Add comprehensive error handling and documentation

Critical fixes:
1. EVALUATE SKILL (v5.0):
   - Actually spawn judge sub-agent after code agent completes (was missing)
   - Use bundled scripts/run-eval.sh instead of raw promptfoo commands
   - Add explicit step-by-step workflow with Agent tool examples
   - Fix sequential execution (code → collect metrics → judge)
   - Add comprehensive error handling for 100% error rate scenario
   - Document common issues and fixes

2. GENERATE-EVALS SKILL:
   - Fix provider format to simple string: anthropic:claude-sonnet-4-6
   - Remove incorrect object format with id/config
   - Add explicit DO/DON'T examples for provider configuration
   - Change outputPath to ./promptfoo-results.json
   - Change prompts to use file://prompts/system.txt

Issues fixed:
- Test 1 failure: Judge sub-agent now explicitly spawned with results
- 100% error rate: Provider format corrected (was using API format not promptfoo format)
- Missing workflow: Added complete sequential workflow with Agent() examples
- Script usage: Now uses bundled run-eval.sh for reliable execution

Remove unnecessary code sub-agent - Option B implementation:

BEFORE (v5.0 - Two sub-agents):
1. Spawn code sub-agent → run promptfoo
2. Spawn judge sub-agent → analyze results

AFTER (v6.0 - One sub-agent):
1. Main agent runs run-eval.sh directly
2. Main agent collects session metrics
3. Spawn judge sub-agent → analyze results + metrics

Benefits:
- ✅ Simpler: One sub-agent instead of two
- ✅ Faster: ~20-30s saved (no code sub-agent spawn overhead)
- ✅ Cheaper: ~$0.02-0.05 saved per evaluation
- ✅ Clearer: Main agent runs tools, judge analyzes
- ✅ More reliable: Fewer moving parts, fewer failure modes

Technical changes:
- Removed Step 2 (spawn code sub-agent)
- Main agent now executes bash /scripts/run-eval.sh
- Main agent collects metrics directly from session
- Judge sub-agent receives results from main agent (not from code sub-agent)
- Updated all documentation and examples
- Added complete example workflow showing direct execution

Addresses user question: 'Why cannot the coding sub-agent directly pass results to judge?'
Answer: It can't (sub-agents can't spawn sub-agents), but we don't need it anyway - main agent can run the script directly.

## Changes

### generate-evals skill (v2.0)
- Add canonical template at templates/promptfooconfig.example.yaml
- Update skill to always use template as foundation
- Document common provider format mistakes to avoid
- Switch from weight-based to llm-rubric assertions
- Use vars.prompt instead of vars.task_description

### evaluate skill (v6.2)
- Add provider validation before running promptfoo
- Add bundled run-eval.sh script for consistent execution
- Add test suite (evals/evals.json) with 3 test cases
- Document skill testing and iteration workflow

### Plugin version
- Bump agentic-docs plugin from 1.0.0 to 1.1.0 (MINOR)
- Reflects enhanced functionality in both skills

## Key improvements
- Prevents invalid Vertex AI provider format (vertex:anthropic:claude-...)
- Template-first approach ensures consistency
- Skills now include their own test infrastructure
- Better error detection and user guidance
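
The provider rules described in these commits (a plain `anthropic:claude-sonnet-4-6` string, no object form with `id`/`config`, no `vertex:anthropic:...` prefix) could be checked with something like the sketch below. It is not the skill's actual validation logic, which is not shown in this PR page; it assumes the config has already been parsed into a dictionary with a top-level `providers` list, and the function name and the `vertex:anthropic:claude-example` value are hypothetical. It mirrors only the conventions stated here, not promptfoo's full provider grammar.

```python
from typing import Any

# Prefix the PR describes as an invalid provider format for this plugin.
INVALID_PREFIXES = ("vertex:anthropic:",)


def find_provider_problems(config: dict[str, Any]) -> list[str]:
    """Return readable problems for a parsed promptfooconfig (hypothetical helper)."""
    providers = config.get("providers")
    if not isinstance(providers, list) or not providers:
        return ["'providers' must be a non-empty list"]

    problems: list[str] = []
    for index, provider in enumerate(providers):
        if not isinstance(provider, str):
            # The PR removes the object form ({id, config}) in favor of a string.
            problems.append(
                f"providers[{index}]: expected a plain string, "
                f"got {type(provider).__name__}"
            )
        elif provider.startswith(INVALID_PREFIXES):
            problems.append(
                f"providers[{index}]: '{provider}' uses the invalid Vertex AI format"
            )
    return problems


if __name__ == "__main__":
    good = {"providers": ["anthropic:claude-sonnet-4-6"]}
    bad = {
        "providers": [
            {"id": "anthropic:claude-sonnet-4-6", "config": {}},
            "vertex:anthropic:claude-example",  # hypothetical model name
        ]
    }
    print(find_provider_problems(good))
    for problem in find_provider_problems(bad):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every bad entry before deciding not to run promptfoo.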

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@openshift-ci openshift-ci Bot requested review from stleerh and theobarberbany May 15, 2026 00:56

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kenjpais
Once this PR has been reviewed and has the lgtm label, please assign theobarberbany for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented May 15, 2026

Walkthrough

Adds an agentic-docs plugin (generate-evals and evaluate skills, templates, scripts, docs), registers it in marketplace/docs registries and PLUGINS.md, and introduces ai-docs telemetry: a metrics command plus a Python script to extract ai-docs usage from Claude Code session logs.

Changes

agentic-docs Plugin and Evaluation Suite

| Layer / File(s) | Summary |
|---|---|
| **Plugin registration and manifest**<br>`.claude-plugin/marketplace.json`, `docs/data.json`, `plugins/agentic-docs/.claude-plugin/plugin.json`, `PLUGINS.md` | Adds agentic-docs plugin entry to marketplace and docs registries, updates metrics command registration in docs/data.json, and updates PLUGINS.md TOC/sections. |
| **Generate-evals skill and template**<br>`plugins/agentic-docs/commands/generate-evals.md`, `plugins/agentic-docs/skills/generate-evals/SKILL.md`, `plugins/agentic-docs/skills/generate-evals/templates/promptfooconfig.example.yaml` | Skill and example promptfoo template to generate repository-specific promptfooconfig.yaml and EVALUATION.md with multiple SME/convention test cases and rubric assertions. |
| **Evaluate skill, eval configs, and runner**<br>`plugins/agentic-docs/commands/evaluate.md`, `plugins/agentic-docs/skills/evaluate/SKILL.md`, `plugins/agentic-docs/skills/evaluate/evals/evals.json`, `plugins/agentic-docs/skills/evaluate/scripts/run-eval.sh` | Adds evaluation command docs, comprehensive SKILL describing baseline/with-docs comparative workflow, eval scenarios, evidence collection, judge sub-agent prompt, and a run-eval.sh helper to run promptfoo. |
| **Metrics: ai-docs telemetry**<br>`plugins/metrics/README.md`, `plugins/metrics/commands/ai-docs-telemetry.md`, `plugins/metrics/scripts/ai_docs_telemetry.py` | Adds /metrics:ai-docs-telemetry documentation and a Python script that scans Claude Code JSONL sessions for ai-docs/AGENTS/CLAUDE reads and emits structured telemetry JSON; supports scanning recent sessions and single-session analysis. |
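
As a rough illustration of the telemetry idea summarized above (scanning JSONL session logs for ai-docs, AGENTS.md, and CLAUDE.md reads), the sketch below counts how many log entries mention each marker. It is not the code in `ai_docs_telemetry.py`; the Claude Code session schema is not shown on this page, so the sketch matches markers against the whole serialized entry instead of a specific field, and the `tool` and `file_path` keys in the sample data are hypothetical.

```python
import json
from collections import Counter

# Markers taken from the walkthrough: ai-docs/, AGENTS.md, CLAUDE.md.
MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def count_doc_references(jsonl_text: str) -> dict[str, int]:
    """Count log entries that mention each documentation marker."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines instead of failing the scan
        serialized = json.dumps(entry)
        for marker in MARKERS:
            if marker in serialized:
                counts[marker] += 1
    return {marker: counts[marker] for marker in MARKERS}


if __name__ == "__main__":
    # Hypothetical entries; real session logs will have a different shape.
    sample = "\n".join(
        [
            json.dumps({"tool": "Read", "file_path": "ai-docs/AGENTS.md"}),
            json.dumps({"tool": "Read", "file_path": "src/main.go"}),
            "not json",
            json.dumps({"tool": "Read", "file_path": "CLAUDE.md"}),
        ]
    )
    print(json.dumps(count_doc_references(sample), indent=2))
```

Matching on the serialized entry over-counts when a marker appears in ordinary message text, so a real implementation would likely inspect only tool-call fields once the schema is known.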

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

approved, lgtm

Suggested reviewers

  • theobarberbany
  • stleerh
🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding promptfoo-based evaluation workflows (evals) to the agentic-docs plugin, which is the primary focus of this changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| No Real People Names In Style References | ✅ Passed | Comprehensive search across all PR files found no real people names used as style references, in example prompts, or in instructions. All style guidance uses explicit quality descriptions instead. |
| No Assumed Git Remote Names | ✅ Passed | No hardcoded git remote names (origin, upstream) found in any new files added by this PR. All scripts, documentation, and configurations avoid assuming git remote names. |
| Git Push Safety Rules | ✅ Passed | PR contains no git push, force push, or autonomous push workflows. New skills are for evaluation/analysis only with no version control operations. |
| No Untrusted Mcp Servers | ✅ Passed | PR introduces no MCP servers from untrusted sources. Only legitimate tools used: promptfoo via npx and Python standard library. No new package dependencies or external service calls. |
| Ai-Helpers Overlap Detection | ✅ Passed | No AI-helpers overlap detected. New agentic-docs and metrics commands occupy distinct domains (docs evaluation, eval generation, session telemetry) with <35% similarity to existing functionality. |



github-actions Bot commented May 15, 2026

skillsaw: additional violations

| Severity | Rule | File | Message |
|---|---|---|---|
| ⚠️ warning | plugin-readme | plugins/agentic-docs/README.md | Missing README.md (recommended) |
| ❌ error | plugins-doc-up-to-date | PLUGINS.md | PLUGINS.md is out of sync with plugin metadata. Run 'make update' to update. |

⚠️ warning (agentskill-evals): evals[0] all assertions must be strings

⚠️ warning (agentskill-evals): evals[1] all assertions must be strings

⚠️ warning (agentskill-evals): evals[2] all assertions must be strings
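
The three warnings above say that every assertion in an eval case must be a string, while the cases in this PR use `{"name", "description"}` objects. A minimal sketch of such a check, plus one possible way to flatten the objects, might look like the following. It is not skillsaw's implementation; the top-level `evals` key is inferred from the `evals[N]` wording of the warnings, and the `name: description` flattening format is an assumption made for illustration.

```python
from typing import Any


def non_string_assertions(evals_doc: dict[str, Any]) -> list[str]:
    """Report eval cases whose assertions are not all strings (hypothetical lint)."""
    findings: list[str] = []
    for index, case in enumerate(evals_doc.get("evals", [])):
        assertions = case.get("assertions", [])
        if not all(isinstance(item, str) for item in assertions):
            findings.append(f"evals[{index}] all assertions must be strings")
    return findings


def flatten_assertion(item: Any) -> str:
    """Turn a {"name", "description"} object into one string; pass strings through."""
    if isinstance(item, str):
        return item
    return f"{item['name']}: {item['description']}"


if __name__ == "__main__":
    doc = {
        "evals": [
            {
                "assertions": [
                    {
                        "name": "clear_next_steps",
                        "description": "Should provide clear next steps",
                    }
                ]
            }
        ]
    }
    print(non_string_assertions(doc))
    for case in doc["evals"]:
        case["assertions"] = [flatten_assertion(a) for a in case["assertions"]]
    print(non_string_assertions(doc))
```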

error (plugin-owners-required): Plugin 'agentic-docs' is missing an OWNERS file

@kenjpais kenjpais changed the title Add evaluation Add agentic-docs evals May 15, 2026
@kenjpais kenjpais changed the title Add agentic-docs evals OKD-370: Add promptfoo evals for agentic-docs plugin May 15, 2026

openshift-ci-robot commented May 15, 2026

@kenjpais: This pull request references OKD-370 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Add promptfoo-based eval workflows to the agentic-docs plugin via [PR #437].

Introduces two new skills:

  • /agentic-docs:generate-evals — generate repository-specific promptfoo eval suites from templates
  • /agentic-docs:evaluate — validate provider configuration and execute eval suites with automated analysis

Features

generate-evals

  • Template-based eval generation
  • Repository-scoped eval configs

evaluate

  • Bundled eval execution workflow
  • Spawns judge sub-agent to analyze results and session logs and metrics
  • Pass/fail quality reporting
  • Cost and latency regression checks

These workflows add deterministic and LLM-judged validation for:

  • skill routing
  • command execution
  • output quality
  • regression detection

Assertion types used

  • skill-used / not-skill-used
  • icontains / not-icontains
  • llm-rubric
  • cost / latency

Test coverage

Test infrastructure: Both skills include their own test suites in evals/evals.json

Summary by CodeRabbit

  • New Features
  • Added agentic-docs plugin for creating and maintaining AI-optimized documentation for OpenShift.
  • Added /agentic-docs:evaluate command to assess documentation quality through behavioral validation.
  • Added /agentic-docs:generate-evals command to generate repository-specific evaluation configurations.
  • Added /metrics:ai-docs-telemetry command to analyze documentation usage patterns from Claude Code sessions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 15, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
plugins/metrics/commands/ai-docs-telemetry.md (2)

10-13: ⚡ Quick win

Add language specifiers to fenced code blocks.

For better syntax highlighting and rendering, specify the language for fenced code blocks. Since these are command examples, use bash:

<details>
<summary>📝 Proposed fix</summary>

```diff
 ## Synopsis
-```
+```bash
 /metrics:ai-docs-telemetry -scan [-project <name>]
 /metrics:ai-docs-telemetry -session <path-to-session.jsonl>
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/metrics/commands/ai-docs-telemetry.md around lines 10 - 13, The
fenced code block containing the command examples /metrics:ai-docs-telemetry -scan [-project <name>] and /metrics:ai-docs-telemetry -session <path-to-session.jsonl> needs a language specifier for proper highlighting;
update the block to start with ```bash and keep the closing ``` so the two
command lines are rendered as bash code.


</details>

---

`44-88`: _⚡ Quick win_

**Add language specifiers to example code blocks.**

The example code blocks should specify `bash` for better rendering:

<details>
<summary>📝 Proposed fix</summary>

```diff
 1. **Scan all recent sessions (last 7 days)**:
-   ```
+   ```bash
    /metrics:ai-docs-telemetry -scan
    ```
```

```diff
 2. **Scan only enhancements repository**:
-   ```
+   ```bash
    /metrics:ai-docs-telemetry -scan -project enhancements
    ```
```

```diff
 3. **Scan only machine-config-operator repository**:
-   ```
+   ```bash
    /metrics:ai-docs-telemetry -scan -project machine-config-operator
    ```
```

```diff
 4. **Analyze a specific session**:
-   ```
+   ```bash
    /metrics:ai-docs-telemetry -session ~/.claude/projects/<project>/<session-id>.jsonl
    ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @plugins/metrics/commands/ai-docs-telemetry.md around lines 44 - 88, Add
explicit language specifiers (bash) to all example fenced code blocks that show
command usage for the ai-docs telemetry tool (e.g., blocks containing
"/metrics:ai-docs-telemetry -scan", "/metrics:ai-docs-telemetry -scan -project
enhancements", "/metrics:ai-docs-telemetry -scan -project
machine-config-operator", "/metrics:ai-docs-telemetry -session
~/.claude/projects/<project>/<session-id>.jsonl" and the bash pipeline examples
using jq) by changing the opening triple backticks to ```bash so the snippets
render correctly as shell commands.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude-plugin/marketplace.json:

  • Around line 250-255: The marketplace entry for "agentic-docs" has a version
    mismatch: marketplace declares "version": "1.0.0" while the plugin manifest
    (plugin.json) declares "1.1.0"; update the "version" value in the marketplace
    JSON to match the plugin manifest's "1.1.0" (or vice‑versa if you intend to
    downgrade) so both "agentic-docs" version fields are identical, and ensure
    future releases update both the marketplace entry and the plugin.json together.

In @docs/data.json:

  • Around line 1826-1844: Update docs/data.json to match the actual plugin
    contents: replace the empty "commands" array with the two command names
    "generate-evals" and "evaluate"; replace the "skills" entries for "component"
    and "update-platform-docs" with the actual skill objects for the new skills
    (ids/names "generate-evals" and "evaluate" and appropriate descriptions matching
    the PR); and change the "version" value from "1.0.0" to "1.1.0" to match
    plugin.json (verify plugin.json for authoritative version). Ensure the keys
    "commands", "skills", and "version" exactly reflect the new symbols
    generate-evals and evaluate.

In @plugins/agentic-docs/skills/evaluate/evals/evals.json:

  • Around line 6-17: This eval is internally inconsistent: the eval named
    "happy-path-evaluation" and its prompt/expected_output describe a normal run but
    the assertions (e.g., "detected_invalid_provider_config",
    "did_not_run_promptfoo", "v60_runs_without_validation") expect invalid-provider
    behavior; change this case to consistently represent an invalid-provider
    scenario by renaming "eval_name" (e.g., "invalid-provider-evaluation"), updating
    "prompt" to state the promptfooconfig.yaml contains an invalid Vertex AI
    provider format, and adjust "expected_output" to assert detection of the invalid
    provider, instructions to fix, reference to the generate-evals skill, and that
    promptfoo is not run; keep the listed assertions as-is so the test suite checks
    for detection, fix instructions, no run, and baseline v6.0 behavior.

In @plugins/agentic-docs/skills/evaluate/scripts/run-eval.sh:

  • Around line 7-8: REPO_ROOT is being set to a plugin-relative path so promptfoo
    runs in the wrong directory and misses promptfooconfig.yaml; update run-eval.sh
    to compute the true repository root (e.g., use git rev-parse --show-toplevel or
    resolve SCRIPT_DIR up to the repo root) and ensure the script cds into that
    computed REPO_ROOT before invoking promptfoo (the area around the current
    cd/execution that references REPO_ROOT). Also verify promptfoo is invoked with
    the correct working directory or explicit config path so promptfooconfig.yaml in
    the repo root is found.

In @plugins/metrics/scripts/ai_docs_telemetry.py:

  • Around line 102-107: The try/except around opening and reading session_path
    currently catches broad Exception; narrow it to file-related exceptions (e.g.,
    catch FileNotFoundError, PermissionError and IsADirectoryError or a general
    OSError) when opening/reading the file so different failure modes aren’t masked,
    keep the same error print to sys.stderr and return None as before; update the
    block that opens session_path and reads content (the with open(session_path,
    'r') as f: / content = f.read() section) to catch these specific exceptions
    instead of Exception.
  • Around line 204-209: The pre-filter around session_file.read_text() should
    also check for "CLAUDE.md" in addition to "ai-docs/" and "AGENTS.md" so sessions
    that only touched CLAUDE.md aren't skipped; update the conditional that
    currently reads if not ("ai-docs/" in content or "AGENTS.md" in content) to
    include "CLAUDE.md". Also replace the silent except: continue with logged error
    handling—catch the exception from session_file.read_text(), log the exception
    and the session_file (or its path) using the module's existing logger (e.g.,
    logger.exception or logger.error) for visibility, then continue. Ensure you
    modify the try/except block around session_file.read_text() and the conditional
    that inspects content.

Nitpick comments:
In @plugins/metrics/commands/ai-docs-telemetry.md:

  • Around line 10-13: The fenced code block containing the command examples
    /metrics:ai-docs-telemetry -scan [-project <name>] and
    /metrics:ai-docs-telemetry -session <path-to-session.jsonl> needs a language
    specifier for proper highlighting; update the block to start with ```bash and keep the closing ``` so the two command lines are rendered as bash code.
  • Around line 44-88: Add explicit language specifiers (bash) to all example
    fenced code blocks that show command usage for the ai-docs telemetry tool (e.g.,
    blocks containing "/metrics:ai-docs-telemetry -scan",
    "/metrics:ai-docs-telemetry -scan -project enhancements",
    "/metrics:ai-docs-telemetry -scan -project machine-config-operator",
    "/metrics:ai-docs-telemetry -session
    ~/.claude/projects/<project>/<session-id>.jsonl" and the bash pipeline examples
    using jq) by changing the opening triple backticks to ```bash so the snippets
    render correctly as shell commands.

</details>
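
The two findings for `ai_docs_telemetry.py` (catch file-related errors instead of a broad `Exception`, include `CLAUDE.md` in the pre-filter, and log instead of silently continuing) could be combined roughly as in the sketch below. It is not the script's real code: the function names and the logger name are hypothetical, and it uses the `logging` module where the review also mentions keeping a print to `sys.stderr`.

```python
import logging
from pathlib import Path
from typing import Optional

# Hypothetical logger name; the real script may print to sys.stderr instead.
logger = logging.getLogger("ai_docs_telemetry_sketch")

# Pre-filter markers, including CLAUDE.md as the review requests.
DOC_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def read_session_text(session_path: Path) -> Optional[str]:
    """Read a session file, returning None on file-related errors only."""
    try:
        return session_path.read_text(encoding="utf-8")
    except OSError:
        # OSError covers FileNotFoundError, PermissionError, IsADirectoryError.
        # Other exceptions (for example decoding errors) propagate unmasked.
        logger.exception("Could not read session file %s", session_path)
        return None


def mentions_docs(content: str) -> bool:
    """Cheap pre-filter: does the raw session text reference any doc marker?"""
    return any(marker in content for marker in DOC_MARKERS)


if __name__ == "__main__":
    missing = Path("does-not-exist") / "session.jsonl"
    print(read_session_text(missing))
    print(mentions_docs('{"file_path": "CLAUDE.md"}'))
```

Catching `OSError` alone keeps programming errors visible while still letting a scan skip unreadable files.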

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Enterprise

**Run ID**: `f0fc34ce-a64e-46b8-b2b5-72896a629198`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 503e009a7755326342b30ef7efc736e5f89d079c and d2b43331a30d942ddda31300873a247280284e9e.

</details>

<details>
<summary>📒 Files selected for processing (13)</summary>

* `.claude-plugin/marketplace.json`
* `docs/data.json`
* `plugins/agentic-docs/.claude-plugin/plugin.json`
* `plugins/agentic-docs/commands/evaluate.md`
* `plugins/agentic-docs/commands/generate-evals.md`
* `plugins/agentic-docs/skills/evaluate/SKILL.md`
* `plugins/agentic-docs/skills/evaluate/evals/evals.json`
* `plugins/agentic-docs/skills/evaluate/scripts/run-eval.sh`
* `plugins/agentic-docs/skills/generate-evals/SKILL.md`
* `plugins/agentic-docs/skills/generate-evals/templates/promptfooconfig.example.yaml`
* `plugins/metrics/README.md`
* `plugins/metrics/commands/ai-docs-telemetry.md`
* `plugins/metrics/scripts/ai_docs_telemetry.py`

</details>

</details>


Comment thread .claude-plugin/marketplace.json
Comment thread docs/data.json
Comment on lines +1826 to +1844
{
  "commands": [],
  "description": "Create and maintain AI-optimized documentation for OpenShift",
  "has_readme": true,
  "hooks": [],
  "name": "agentic-docs",
  "skills": [
    {
      "description": "Create lean component documentation for OpenShift repositories",
      "id": "component",
      "name": "component-docs"
    },
    {
      "description": "Update existing platform documentation with automatic gap detection in openshift/enhancements",
      "id": "update-platform-docs",
      "name": "update-platform-docs"
    }
  ],
  "version": "1.0.0"

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Plugin registration data is inconsistent with actual plugin contents.

The docs/data.json entry has three critical mismatches:

  1. Commands: Empty array, but the PR adds two commands (generate-evals and evaluate).
  2. Skills: Lists component and update-platform-docs, but the PR description and actual files define generate-evals and evaluate skills.
  3. Version: Shows "1.0.0", but plugin.json declares "1.1.0".
🔧 Proposed fix
     },
     {
-      "commands": [],
+      "commands": [
+        {
+          "argument_hint": "[repository-path]",
+          "description": "Generate repository-specific promptfoo evaluation suites for OpenShift documentation",
+          "name": "generate-evals",
+          "synopsis": "/agentic-docs:generate-evals [repository-path]"
+        },
+        {
+          "argument_hint": "[repository-path]",
+          "description": "Evaluate agentic documentation quality using promptfoo-based behavioral validation",
+          "name": "evaluate",
+          "synopsis": "/agentic-docs:evaluate [repository-path]"
+        }
+      ],
       "description": "Create and maintain AI-optimized documentation for OpenShift",
       "has_readme": true,
       "hooks": [],
       "name": "agentic-docs",
       "skills": [
         {
-          "description": "Create lean component documentation for OpenShift repositories",
-          "id": "component",
-          "name": "component-docs"
+          "description": "Generate repository-specific promptfoo evaluation suites tailored to OpenShift conventions and repository patterns",
+          "id": "generate-evals",
+          "name": "agentic-docs:generate-evals"
         },
         {
-          "description": "Update existing platform documentation with automatic gap detection in openshift/enhancements",
-          "id": "update-platform-docs",
-          "name": "update-platform-docs"
+          "description": "Evaluate agentic documentation quality using promptfoo-based behavioral validation with natural discovery testing",
+          "id": "evaluate",
+          "name": "agentic-docs:evaluate"
         }
       ],
-      "version": "1.0.0"
+      "version": "1.1.0"
     }
   ]
 }
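
The version drift flagged in this comment (registries at "1.0.0", `plugin.json` at "1.1.0") is the kind of mismatch a small consistency check could catch before review. The sketch below treats `plugin.json` as authoritative and compares it with already-parsed registry documents. It assumes each registry keeps its entries in a top-level `plugins` list of objects with `name` and `version` keys; that layout is an assumption, since only single entries of `marketplace.json` and `docs/data.json` are visible here, and both function names are hypothetical.

```python
from typing import Any, Optional


def find_plugin_version(registry: dict[str, Any], plugin_name: str) -> Optional[str]:
    """Look up a plugin's version in a registry (assumes a top-level 'plugins' list)."""
    for entry in registry.get("plugins", []):
        if entry.get("name") == plugin_name:
            return entry.get("version")
    return None


def version_mismatches(
    plugin_name: str,
    manifest: dict[str, Any],
    registries: dict[str, dict[str, Any]],
) -> list[str]:
    """Compare each registry's version against the plugin manifest's version."""
    expected = manifest.get("version")
    problems: list[str] = []
    for label, registry in registries.items():
        found = find_plugin_version(registry, plugin_name)
        if found != expected:
            problems.append(
                f"{label}: {plugin_name} is {found!r}, manifest says {expected!r}"
            )
    return problems


if __name__ == "__main__":
    manifest = {"name": "agentic-docs", "version": "1.1.0"}
    registries = {
        ".claude-plugin/marketplace.json": {
            "plugins": [{"name": "agentic-docs", "version": "1.0.0"}]
        },
        "docs/data.json": {
            "plugins": [{"name": "agentic-docs", "version": "1.1.0"}]
        },
    }
    for problem in version_mismatches("agentic-docs", manifest, registries):
        print(problem)
```

A plugin that is missing from a registry is reported as `None`, so an absent entry surfaces the same way as a stale version.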
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/data.json` around lines 1826 - 1844, Update docs/data.json to match the
actual plugin contents: replace the empty "commands" array with the two command
names "generate-evals" and "evaluate"; replace the "skills" entries for
"component" and "update-platform-docs" with the actual skill objects for the new
skills (ids/names "generate-evals" and "evaluate" and appropriate descriptions
matching the PR); and change the "version" value from "1.0.0" to "1.1.0" to
match plugin.json (verify plugin.json for authoritative version). Ensure the
keys "commands", "skills", and "version" exactly reflect the new symbols
generate-evals and evaluate.

Comment on lines +6 to +17
"eval_name": "happy-path-evaluation",
"prompt": "I just created documentation for the multiarch-tuning-operator repository at /Users/kpais/kpais-workspace/claude-tmp/multiarch-tuning-operator-test-plugin. I ran /agentic-docs:generate-evals and it created a promptfooconfig.yaml file with 43 test cases. Now I want to run the evaluation to see if the documentation is good. Can you evaluate it?",
"expected_output": "Should spawn code sub-agent to run promptfoo, collect metrics, spawn judge sub-agent to analyze results, and produce comprehensive evaluation report",
"files": [],
"setup_required": "Repository with promptfooconfig.yaml, ANTHROPIC_API_KEY set",
"assertions": [
{"name": "detected_invalid_provider_config", "description": "v6.1 should detect the invalid Vertex AI provider format in promptfooconfig.yaml"},
{"name": "provided_fix_instructions", "description": "Should provide clear instructions on how to fix the provider configuration"},
{"name": "referenced_generate_evals_skill", "description": "Should reference the generate-evals skill documentation for the correct format"},
{"name": "did_not_run_promptfoo", "description": "Should NOT run promptfoo when invalid config is detected"},
{"name": "clear_next_steps", "description": "Should provide clear next steps (edit config or regenerate)"},
{"name": "v60_runs_without_validation", "description": "v6.0 (baseline) should attempt to run promptfoo and encounter API errors"}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make eval case 1 internally consistent.

Line 6/Line 8 define a happy-path run, but Line 12–Line 17 assert invalid-provider handling and “did_not_run_promptfoo”. This contradiction can make the suite report misleading results.

Suggested fix
       "assertions": [
-        {"name": "detected_invalid_provider_config", "description": "v6.1 should detect the invalid Vertex AI provider format in promptfooconfig.yaml"},
-        {"name": "provided_fix_instructions", "description": "Should provide clear instructions on how to fix the provider configuration"},
-        {"name": "referenced_generate_evals_skill", "description": "Should reference the generate-evals skill documentation for the correct format"},
-        {"name": "did_not_run_promptfoo", "description": "Should NOT run promptfoo when invalid config is detected"},
-        {"name": "clear_next_steps", "description": "Should provide clear next steps (edit config or regenerate)"},
-        {"name": "v60_runs_without_validation", "description": "v6.0 (baseline) should attempt to run promptfoo and encounter API errors"}
+        {"name": "spawned_code_subagent", "description": "Should spawn code sub-agent to run promptfoo"},
+        {"name": "ran_promptfoo_tests", "description": "Should execute promptfoo evals successfully"},
+        {"name": "spawned_judge_subagent", "description": "Should spawn judge sub-agent to analyze results"},
+        {"name": "reported_quality_summary", "description": "Should report pass/fail quality summary"},
+        {"name": "reported_cost_latency", "description": "Should include cost and latency regression checks"},
+        {"name": "clear_next_steps", "description": "Should provide clear next steps based on results"}
       ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/agentic-docs/skills/evaluate/evals/evals.json` around lines 6 - 17,
This eval is internally inconsistent: the eval named "happy-path-evaluation" and
its prompt/expected_output describe a normal run but the assertions (e.g.,
"detected_invalid_provider_config", "did_not_run_promptfoo",
"v60_runs_without_validation") expect invalid-provider behavior; change this
case to consistently represent an invalid-provider scenario by renaming
"eval_name" (e.g., "invalid-provider-evaluation"), updating "prompt" to state
the promptfooconfig.yaml contains an invalid Vertex AI provider format, and
adjust "expected_output" to assert detection of the invalid provider,
instructions to fix, reference to the generate-evals skill, and that promptfoo
is not run; keep the listed assertions as-is so the test suite checks for
detection, fix instructions, no run, and baseline v6.0 behavior.

Comment on lines +7 to +8
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
REPO_ROOT="$( cd "$SCRIPT_DIR/../.." && pwd )"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix target directory resolution before running promptfoo.

Line 8 resolves to a plugin-relative path, not the repository being evaluated; then Line 45–Line 46 force execution there. This can break evaluation by missing promptfooconfig.yaml.

Suggested fix
 SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
-REPO_ROOT="$( cd "$SCRIPT_DIR/../.." && pwd )"
+TARGET_REPO="${TARGET_REPO:-$PWD}"
@@
-# Change to repo root (where config and files are)
-cd "$REPO_ROOT"
+# Change to target repository (where promptfooconfig.yaml should exist)
+cd "$TARGET_REPO"
+
+if [ ! -f "promptfooconfig.yaml" ]; then
+    echo "❌ Error: promptfooconfig.yaml not found in $TARGET_REPO"
+    echo "   Run /agentic-docs:generate-evals first or set TARGET_REPO correctly."
+    exit 1
+fi

Also applies to: 45-50

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/agentic-docs/skills/evaluate/scripts/run-eval.sh` around lines 7 - 8,
REPO_ROOT is being set to a plugin-relative path so promptfoo runs in the wrong
directory and misses promptfooconfig.yaml; update run-eval.sh to compute the
true repository root (e.g., use git rev-parse --show-toplevel or resolve
SCRIPT_DIR up to the repo root) and ensure the script cds into that computed
REPO_ROOT before invoking promptfoo (the area around the current cd/execution
that references REPO_ROOT). Also verify promptfoo is invoked with the correct
working directory or explicit config path so promptfooconfig.yaml in the repo
root is found.
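
For illustration only, the directory resolution proposed above can be sketched in Python (run-eval.sh itself is a shell script). The helper names `resolve_target_repo` and `find_config` are hypothetical and not part of the PR; the sketch assumes the same `TARGET_REPO` environment variable as the suggested fix and falls back to the current directory when it is unset or empty, mirroring `${TARGET_REPO:-$PWD}`.

```python
import os
from pathlib import Path


def resolve_target_repo(env=None, cwd=None):
    """Return the repository to evaluate (hypothetical helper).

    Uses TARGET_REPO when it is set and non-empty, otherwise the current
    working directory, like the shell expansion ${TARGET_REPO:-$PWD}.
    """
    env = os.environ if env is None else env
    base = env.get("TARGET_REPO") or (cwd if cwd is not None else os.getcwd())
    return Path(base).resolve()


def find_config(repo, name="promptfooconfig.yaml"):
    """Return the config path, or fail early with an actionable message."""
    config = Path(repo) / name
    if not config.is_file():
        raise FileNotFoundError(
            f"{name} not found in {repo}; run /agentic-docs:generate-evals "
            "first or set TARGET_REPO correctly"
        )
    return config
```

Failing before promptfoo is invoked keeps a missing config from surfacing later as an unrelated runtime error.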

Comment on lines +102 to +107
try:
with open(session_path, 'r') as f:
content = f.read()
except Exception as e:
print(f"Error reading session: {e}", file=sys.stderr)
return None

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Catch specific exceptions instead of broad Exception.

Catching broad Exception masks different error scenarios. Specify file-related exceptions for clearer error handling:

🛡️ Proposed fix
     try:
         with open(session_path, 'r') as f:
             content = f.read()
-    except Exception as e:
+    except (FileNotFoundError, PermissionError, IOError) as e:
         print(f"Error reading session: {e}", file=sys.stderr)
         return None
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 105-105: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/scripts/ai_docs_telemetry.py` around lines 102 - 107, The
try/except around opening and reading session_path currently catches broad
Exception; narrow it to file-related exceptions (e.g., catch FileNotFoundError,
PermissionError and IsADirectoryError or a general OSError) when opening/reading
the file so different failure modes aren’t masked, keep the same error print to
sys.stderr and return None as before; update the block that opens session_path
and reads content (the with open(session_path, 'r') as f: / content = f.read()
section) to catch these specific exceptions instead of Exception.
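
A minimal sketch of the narrowed handler follows; `read_session` is a hypothetical helper name, not the function in ai_docs_telemetry.py. In Python 3, `IOError` is an alias of `OSError`, and `FileNotFoundError`, `PermissionError`, and `IsADirectoryError` are all subclasses of it, so a single `OSError` clause covers the whole tuple from the proposed fix. The sketch also catches `UnicodeDecodeError`, which is not an `OSError` and which the original broad `except Exception` would have swallowed; whether that case should be handled here is an assumption.

```python
import sys
from pathlib import Path
from typing import Optional


def read_session(session_path) -> Optional[str]:
    """Return the session file's text, or None if it cannot be read."""
    try:
        return Path(session_path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as e:
        # OSError covers missing files, permission problems, and directories;
        # UnicodeDecodeError covers files that are not valid UTF-8.
        print(f"Error reading session: {e}", file=sys.stderr)
        return None
```

Unrelated failures such as `KeyboardInterrupt` or a programming error now propagate instead of being reported as a read failure.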

Comment on lines +204 to +209
try:
content = session_file.read_text()
if not ("ai-docs/" in content or "AGENTS.md" in content):
continue
except Exception:
continue

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pre-filter is missing "CLAUDE.md" check and lacks error logging.

Two issues:

  1. Line 206 checks for "ai-docs/" and "AGENTS.md" but not "CLAUDE.md", even though the full processing at line 140 includes it. Sessions with only CLAUDE.md accesses will be incorrectly skipped.
  2. The try-except silently continues without logging, making it difficult to diagnose issues.
🔧 Proposed fix
         # Quick pre-filter: check if file contains ai-docs markers
         try:
             content = session_file.read_text()
-            if not ("ai-docs/" in content or "AGENTS.md" in content):
+            if not ("ai-docs/" in content or "AGENTS.md" in content or "CLAUDE.md" in content):
                 continue
-        except Exception:
+        except (FileNotFoundError, PermissionError, IOError) as e:
+            print(f"Warning: Could not read {session_file}: {e}", file=sys.stderr)
             continue
🧰 Tools
🪛 Ruff (0.15.12)

[error] 208-209: try-except-continue detected, consider logging the exception

(S112)


[warning] 208-208: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/metrics/scripts/ai_docs_telemetry.py` around lines 204 - 209, The
pre-filter around session_file.read_text() should also check for "CLAUDE.md" in
addition to "ai-docs/" and "AGENTS.md" so sessions that only touched CLAUDE.md
aren't skipped; update the conditional that currently reads if not ("ai-docs/"
in content or "AGENTS.md" in content) to include "CLAUDE.md". Also replace the
silent except: continue with logged error handling—catch the exception from
session_file.read_text(), log the exception and the session_file (or its path)
using the module's existing logger (e.g., logger.exception or logger.error) for
visibility, then continue. Ensure you modify the try/except block around
session_file.read_text() and the conditional that inspects content.
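
One way to keep the pre-filter and the full processing from drifting apart again is to share a single tuple of markers. The sketch below assumes that approach; the names `AI_DOCS_MARKERS`, `mentions_ai_docs`, and `candidate_sessions` are hypothetical, and so is the logger name, since the script's actual logging setup is not shown in this review.

```python
import logging
from pathlib import Path

# Hypothetical logger name; the real module may print to stderr instead.
logger = logging.getLogger("ai_docs_telemetry")

# Single source of truth for the pre-filter and the full analysis.
AI_DOCS_MARKERS = ("ai-docs/", "AGENTS.md", "CLAUDE.md")


def mentions_ai_docs(content: str) -> bool:
    """Return True if the text references any ai-docs marker."""
    return any(marker in content for marker in AI_DOCS_MARKERS)


def candidate_sessions(session_files):
    """Yield (path, content) for readable sessions that mention a marker."""
    for session_file in session_files:
        try:
            content = Path(session_file).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as e:
            # Log and skip instead of silently continuing.
            logger.warning("Could not read %s: %s", session_file, e)
            continue
        if mentions_ai_docs(content):
            yield session_file, content
```

Yielding the content alongside the path also avoids reading each matching file a second time during full processing.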

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/agentic-docs/skills/evaluate/SKILL.md (1)

1034-1034: 💤 Low value

Consider adding language identifiers to fenced code blocks.

Several fenced code blocks (at lines 1034, 1067, 1099, 1112, 1139, and 1179) lack language identifiers. Adding text, markdown, or other appropriate language tags would improve syntax highlighting and accessibility.

Example fix
-```
+```text
 ERROR: Evaluation configuration not found
 ...
 ```

Also applies to: 1067-1067, 1099-1099, 1112-1112, 1139-1139, 1179-1179

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/agentic-docs/skills/evaluate/SKILL.md` at line 1034, Several fenced
code blocks in the evaluate skill markdown currently open with bare ``` and lack
language hints (e.g., blocks containing "ERROR: Evaluation configuration not
found" and similar examples); update each opening fence from ``` to a suitable
language tag such as ```text or ```markdown (choose `text` for plain
error/output blocks and `markdown`/other for formatted snippets) so syntax
highlighting and accessibility are improved, ensuring every code fence in the
SKILL.md evaluate documentation has a language identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/data.json`:
- Around line 1329-1332: The ai-docs-telemetry command metadata is inconsistent:
the argument_hint field contains "[-session <path>]" but the synopsis does not;
update the JSON so both match—either remove "[-session <path>]" from
argument_hint or add "[-session <path>]" into the synopsis string for the
"ai-docs-telemetry" entry so that "argument_hint" and "synopsis" are consistent.

In `@plugins/agentic-docs/skills/evaluate/SKILL.md`:
- Around line 1269-1272: The SKILL.md lists a non-existent command
'/agentic-docs:component' causing inaccurate docs; remove that entry (or replace
it with a real command such as '/agentic-docs:evaluate' if intended) from the
markdown and ensure the plugin command list matches the registry in
docs/data.json which only contains 'evaluate' and 'generate-evals'; update the
line in SKILL.md that currently contains '/agentic-docs:component' so the
documented commands exactly match the names in docs/data.json.

---

Nitpick comments:
In `@plugins/agentic-docs/skills/evaluate/SKILL.md`:
- Line 1034: Several fenced code blocks in the evaluate skill markdown currently
open with bare ``` and lack language hints (e.g., blocks containing "ERROR:
Evaluation configuration not found" and similar examples); update each opening
fence from ``` to a suitable language tag such as ```text or ```markdown (choose
`text` for plain error/output blocks and `markdown`/other for formatted
snippets) so syntax highlighting and accessibility are improved, ensuring every
code fence in the SKILL.md evaluate documentation has a language identifier.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3822426f-7a26-4c5b-8542-f6ccf8fdafed

📥 Commits

Reviewing files that changed from the base of the PR and between d2b4333 and dfda728.

📒 Files selected for processing (5)
  • .claude-plugin/marketplace.json
  • PLUGINS.md
  • docs/data.json
  • plugins/agentic-docs/.claude-plugin/plugin.json
  • plugins/agentic-docs/skills/evaluate/SKILL.md
✅ Files skipped from review due to trivial changes (1)
  • PLUGINS.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/agentic-docs/.claude-plugin/plugin.json

Comment thread docs/data.json Outdated
Comment on lines +1329 to +1332
"argument_hint": "[-scan] [-project <name>] [-session <path>]",
"description": "Analyze Claude Code session logs for ai-docs usage patterns",
"name": "ai-docs-telemetry",
"synopsis": "/metrics:ai-docs-telemetry -scan [-project <name>]"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Minor: Inconsistency between argument_hint and synopsis.

The argument_hint includes [-session <path>] but the synopsis omits it. Either add -session to the synopsis or remove it from the argument_hint to keep them consistent.

📝 Proposed fix
         {
-          "argument_hint": "[-scan] [-project <name>] [-session <path>]",
+          "argument_hint": "[-scan] [-project <name>]",
           "description": "Analyze Claude Code session logs for ai-docs usage patterns",
           "name": "ai-docs-telemetry",
           "synopsis": "/metrics:ai-docs-telemetry -scan [-project <name>]"
         }

Or if -session is intentional:

         {
           "argument_hint": "[-scan] [-project <name>] [-session <path>]",
           "description": "Analyze Claude Code session logs for ai-docs usage patterns",
           "name": "ai-docs-telemetry",
-          "synopsis": "/metrics:ai-docs-telemetry -scan [-project <name>]"
+          "synopsis": "/metrics:ai-docs-telemetry -scan [-project <name>] [-session <path>]"
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/data.json` around lines 1329 - 1332, The ai-docs-telemetry command
metadata is inconsistent: the argument_hint field contains "[-session <path>]"
but the synopsis does not; update the JSON so both match—either remove
"[-session <path>]" from argument_hint or add "[-session <path>]" into the
synopsis string for the "ai-docs-telemetry" entry so that "argument_hint" and
"synopsis" are consistent.
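
This kind of mismatch could also be caught mechanically. The following is a rough sketch, assuming each command entry is a dict with `argument_hint` and `synopsis` strings as in the snippet above; `flags` and `missing_from_synopsis` are hypothetical helpers, not part of the repository's tooling.

```python
import re

# A flag is one or two dashes followed by a name, not preceded by a word
# character or dash, so the hyphens inside "ai-docs-telemetry" are ignored.
FLAG_RE = re.compile(r"(?<![\w-])-{1,2}[A-Za-z][\w-]*")


def flags(text):
    """Return the set of flag tokens found in a usage string."""
    return set(FLAG_RE.findall(text))


def missing_from_synopsis(command):
    """Return flags listed in argument_hint but absent from synopsis."""
    hinted = flags(command.get("argument_hint", ""))
    shown = flags(command.get("synopsis", ""))
    return sorted(hinted - shown)


entry = {
    "argument_hint": "[-scan] [-project <name>] [-session <path>]",
    "synopsis": "/metrics:ai-docs-telemetry -scan [-project <name>]",
}
print(missing_from_synopsis(entry))
```

For the entry under review this reports `-session` as the only flag missing from the synopsis.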

Comment on lines +1269 to +1272

- `/agentic-docs:generate-evals` - Generate promptfooconfig.yaml before evaluating
- `/agentic-docs:component` - Create component documentation to evaluate
- `/metrics:ai-docs-telemetry` - Analyze documentation usage patterns

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that component command exists in docs registry
if grep -q '"name": "component"' docs/data.json; then
  echo "component command found in registry"
else
  echo "component command NOT found in registry - reference should be removed"
fi

Repository: openshift-eng/ai-helpers

Length of output: 138


Remove non-existent command reference.

Line 1271 references /agentic-docs:component, but this command does not exist in the plugin's command registry (docs/data.json contains only evaluate and generate-evals).

🔧 Proposed fix
 ## Related Commands
 
 - `/agentic-docs:generate-evals` - Generate promptfooconfig.yaml before evaluating
-- `/agentic-docs:component` - Create component documentation to evaluate
 - `/metrics:ai-docs-telemetry` - Analyze documentation usage patterns
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/agentic-docs/skills/evaluate/SKILL.md` around lines 1269 - 1272, The
SKILL.md lists a non-existent command '/agentic-docs:component' causing
inaccurate docs; remove that entry (or replace it with a real command such as
'/agentic-docs:evaluate' if intended) from the markdown and ensure the plugin
command list matches the registry in docs/data.json which only contains
'evaluate' and 'generate-evals'; update the line in SKILL.md that currently
contains '/agentic-docs:component' so the documented commands exactly match the
names in docs/data.json.
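
A slightly more general version of the grep check above can be sketched in Python. It assumes command references appear in backticks as `/plugin:command` and that the registry has already been loaded into a mapping from plugin name to command names; that mapping shape is an illustration, not the actual layout of docs/data.json.

```python
import re

# Matches backticked references such as `/agentic-docs:evaluate`.
COMMAND_REF_RE = re.compile(r"`/([a-z0-9-]+):([a-z0-9-]+)`")


def unknown_commands(markdown, registry):
    """Return referenced commands that are absent from the registry.

    registry maps plugin name -> set of command names (hypothetical shape).
    """
    refs = COMMAND_REF_RE.findall(markdown)
    return [
        f"/{plugin}:{command}"
        for plugin, command in refs
        if command not in registry.get(plugin, set())
    ]


related = """
- `/agentic-docs:generate-evals` - Generate promptfooconfig.yaml before evaluating
- `/agentic-docs:component` - Create component documentation to evaluate
- `/metrics:ai-docs-telemetry` - Analyze documentation usage patterns
"""
registry = {
    "agentic-docs": {"evaluate", "generate-evals"},
    "metrics": {"ai-docs-telemetry"},
}
print(unknown_commands(related, registry))
```

With the registry contents described in the finding, only `/agentic-docs:component` is flagged.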

Comment thread docs/data.json

error (plugins-doc-up-to-date): docs/data.json is out of sync with plugin metadata. Run 'make update' to update.

label: claude

prompts:
- "{{prompt}}"

let's craft the prompt like this:

You are working in the <repo-name> repository.

    {{prompt}}

    ===================================
    MANDATORY: End your response with a "## Documentation Used" section listing all files you read:

    ## Documentation Used

    - /path/to/file.md (reason)

    DO NOT SKIP THIS SECTION.
    ===================================

so that we can later check in the rubric that the documentation was indeed used. check https://github.com/openshift/enhancements/pull/1992/changes#diff-c7c3415c9cea54e2f9f4b6c84a6d9f381aaad790522f156c94dd39cf4af278d9 for an example
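
As a deterministic complement to the rubric, the mandated section could be parsed out of the response before the LLM judge runs. This is only a sketch: `documentation_used` is a hypothetical helper, it assumes the list format from the template above (`- /path/to/file.md (reason)`), and it does not handle paths that contain spaces.

```python
import re


def documentation_used(output):
    """Return the paths listed under the last '## Documentation Used' heading."""
    _, heading, section = output.rpartition("## Documentation Used")
    if not heading:
        return []  # the model skipped the mandatory section
    paths = []
    for line in section.splitlines():
        match = re.match(r"\s*-\s+(\S+)", line)
        if match:
            paths.append(match.group(1))
        elif line.startswith("#"):
            break  # a new heading ends the section
    return paths


sample = (
    "The API should start at v1alpha1.\n\n"
    "## Documentation Used\n\n"
    "- ai-docs/api.md (API conventions)\n"
    "- AGENTS.md (entry point)\n"
)
print(documentation_used(sample))
```

A test could then assert that at least one returned path lies under `ai-docs/` or is `AGENTS.md`, leaving the rubric to judge whether the content was actually applied.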

What API changes and controller logic are needed?
assert:
- type: llm-rubric
value: "The output mentions platform-specific KMS services (AWS KMS and Azure Key Vault)"

rubric should also check that the agentic documentation was actually used

vars:
agent: cloud-provider-sme
prompt: |
We want to implement customer-managed encryption key support for

we should also ensure that any new feature it is asked to develop is a) not already present and b) hypothetical (if it comes up with a name, it must make sure that API/CRD name does not already exist)

- description: "conventions/01-api-versioning"
vars:
prompt: |
Review: "We should create a new <RepoSpecificAPI> starting at v1."

rather than asking whether it is correct, the prompt should just ask the LLM to do it with the violation. we expect the LLM to push back and say it shouldn't, based on the documentation guidelines


**Repository-specific anti-patterns**:

Extract from CLAUDE.md or ai-docs/ sections that say:

while this is true, the cases are not limited to this. for example, again in openshift/enhancements#1992, one anti-pattern test is to create stable v1 APIs, which is strongly discouraged. maybe we want to keep this a little open, and in the end the component owner will have to review these cases anyway

- Auto-invocation after agentic-docs:create
- Three test categories (navigation, authoring, anti-pattern)
- Standard + repository-specific anti-patterns
- promptfooconfig.yaml generation

skill is too long. i'm worried claude will miss some context. let's try to keep it succinct


The generated configuration follows the exact format from the template (HyperShift-based evaluation framework).

### Why Repository-Specific Evals?

do we need this section about why we need repo-specific evals?


### Phase 2: Navigation Test Generation

Generate 2-3 navigation tests that verify agents can find repository-specific documentation.

maybe how many of each test is something that can be user input

• promptfooconfig.yaml - Evaluation configuration
• EVALUATION.md - Evaluation documentation

Run evaluations: make eval

should we have templated guidance on writing the makefile changes?

@Prashanth684

/hold
only to be merged after #437 merges

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 15, 2026
```

### Phase 4: Anti-Pattern Test Generation

@Prashanth684 Prashanth684 May 15, 2026


the anti-pattern test generation is also repo specific, i.e., the example below of an API starting at v1 is more of a generic example, which is why it was in enhancements. maybe other repos will have repo-specific anti-patterns

{
"id": 1,
"eval_name": "happy-path-evaluation",
"prompt": "I just created documentation for the multiarch-tuning-operator repository at /Users/kpais/kpais-workspace/claude-tmp/multiarch-tuning-operator-test-plugin. I ran /agentic-docs:generate-evals and it created a promptfooconfig.yaml file with 43 test cases. Now I want to run the evaluation to see if the documentation is good. Can you evaluate it?",

is this file meant to be an example ?

@kenjpais kenjpais May 18, 2026


So this file contains evals to test the /evaluate skill itself.
The evals.json file is generated by the skill-creator skill as part of its predefined workflow:
https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md

However, I haven't added evals.json for /generate-evals skill or the /component skill yet.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2026
openshift-ci Bot commented May 18, 2026

PR needs rebase.


Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
