OKD-370: Add promptfoo evals for agentic-docs plugin #477
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| { | ||
| "name": "agentic-docs", | ||
| "description": "Create and maintain AI-optimized documentation for OpenShift", | ||
| "version": "1.1.0", | ||
| "author": { | ||
| "name": "github.com/openshift-eng" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| --- | ||
| description: Evaluate agentic documentation quality using promptfoo-based behavioral validation | ||
| argument-hint: "[repository-path]" | ||
| --- | ||
| | ||
| ## Name | ||
| agentic-docs:evaluate | ||
| | ||
| ## Synopsis | ||
| ``` | ||
| /agentic-docs:evaluate [repository-path] | ||
| ``` | ||
| | ||
| ## Description | ||
| The `agentic-docs:evaluate` command evaluates documentation quality by testing whether AI agents naturally discover and correctly apply repository conventions without being explicitly told to read documentation. | ||
| | ||
| This command validates **documentation-first natural discovery behavior** using the OpenShift Enhancements Agentic Docs Evaluation framework. It measures: | ||
| - **Natural discovery**: Does the agent find documentation without instruction? | ||
| - **Correct navigation**: Does the agent follow documentation structure? | ||
| - **Pattern application**: Does the agent apply repository conventions correctly? | ||
| - **Anti-pattern rejection**: Does the agent reject incorrect patterns? | ||
| | ||
| The evaluation uses promptfoo to run assertions from `promptfooconfig.yaml` and generates detailed HTML reports with pass/fail grades. | ||
| | ||
| ## Implementation | ||
| When this command is invoked, Claude will execute the `agentic-docs:evaluate` skill, which: | ||
| 1. Loads evaluation configuration from `promptfooconfig.yaml` | ||
| 2. Runs coding sub-agents with task descriptions (no explicit file instructions) | ||
| 3. Evaluates whether agents naturally discovered and applied documentation | ||
| 4. Generates graded results with pass/fail assertions | ||
| 5. Creates HTML reports for review | ||
| | ||
| The skill maintains strict separation between coding agents (which must discover docs naturally) and evaluation agents (which grade the results). | ||
| | ||
| ## Return Value | ||
| - Evaluation results with pass/fail grades for each test case | ||
| - HTML report showing which documentation was discovered and applied | ||
| - Metrics on natural discovery patterns | ||
| | ||
| ## Examples | ||
| | ||
| 1. **Evaluate the current directory**: | ||
| ``` | ||
| /agentic-docs:evaluate | ||
| ``` | ||
| Evaluates documentation in the current working directory. | ||
| | ||
| 2. **Evaluate a specific repository**: | ||
| ``` | ||
| /agentic-docs:evaluate /path/to/openshift/repo | ||
| ``` | ||
| Evaluates documentation in the specified repository. | ||
| | ||
| ## Arguments | ||
| - `repository-path` (optional): Path to the repository to evaluate. Defaults to the current directory if not specified. |
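
As a rough illustration of the argument handling and the first implementation step described in this file, the Python sketch below resolves the optional `repository-path` (falling back to the current directory) and assembles a promptfoo invocation. The helper names `resolve_repo` and `build_eval_command`, the `report.html` file name, and the use of `npx` are assumptions made for illustration; the PR does not show how the skill actually invokes promptfoo.

```python
from pathlib import Path
from typing import List, Optional

# File name taken from the command description above.
CONFIG_NAME = "promptfooconfig.yaml"


def resolve_repo(repository_path: Optional[str] = None) -> Path:
    """Return the repository to evaluate, defaulting to the current directory."""
    return Path(repository_path).resolve() if repository_path else Path.cwd()


def build_eval_command(repo: Path, report_name: str = "report.html") -> List[str]:
    """Build (but do not run) a promptfoo command for the given repository.

    Hypothetical helper: the report name and the npx launcher are assumptions.
    """
    config = repo / CONFIG_NAME
    if not config.is_file():
        raise FileNotFoundError(f"{config} not found; generate an eval suite first")
    return [
        "npx", "promptfoo", "eval",
        "--config", str(config),
        "--output", str(repo / report_name),
    ]
```

Returning the command as a list rather than running it keeps the sketch free of side effects; a caller could pass the result to `subprocess.run` once the configuration exists.
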
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| --- | ||
| description: Generate repository-specific promptfoo evaluation suites for OpenShift documentation | ||
| argument-hint: "[repository-path]" | ||
| --- | ||
| | ||
| ## Name | ||
| agentic-docs:generate-evals | ||
| | ||
| ## Synopsis | ||
| ``` | ||
| /agentic-docs:generate-evals [repository-path] | ||
| ``` | ||
| | ||
| ## Description | ||
| The `agentic-docs:generate-evals` command generates a tailored `promptfooconfig.yaml` evaluation suite for a specific OpenShift repository. Instead of using a generic evaluation configuration, it analyzes the repository's documentation structure, code patterns, and conventions to create repository-specific test cases. | ||
| | ||
| The generated evaluation suite tests whether AI agents can: | ||
| - Naturally discover repository documentation | ||
| - Apply repository-specific patterns correctly | ||
| - Follow established conventions without explicit instruction | ||
| - Reject anti-patterns specific to the repository | ||
| | ||
| This follows the OpenShift Enhancements Agentic Docs Evaluation framework, which emphasizes documentation-first natural discovery. | ||
| | ||
| ## Implementation | ||
| When this command is invoked, Claude will execute the `agentic-docs:generate-evals` skill, which: | ||
| 1. Analyzes repository documentation structure (CLAUDE.md, ai-docs/, ARCHITECTURE.md) | ||
| 2. Identifies code patterns (API versions, operator patterns, controller structure) | ||
| 3. Extracts repository-specific conventions | ||
| 4. Generates test cases that validate natural documentation discovery | ||
| 5. Creates `promptfooconfig.yaml` with assertions tailored to the repository | ||
| 6. Saves configuration to the repository root | ||
| | ||
| ## Return Value | ||
| - Generated `promptfooconfig.yaml` file in the repository root | ||
| - Test cases specific to the repository's patterns and conventions | ||
| - Assertions configured for natural discovery validation | ||
| | ||
| ## Examples | ||
| | ||
| 1. **Generate evals for the current directory**: | ||
| ``` | ||
| /agentic-docs:generate-evals | ||
| ``` | ||
| Analyzes the current repository and generates `promptfooconfig.yaml`. | ||
| | ||
| 2. **Generate evals for a specific repository**: | ||
| ``` | ||
| /agentic-docs:generate-evals /path/to/openshift/repo | ||
| ``` | ||
| Analyzes the specified repository and generates tailored evaluation configuration. | ||
| | ||
| ## Arguments | ||
| - `repository-path` (optional): Path to the target repository for analysis. Defaults to the current directory if not specified. |
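
The first implementation step in this file names three documentation entry points (`CLAUDE.md`, `ai-docs/`, `ARCHITECTURE.md`). A minimal Python sketch of that check, together with the rough shape of one generated test case, might look like the following. The function names, the `task` variable, and the rubric wording are hypothetical; `llm-rubric` is a promptfoo assertion type, but the PR does not show which assertion types the generated suite actually uses.

```python
from pathlib import Path
from typing import Any, Dict

# Entry points listed in the Implementation section above.
DOC_ENTRY_POINTS = ("CLAUDE.md", "ai-docs", "ARCHITECTURE.md")


def find_doc_entry_points(repo: Path) -> Dict[str, bool]:
    """Report which documentation entry points exist under the repository root."""
    return {name: (repo / name).exists() for name in DOC_ENTRY_POINTS}


def skeleton_test_case(task: str, rubric: str) -> Dict[str, Any]:
    """Return one test case shaped like an entry under `tests:` in a promptfoo config.

    Hypothetical helper: the task is phrased without naming any documentation
    file, so the agent has to discover the docs on its own.
    """
    return {
        "description": f"Natural discovery: {task}",
        "vars": {"task": task},
        "assert": [{"type": "llm-rubric", "value": rubric}],
    }
```

A generator along these lines could skip test cases for entry points that are absent and serialize the remaining dictionaries to `promptfooconfig.yaml`.
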