Add contributing eval, lint-fix target, and eval rules#473
Conversation
- Add eval-contributing Makefile target for root evals/promptfooconfig.yaml - Add contributing workflow eval: tests that new plugin plans follow CLAUDE.md rules (mentions make lint, evals, plugin.json, marketplace) - Add lint-fix Makefile target (skillsaw fix -y) - Fix budget rule to resolve root-level eval configs to "contributing" budget entity - Add contributing budget entry ($1.00 allowed, $0.50 current) - Update GH Actions workflow to run eval-contributing when CLAUDE.md, AGENTS.md, or .claude/rules/ change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: add make lint-fix and make eval-plugins to dev commands, add eval requirement to contributing rules - evals/AGENTS.md: add linting section, add lint step to adding evals guide - .claude/rules/evals.md: new rule reminding to add evals with metadata - .gitignore: allow .claude/rules/ to be tracked Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a "contributing" eval track: enforces per-test eval metadata, maps root-level promptfoo tests to ChangesContributing evaluation workflow
sequenceDiagram
autonumber
participant Dev as Developer
participant Repo as Repository
participant CI as GitHub Actions
participant Linter as skillsaw
participant Eval as promptfoo/Vertex
Dev->>Repo: push changes (evals, rules, docs, Makefile)
CI->>Repo: run detect-changes job
alt detect-changes.outputs.has_contributing == 'true'
CI->>Linter: run make lint / make lint-fix (if invoked)
CI->>Eval: run eval-contributing (promptfoo eval -c evals/promptfooconfig.yaml)
CI->>CI: upload eval-results-contributing/*.xml and *.html and .promptfoo/
else
CI->>CI: build plugin matrix from detect-changes.outputs.plugins
CI->>Eval: run behavioral-evals (eval-plugins)
CI->>CI: upload eval-results/*.xml and *.html and .promptfoo/
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (8 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
|
||
| .PHONY: eval-contributing | ||
| eval-contributing: ## Run contributing workflow evals (root evals/promptfooconfig.yaml) | ||
| @npm install |
There was a problem hiding this comment.
Can we commit package-lock.json for supply chain protection?
I didn't notice this in my first pass https://github.com/openshift-eng/ai-helpers/blob/main/package.json
We should also set a minimum release age, nothing newer than 48 hours old.
.npmrc in the root of the repo:
min-release-age=2
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.github/workflows/eval-plugins.yml (1)
96-121: 💤 Low valueConsider adding artifact upload for consistency.
The
contributing-evalsjob runs evals but doesn't upload results as artifacts, unlike thebehavioral-evalsjob. If you want to preserve contributing eval results for debugging or analysis, consider adding an upload step similar to lines 88-94.Optional: Add artifact upload
- name: Run contributing evals env: ANTHROPIC_VERTEX_PROJECT_ID: ${{ secrets.ANTHROPIC_VERTEX_PROJECT_ID }} run: make eval-contributing + + - name: Upload contributing eval results + if: always() + uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2 + with: + name: eval-results-contributing + path: .promptfoo/output/*.json + retention-days: 30🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/eval-plugins.yml around lines 96 - 121, The contributing-evals job currently runs the evals (step name "Run contributing evals" which executes "make eval-contributing") but does not persist results; add an artifact upload step after that run similar to the behavioral-evals job: add a step that uses actions/upload-artifact to upload the eval output files (choose the same artifact name and path pattern you use for behavioral-evals, e.g., results/*.json or the directory produced by make eval-contributing) so the contributing-evals job saves its output for debugging and analysis.evals/promptfooconfig.yaml (1)
45-52: ⚡ Quick winBroaden rubric checks to cover overlap + plugin-scope rules.
This eval can still pass plans that skip core contributing decision rules (overlap check and whether a new plugin is even warranted). Add rubric assertions for those rules.
♻️ Proposed assertion additions
assert: - type: llm-rubric value: "The plan includes a verification section that mentions running make lint (or make lint-fix if lint fails)" + - type: llm-rubric + value: "The plan checks for overlap first by running /utils:review-ai-helpers-overlap --idea \"...\" before proposing a new plugin" + - type: llm-rubric + value: "The plan explains that commands should be added to an existing plugin when scope fits, or to plugins/utils when no clear parent exists, and only creates a new plugin for a distinct command group" - type: llm-rubric value: "The plan mentions adding evals or running make eval-plugins for the new plugin" - type: llm-rubric value: "The plan mentions creating a plugin.json with name, description, version, and author fields"Based on learnings: "Check for overlaps with existing helpers before contributing by running
/utils:review-ai-helpers-overlap --idea \"description\"" and "Add new commands to an existing plugin when they fit its scope ... Create a new plugin only for distinct groups of related commands".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/promptfooconfig.yaml` around lines 45 - 52, The eval currently misses checks for overlap and plugin-scope decision rules; update evals/promptfooconfig.yaml by adding llm-rubric assertions that (1) require the plan to run the overlap check command "/utils:review-ai-helpers-overlap --idea \"description\"" or explicitly state that no overlaps exist, and (2) require the plan to justify creating a new plugin vs adding commands to an existing plugin (e.g., mention "create a new plugin only for distinct groups of related commands" or "add commands to existing plugin scope"), while keeping existing rubrics like checks for make lint/make lint-fix, make eval-plugins, plugin.json fields, and marketplace.json/make update.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/eval-plugins.yml:
- Around line 96-121: The contributing-evals job currently runs the evals (step
name "Run contributing evals" which executes "make eval-contributing") but does
not persist results; add an artifact upload step after that run similar to the
behavioral-evals job: add a step that uses actions/upload-artifact to upload the
eval output files (choose the same artifact name and path pattern you use for
behavioral-evals, e.g., results/*.json or the directory produced by make
eval-contributing) so the contributing-evals job saves its output for debugging
and analysis.
In `@evals/promptfooconfig.yaml`:
- Around line 45-52: The eval currently misses checks for overlap and
plugin-scope decision rules; update evals/promptfooconfig.yaml by adding
llm-rubric assertions that (1) require the plan to run the overlap check command
"/utils:review-ai-helpers-overlap --idea \"description\"" or explicitly state
that no overlaps exist, and (2) require the plan to justify creating a new
plugin vs adding commands to an existing plugin (e.g., mention "create a new
plugin only for distinct groups of related commands" or "add commands to
existing plugin scope"), while keeping existing rubrics like checks for make
lint/make lint-fix, make eval-plugins, plugin.json fields, and
marketplace.json/make update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05b20eeb-f65e-43e3-bf02-cf167af035e8
📒 Files selected for processing (9)
.claude/rules/evals.md.github/workflows/eval-plugins.yml.gitignore.skillsaw/promptfoo_budget_rule.pyAGENTS.mdMakefileevals/AGENTS.mdevals/budget.yamlevals/promptfooconfig.yaml
- Remove package-lock.json from .gitignore and commit the lockfile - Add .npmrc with min-release-age=2 (reject packages newer than 48 hours) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update SDK from 0.2.132 to 0.2.141 to fix missing native binary error on CI (linux-x64-musl). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
33d5e62 to
f0710c1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/eval-plugins.yml:
- Around line 12-24: The detect-changes job currently runs with repository
content access denied by the workflow-wide permissions: {}; update the
detect-changes job to explicitly grant read access by adding a permissions block
(permissions: { contents: read }) at the job level so the actions/checkout step
and subsequent git diff operations can access the repo; ensure the permissions
block is added to the detect-changes job (not just workflow-level) so the
actions/checkout action and any commands that run git diff succeed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 59cbbcb0-dff4-4ce9-bdf0-39ca72807904
📒 Files selected for processing (2)
.github/workflows/eval-plugins.ymlMakefile
| detect-changes: | ||
| runs-on: ubuntu-latest | ||
| outputs: | ||
| plugins: ${{ steps.detect.outputs.plugins }} | ||
| has_plugins: ${{ steps.detect.outputs.has_plugins }} | ||
| has_contributing: ${{ steps.detect.outputs.has_contributing }} | ||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| with: | ||
| persist-credentials: false | ||
| fetch-depth: 0 | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/eval-plugins.yml"
echo "Top-level permissions declaration:"
rg -n '^\s*permissions:\s*\{\s*\}\s*$' "$file" || true
echo
echo "detect-changes block:"
awk '
/^ detect-changes:/ {in_block=1}
in_block {print NR ":" $0}
in_block && /^ behavioral-evals:/ {exit}
' "$file"
echo
echo "permissions entries inside detect-changes:"
awk '
/^ detect-changes:/ {in_block=1; next}
/^ behavioral-evals:/ {in_block=0}
in_block && $1=="permissions:" {print NR ":" $0}
' "$file"Repository: openshift-eng/ai-helpers
Length of output: 2000
Add explicit contents: read permission to detect-changes.
The workflow sets permissions: {} at line 6, which denies all permissions by default. The detect-changes job (lines 12–23) lacks an explicit permissions block, so actions/checkout at line 20 will fail without read access to repository content. Additionally, the git diff command at line 28 requires repo content access.
🔧 Proposed fix
detect-changes:
+ permissions:
+ contents: read
runs-on: ubuntu-latest📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| detect-changes: | |
| runs-on: ubuntu-latest | |
| outputs: | |
| plugins: ${{ steps.detect.outputs.plugins }} | |
| has_plugins: ${{ steps.detect.outputs.has_plugins }} | |
| has_contributing: ${{ steps.detect.outputs.has_contributing }} | |
| steps: | |
| - name: Checkout code | |
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | |
| with: | |
| persist-credentials: false | |
| fetch-depth: 0 | |
| detect-changes: | |
| permissions: | |
| contents: read | |
| runs-on: ubuntu-latest | |
| outputs: | |
| plugins: ${{ steps.detect.outputs.plugins }} | |
| has_plugins: ${{ steps.detect.outputs.has_plugins }} | |
| has_contributing: ${{ steps.detect.outputs.has_contributing }} | |
| steps: | |
| - name: Checkout code | |
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | |
| with: | |
| persist-credentials: false | |
| fetch-depth: 0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/eval-plugins.yml around lines 12 - 24, The detect-changes
job currently runs with repository content access denied by the workflow-wide
permissions: {}; update the detect-changes job to explicitly grant read access
by adding a permissions block (permissions: { contents: read }) at the job level
so the actions/checkout step and subsequent git diff operations can access the
repo; ensure the permissions block is added to the detect-changes job (not just
workflow-level) so the actions/checkout action and any commands that run git
diff succeed.
fd6bf84 to
212d487
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Makefile (1)
53-53: ⚡ Quick winConsider using full config paths in eval artifact filenames to prevent future collisions.
Line 53 + Line 61 currently derive output names from basename only. While no collisions exist in the repository today, parallel
eval-pluginsruns could overwrite XML/HTML outputs inEVAL_OUTPUT_DIRif two eval configs ever share a filename. Using the full config path instead is a defensive measure:♻️ Proposed change
- $(eval EVAL_NAME := $(basename $(notdir $(CONFIG)))) + $(eval EVAL_NAME := $(subst /,__,$(basename $(CONFIG))))Also applies to: 61-61
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 53, The current artifact naming uses only the basename via EVAL_NAME := $(basename $(notdir $(CONFIG))) which can cause collisions; change the derivation to use the full config path (e.g., use $(CONFIG) or $(abspath $(CONFIG))) and sanitize path separators to a filename-safe form (replace "/" with "_" or similar) when computing EVAL_NAME and the similar expression at the other occurrence so eval XML/HTML outputs use a unique name derived from the entire config path rather than just the basename.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@Makefile`:
- Line 53: The current artifact naming uses only the basename via EVAL_NAME :=
$(basename $(notdir $(CONFIG))) which can cause collisions; change the
derivation to use the full config path (e.g., use $(CONFIG) or $(abspath
$(CONFIG))) and sanitize path separators to a filename-safe form (replace "/"
with "_" or similar) when computing EVAL_NAME and the similar expression at the
other occurrence so eval XML/HTML outputs use a unique name derived from the
entire config path rather than just the basename.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ba568012-df5f-48d0-a6aa-fb75504f6d95
📒 Files selected for processing (2)
.github/workflows/eval-plugins.ymlMakefile
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/eval-plugins.yml
212d487 to
3e17d16
Compare
Generate both JUnit XML and HTML reports when EVAL_OUTPUT_DIR is set. The HTML file is a self-contained promptfoo comparison view — download from the GH Actions artifact and open in a browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Plugin Evals → Evals - behavioral-evals → plugins - contributing-evals → contributing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3e17d16 to
a18fc10
Compare
Explicitly forward PROMPTFOO_CONFIG_DIR in both Makefile targets and GH Actions workflow so promptfoo writes its database to eval-results/. This enables downloading the artifact and running `promptfoo view` locally for the full interactive UI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
make lint-fixandmake eval-contributingMakefile targetscontributingbudget entityeval-contributingwhen contributing-relevant files change.claude/rules/evals.mdpackage-lock.jsonand add.npmrcwithmin-release-age=2for supply chain protectionContributing eval (
evals/promptfooconfig.yaml)Tests that when asked to plan a new plugin, the agent:
make lint(ormake lint-fix)make eval-pluginsplugin.jsonwith required fieldsmarketplace.jsonor runningmake updateUses sonnet as judge (
tier: medium,token-usage: small,judge-size: sonnet).CI triggers
plugins/changes (existing)CLAUDE.md,AGENTS.md,.claude/rules/, orevals/promptfooconfig.yamlchangeEval artifacts
Each CI job uploads:
*.xml) for structured results*.html) for static viewingpromptfoo.db) for full interactive viewTo view results locally after downloading the artifact:
New Makefile targets
Budget
Workflow rename
Plugin Evals→Evalsbehavioral-evals→pluginscontributing-evals→contributingSupply chain
package-lock.jsonfor reproducible installs.npmrcwithmin-release-age=2(reject packages newer than 48 hours)Test plan
make eval-contributing— 1/1 passedmake lint— passes (budget rule validates root eval config)PROMPTFOO_CONFIG_DIRproduces downloadable promptfoo.db in CI artifactsPROMPTFOO_CONFIG_DIR=/path/to/artifact npx promptfoo viewworks locally🤖 Generated with Claude Code