Add contributing eval, lint-fix target, and eval rules #473

Open
enxebre wants to merge 7 commits into main from eval-contributing-docs
Collaborator

@enxebre enxebre commented May 14, 2026

Summary

  • Add a contributing workflow eval that tests whether Claude Code follows CLAUDE.md rules when planning a new plugin
  • Add make lint-fix and make eval-contributing Makefile targets
  • Fix the skillsaw budget rule to resolve root-level eval configs to the contributing budget entity
  • Update GH Actions workflow to run eval-contributing when contributing-relevant files change
  • Document eval requirements in CLAUDE.md contributing rules and .claude/rules/evals.md
  • Add HTML output and promptfoo.db to eval artifacts for interactive viewing
  • Commit package-lock.json and add .npmrc with min-release-age=2 for supply chain protection

Contributing eval (evals/promptfooconfig.yaml)

Tests that when asked to plan a new plugin, the agent:

  • Mentions running make lint (or make lint-fix)
  • Mentions adding evals or running make eval-plugins
  • Plans a plugin.json with required fields
  • Mentions registering in marketplace.json or running make update

Uses sonnet as judge (tier: medium, token-usage: small, judge-size: sonnet).

CI triggers

  • plugins job: runs when plugins/ changes (existing)
  • contributing job: runs when CLAUDE.md, AGENTS.md, .claude/rules/, or evals/promptfooconfig.yaml change
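The trigger rules above can be sketched as a small path classifier. This is only an illustration of the described behavior, not the workflow's actual detection script: the function name `classify_changes`, the exact-match handling of root-level files, and the assumption that a plugin is identified by the directory directly under `plugins/` are all hypothetical.

```python
from pathlib import PurePosixPath

# Paths that trigger the contributing job, taken from the PR description.
CONTRIBUTING_FILES = {"CLAUDE.md", "AGENTS.md", "evals/promptfooconfig.yaml"}
CONTRIBUTING_DIRS = (".claude/rules/",)


def classify_changes(changed_paths):
    """Return (sorted plugin names, has_contributing) for a list of changed paths."""
    plugins = set()
    has_contributing = False
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Assumption: a plugin is the directory directly under plugins/.
        if len(parts) >= 3 and parts[0] == "plugins":
            plugins.add(parts[1])
        # Assumption: root-level files match exactly; rule files match by prefix.
        if raw in CONTRIBUTING_FILES or raw.startswith(CONTRIBUTING_DIRS):
            has_contributing = True
    return sorted(plugins), has_contributing
```

In this sketch a single push can set both outputs, so the plugins job and the contributing job would run independently of each other.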

Eval artifacts

Each CI job uploads:

  • JUnit XML (*.xml) for structured results
  • HTML report (*.html) for static viewing
  • promptfoo database (promptfoo.db) for full interactive view

To view results locally after downloading the artifact:

unzip eval-results-contributing.zip -d /tmp/eval-view
PROMPTFOO_CONFIG_DIR=/tmp/eval-view npx promptfoo view

New Makefile targets

make lint-fix                    # auto-fix lint violations (skillsaw fix -y)
make eval-contributing           # run contributing workflow eval
make eval-contributing EVAL_REPEAT=5 EVAL_PASS_RATE_THRESHOLD=80

Budget

| Entity | Allowed | Current |
|---|---|---|
| contributing | $1.00 | $0.50 |
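A rough sketch of how a root-level eval config could resolve to the `contributing` budget entity and be checked against its allowance. The helper names `entity_name` and `over_budget` and the `plugins/<name>/...` layout for non-root configs are assumptions made for illustration; the actual rule lives in `.skillsaw/promptfoo_budget_rule.py` and may differ.

```python
from pathlib import PurePosixPath

ROOT_EVAL_CONFIG = PurePosixPath("evals/promptfooconfig.yaml")


def entity_name(config_path):
    """Map an eval config path to the budget entity it is charged against."""
    path = PurePosixPath(config_path)
    if path == ROOT_EVAL_CONFIG:
        # Root-level config is charged to the literal "contributing" entity.
        return "contributing"
    parts = path.parts
    # Assumption (hypothetical layout): plugin configs live under plugins/<name>/.
    if len(parts) >= 3 and parts[0] == "plugins":
        return parts[1]
    raise ValueError(f"cannot resolve budget entity for {config_path}")


def over_budget(budgets, entity):
    """Return True when the entity's current spend exceeds its allowance."""
    entry = budgets[entity]
    return entry["current"] > entry["allowed"]
```

With the values in the table above, `over_budget({"contributing": {"allowed": 1.00, "current": 0.50}}, "contributing")` evaluates to `False`.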

Workflow rename

  • Plugin Evals → Evals
  • behavioral-evals → plugins
  • contributing-evals → contributing

Supply chain

  • Committed package-lock.json for reproducible installs
  • Added .npmrc with min-release-age=2 (reject packages newer than 48 hours)
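The 48-hour policy amounts to a simple age check on a package version's publish time. The sketch below only illustrates that idea; it is not how npm implements `min-release-age`, the function name `old_enough` is hypothetical, and the treatment of a release that is exactly 48 hours old is an assumption.

```python
from datetime import datetime, timedelta, timezone


def old_enough(published_at, now, min_age_days=2):
    """Return True when a release is at least min_age_days old."""
    # Assumption: a release exactly at the threshold is accepted.
    return now - published_at >= timedelta(days=min_age_days)


now = datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(hours=47)   # published 47 hours ago
settled = now - timedelta(hours=72) # published 72 hours ago
```

Here `old_enough(fresh, now)` is `False` and `old_enough(settled, now)` is `True`, so only the older release would be eligible for install under this reading of the policy.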

Test plan

  • make eval-contributing — 1/1 passed
  • make lint — passes (budget rule validates root eval config)
  • Budget enforcement verified (linter catches over-budget)
  • PROMPTFOO_CONFIG_DIR produces downloadable promptfoo.db in CI artifacts
  • PROMPTFOO_CONFIG_DIR=/path/to/artifact npx promptfoo view works locally

🤖 Generated with Claude Code

enxebre and others added 2 commits May 14, 2026 19:05
- Add eval-contributing Makefile target for root evals/promptfooconfig.yaml
- Add contributing workflow eval: tests that new plugin plans follow
  CLAUDE.md rules (mentions make lint, evals, plugin.json, marketplace)
- Add lint-fix Makefile target (skillsaw fix -y)
- Fix budget rule to resolve root-level eval configs to "contributing"
  budget entity
- Add contributing budget entry ($1.00 allowed, $0.50 current)
- Update GH Actions workflow to run eval-contributing when CLAUDE.md,
  AGENTS.md, or .claude/rules/ change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: add make lint-fix and make eval-plugins to dev commands,
  add eval requirement to contributing rules
- evals/AGENTS.md: add linting section, add lint step to adding evals guide
- .claude/rules/evals.md: new rule reminding to add evals with metadata
- .gitignore: allow .claude/rules/ to be tracked

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci Bot commented May 14, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026

openshift-ci Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2026
Contributor

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a "contributing" eval track: enforces per-test eval metadata, maps root-level promptfoo tests to contributing for budget aggregation, adds a root contributing promptfoo eval, new Makefile targets (lint-fix, eval-contributing), CI detection and conditional job to run contributing evals, supporting docs, and a contributing budget entry.

Changes

Contributing evaluation workflow

| Layer / File(s) | Summary |
|---|---|
| Eval rules & budget mapping: .claude/rules/evals.md, .skillsaw/promptfoo_budget_rule.py, evals/budget.yaml | Documents required per-test YAML metadata (token-usage, judge-size, tier) with allowed values and anchor guidance; PromptfooBudgetRule._entity_name returns literal "contributing" for root-level promptfoo nodes; adds budgets.contributing (allowed: 1.00, current: 0.50). |
| Contributing root eval & linting docs: evals/promptfooconfig.yaml, evals/AGENTS.md, AGENTS.md | Repurposes root promptfooconfig.yaml into a contributing/new-plugin-plan eval with LLM-rubric assertions (including explicit make lint / evals guidance); documents skillsaw linting and inserts make lint into the "Adding Evals for a New Plugin" steps. |
| Makefile targets: Makefile | Adds lint-fix target (skillsaw auto-fix) and eval-contributing target (runs promptfoo eval -c evals/promptfooconfig.yaml honoring PROMPTFOO_PASS_RATE_THRESHOLD, EVAL_FILTER, EVAL_REPEAT, --no-cache, --table-cell-max-length 500). |
| CI detection & jobs: .github/workflows/eval-plugins.yml, .gitignore | Renames detect-changed-plugins → detect-changes, adds has_contributing output (detects changes to CLAUDE.md/AGENTS.md/.claude/rules/ or evals/promptfooconfig.yaml), wires behavioral-evals to use detect-changes.outputs.plugins, expands artifact uploads to include HTML and .promptfoo/, and adds contributing-evals job conditioned on has_contributing; unignores .claude/rules/ in .gitignore. |
| Repository config: .npmrc | Adds min-release-age=2 setting. |
sequenceDiagram
    autonumber
    participant Dev as Developer
    participant Repo as Repository
    participant CI as GitHub Actions
    participant Linter as skillsaw
    participant Eval as promptfoo/Vertex

    Dev->>Repo: push changes (evals, rules, docs, Makefile)
    CI->>Repo: run detect-changes job
    alt detect-changes.outputs.has_contributing == 'true'
      CI->>Linter: run make lint / make lint-fix (if invoked)
      CI->>Eval: run eval-contributing (promptfoo eval -c evals/promptfooconfig.yaml)
      CI->>CI: upload eval-results-contributing/*.xml and *.html and .promptfoo/
    else
      CI->>CI: build plugin matrix from detect-changes.outputs.plugins
      CI->>Eval: run behavioral-evals (eval-plugins)
      CI->>CI: upload eval-results/*.xml and *.html and .promptfoo/
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

ok-to-test, lgtm

Suggested reviewers

  • mrunalp
  • Prashanth684

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| No Assumed Git Remote Names | ❌ Error | The PR hardcodes the 'origin' git remote in .github/workflows/eval-plugins.yml line 28 without discovering it first: git diff --name-only origin/${{ github.base_ref }}...HEAD | Discover the remote name dynamically before using it, e.g., via git remote -v, or use a safe reference that doesn't assume 'origin' as the remote name. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (8 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the three main changes: adding a contributing eval, adding the lint-fix target, and documenting eval rules. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| No Real People Names In Style References | ✅ Passed | No real people names found in style references, plugin commands, skills documentation, or example prompts. The PR correctly documents and enforces the "Never reference real people" rule in AGENTS.md. |
| Git Push Safety Rules | ✅ Passed | No git push or force push operations found. PR contains only documentation, configs, and evaluation tools. No unsafe push patterns detected. |
| No Untrusted Mcp Servers | ✅ Passed | PR introduces no untrusted MCP server installations. All npm packages are from trusted sources (@anthropic-ai, promptfoo). No MCP servers are installed or configured. |
| Ai-Helpers Overlap Detection | ✅ Passed | PR #473 does not modify files in filtered directories (plugins//commands/, plugins//skills/, agents/). Per Phase 1 filtering criteria, the overlap detection check is not applicable to this PR. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread Makefile

.PHONY: eval-contributing
eval-contributing: ## Run contributing workflow evals (root evals/promptfooconfig.yaml)
@npm install
Member

Can we commit package-lock.json for supply chain protection?

I didn't notice this in my first pass https://github.com/openshift-eng/ai-helpers/blob/main/package.json

We should also set a minimum release age, nothing newer than 48 hours old.

.npmrc in the root of the repo:

min-release-age=2

Collaborator Author

done

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
.github/workflows/eval-plugins.yml (1)

96-121: 💤 Low value

Consider adding artifact upload for consistency.

The contributing-evals job runs evals but doesn't upload results as artifacts, unlike the behavioral-evals job. If you want to preserve contributing eval results for debugging or analysis, consider adding an upload step similar to lines 88-94.

Optional: Add artifact upload
       - name: Run contributing evals
         env:
           ANTHROPIC_VERTEX_PROJECT_ID: ${{ secrets.ANTHROPIC_VERTEX_PROJECT_ID }}
         run: make eval-contributing
+
+      - name: Upload contributing eval results
+        if: always()
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+        with:
+          name: eval-results-contributing
+          path: .promptfoo/output/*.json
+          retention-days: 30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/eval-plugins.yml around lines 96 - 121, The
contributing-evals job currently runs the evals (step name "Run contributing
evals" which executes "make eval-contributing") but does not persist results;
add an artifact upload step after that run similar to the behavioral-evals job:
add a step that uses actions/upload-artifact to upload the eval output files
(choose the same artifact name and path pattern you use for behavioral-evals,
e.g., results/*.json or the directory produced by make eval-contributing) so the
contributing-evals job saves its output for debugging and analysis.
evals/promptfooconfig.yaml (1)

45-52: ⚡ Quick win

Broaden rubric checks to cover overlap + plugin-scope rules.

This eval can still pass plans that skip core contributing decision rules (overlap check and whether a new plugin is even warranted). Add rubric assertions for those rules.

♻️ Proposed assertion additions
     assert:
       - type: llm-rubric
         value: "The plan includes a verification section that mentions running make lint (or make lint-fix if lint fails)"
+      - type: llm-rubric
+        value: "The plan checks for overlap first by running /utils:review-ai-helpers-overlap --idea \"...\" before proposing a new plugin"
+      - type: llm-rubric
+        value: "The plan explains that commands should be added to an existing plugin when scope fits, or to plugins/utils when no clear parent exists, and only creates a new plugin for a distinct command group"
       - type: llm-rubric
         value: "The plan mentions adding evals or running make eval-plugins for the new plugin"
       - type: llm-rubric
         value: "The plan mentions creating a plugin.json with name, description, version, and author fields"

Based on learnings: "Check for overlaps with existing helpers before contributing by running /utils:review-ai-helpers-overlap --idea \"description\"" and "Add new commands to an existing plugin when they fit its scope ... Create a new plugin only for distinct groups of related commands".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/promptfooconfig.yaml` around lines 45 - 52, The eval currently misses
checks for overlap and plugin-scope decision rules; update
evals/promptfooconfig.yaml by adding llm-rubric assertions that (1) require the
plan to run the overlap check command "/utils:review-ai-helpers-overlap --idea
\"description\"" or explicitly state that no overlaps exist, and (2) require the
plan to justify creating a new plugin vs adding commands to an existing plugin
(e.g., mention "create a new plugin only for distinct groups of related
commands" or "add commands to existing plugin scope"), while keeping existing
rubrics like checks for make lint/make lint-fix, make eval-plugins, plugin.json
fields, and marketplace.json/make update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05b20eeb-f65e-43e3-bf02-cf167af035e8

📥 Commits

Reviewing files that changed from the base of the PR and between 433c72b and 395d88d.

📒 Files selected for processing (9)
  • .claude/rules/evals.md
  • .github/workflows/eval-plugins.yml
  • .gitignore
  • .skillsaw/promptfoo_budget_rule.py
  • AGENTS.md
  • Makefile
  • evals/AGENTS.md
  • evals/budget.yaml
  • evals/promptfooconfig.yaml

@enxebre enxebre marked this pull request as ready for review May 14, 2026 18:53
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026
@openshift-ci openshift-ci Bot requested review from rvanderp3 and stleerh May 14, 2026 18:54
enxebre and others added 2 commits May 14, 2026 20:57
- Remove package-lock.json from .gitignore and commit the lockfile
- Add .npmrc with min-release-age=2 (reject packages newer than 48 hours)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update SDK from 0.2.132 to 0.2.141 to fix missing native binary
error on CI (linux-x64-musl).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@enxebre enxebre force-pushed the eval-contributing-docs branch 2 times, most recently from 33d5e62 to f0710c1 Compare May 14, 2026 19:44
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/eval-plugins.yml:
- Around line 12-24: The detect-changes job currently runs with repository
content access denied by the workflow-wide permissions: {}; update the
detect-changes job to explicitly grant read access by adding a permissions block
(permissions: { contents: read }) at the job level so the actions/checkout step
and subsequent git diff operations can access the repo; ensure the permissions
block is added to the detect-changes job (not just workflow-level) so the
actions/checkout action and any commands that run git diff succeed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 59cbbcb0-dff4-4ce9-bdf0-39ca72807904

📥 Commits

Reviewing files that changed from the base of the PR and between 33d5e62 and f0710c1.

📒 Files selected for processing (2)
  • .github/workflows/eval-plugins.yml
  • Makefile

Comment on lines +12 to 24
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      plugins: ${{ steps.detect.outputs.plugins }}
      has_plugins: ${{ steps.detect.outputs.has_plugins }}
      has_contributing: ${{ steps.detect.outputs.has_contributing }}
    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          persist-credentials: false
          fetch-depth: 0

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".github/workflows/eval-plugins.yml"

echo "Top-level permissions declaration:"
rg -n '^\s*permissions:\s*\{\s*\}\s*$' "$file" || true

echo
echo "detect-changes block:"
awk '
/^  detect-changes:/ {in_block=1}
in_block {print NR ":" $0}
in_block && /^  behavioral-evals:/ {exit}
' "$file"

echo
echo "permissions entries inside detect-changes:"
awk '
/^  detect-changes:/ {in_block=1; next}
/^  behavioral-evals:/ {in_block=0}
in_block && $1=="permissions:" {print NR ":" $0}
' "$file"

Repository: openshift-eng/ai-helpers

Length of output: 2000


Add explicit contents: read permission to detect-changes.

The workflow sets permissions: {} at line 6, which denies all permissions by default. The detect-changes job (lines 12–23) lacks an explicit permissions block, so actions/checkout at line 20 will fail without read access to repository content. Additionally, the git diff command at line 28 requires repo content access.

🔧 Proposed fix
   detect-changes:
+    permissions:
+      contents: read
     runs-on: ubuntu-latest
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  detect-changes:
-    runs-on: ubuntu-latest
-    outputs:
-      plugins: ${{ steps.detect.outputs.plugins }}
-      has_plugins: ${{ steps.detect.outputs.has_plugins }}
-      has_contributing: ${{ steps.detect.outputs.has_contributing }}
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-        with:
-          persist-credentials: false
-          fetch-depth: 0
+  detect-changes:
+    permissions:
+      contents: read
+    runs-on: ubuntu-latest
+    outputs:
+      plugins: ${{ steps.detect.outputs.plugins }}
+      has_plugins: ${{ steps.detect.outputs.has_plugins }}
+      has_contributing: ${{ steps.detect.outputs.has_contributing }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
+          fetch-depth: 0

@enxebre enxebre force-pushed the eval-contributing-docs branch 2 times, most recently from fd6bf84 to 212d487 Compare May 14, 2026 19:58
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
Makefile (1)

53-53: ⚡ Quick win

Consider using full config paths in eval artifact filenames to prevent future collisions.

Line 53 + Line 61 currently derive output names from basename only. While no collisions exist in the repository today, parallel eval-plugins runs could overwrite XML/HTML outputs in EVAL_OUTPUT_DIR if two eval configs ever share a filename. Using the full config path instead is a defensive measure:

♻️ Proposed change
-	$(eval EVAL_NAME := $(basename $(notdir $(CONFIG))))
+	$(eval EVAL_NAME := $(subst /,__,$(basename $(CONFIG))))

Also applies to: 61-61
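The collision risk described in this comment can be illustrated outside of Make. The sketch below mirrors the two naming schemes in Python: the current `$(basename $(notdir $(CONFIG)))` and the proposed `$(subst /,__,$(basename $(CONFIG)))`. The function names and the two config paths are hypothetical and chosen only to show the effect.

```python
import posixpath


def eval_name_current(config):
    """Mirror $(basename $(notdir CONFIG)): file name without directory or suffix."""
    return posixpath.splitext(posixpath.basename(config))[0]


def eval_name_proposed(config):
    """Mirror $(subst /,__,$(basename CONFIG)): full path without suffix, '/' -> '__'."""
    return posixpath.splitext(config)[0].replace("/", "__")


# Hypothetical config paths that share the same file name.
config_a = "plugins/alpha/evals/promptfooconfig.yaml"
config_b = "plugins/beta/evals/promptfooconfig.yaml"

for config in (config_a, config_b):
    print(eval_name_current(config), eval_name_proposed(config))
```

Both paths map to the same name under the current scheme, so their XML/HTML outputs would overwrite each other in a shared output directory, while the path-derived names stay distinct.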

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 53, The current artifact naming uses only the basename via
EVAL_NAME := $(basename $(notdir $(CONFIG))) which can cause collisions; change
the derivation to use the full config path (e.g., use $(CONFIG) or $(abspath
$(CONFIG))) and sanitize path separators to a filename-safe form (replace "/"
with "_" or similar) when computing EVAL_NAME and the similar expression at the
other occurrence so eval XML/HTML outputs use a unique name derived from the
entire config path rather than just the basename.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ba568012-df5f-48d0-a6aa-fb75504f6d95

📥 Commits

Reviewing files that changed from the base of the PR and between f0710c1 and fd6bf84.

📒 Files selected for processing (2)
  • .github/workflows/eval-plugins.yml
  • Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/eval-plugins.yml

@enxebre enxebre force-pushed the eval-contributing-docs branch from 212d487 to 3e17d16 Compare May 14, 2026 20:02
enxebre and others added 2 commits May 14, 2026 22:08
Generate both JUnit XML and HTML reports when EVAL_OUTPUT_DIR is set.
The HTML file is a self-contained promptfoo comparison view — download
from the GH Actions artifact and open in a browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Plugin Evals → Evals
- behavioral-evals → plugins
- contributing-evals → contributing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@enxebre enxebre force-pushed the eval-contributing-docs branch from 3e17d16 to a18fc10 Compare May 14, 2026 20:09
Explicitly forward PROMPTFOO_CONFIG_DIR in both Makefile targets and
GH Actions workflow so promptfoo writes its database to eval-results/.
This enables downloading the artifact and running `promptfoo view`
locally for the full interactive UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
