Add contributing eval, lint-fix target, and eval rules #473
base: main
Changes from 2 commits
4569c54
395d88d
6aabcf2
9637016
2d24b83
a18fc10
96031cc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| When modifying plugin commands or skills, add or update evals in `plugins/<name>/evals/`. | ||
|
|
||
| Every eval test must have per-test metadata: | ||
| ```yaml | ||
| metadata: | ||
| token-usage: small | medium | large | ||
| judge-size: none | sonnet | opus | ||
| tier: fast | medium | heavy | ||
| ``` | ||
|
|
||
| Use YAML anchors (`&meta-fast` / `*meta-fast`) to avoid repetition. | ||
|
|
||
| After adding or modifying evals: | ||
| 1. Run `make lint` — the skillsaw linter validates metadata, tier classification, and budget compliance against `evals/budget.yaml` | ||
| 2. If lint fails, run `make lint-fix` to auto-fix what it can | ||
| 3. Run `make eval-plugins EVAL_PLUGIN=<name>` to verify tests pass | ||
| 4. Update `evals/budget.yaml` budgets.current if cost thresholds changed | ||
|
|
||
| See `evals/AGENTS.md` for the full tiering model and budget rules. |
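
The anchor convention in the rules above could look like the following minimal sketch. Only the metadata keys and the `&meta-fast` / `*meta-fast` anchor name come from the diff; the test descriptions, prompts, assertions, and the particular combination of values are hypothetical and are not checked against the tiering model in `evals/AGENTS.md`.

```yaml
# Hypothetical eval file under plugins/<name>/evals/.
# Only the metadata keys and the anchor name come from the rules above;
# descriptions, prompts, and the chosen values are illustrative.
tests:
  - description: "example/first-test"
    metadata: &meta-fast        # define the shared metadata once
      token-usage: small
      judge-size: none
      tier: fast
    vars:
      prompt: "Run /hello-world:echo"
    assert:
      - type: icontains
        value: "Hello world"

  - description: "example/second-test"
    metadata: *meta-fast        # alias reuses the mapping defined above
    vars:
      prompt: "Run /hello-world:echo Alice"
    assert:
      - type: icontains
        value: "Alice"
```

A YAML alias expands to the same mapping as its anchor, so both tests carry identical per-test metadata without repeating the three keys.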
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,10 @@ help: ## Show this help message | |
| lint: ## Run plugin linter (verbose, strict mode) | ||
| $(CONTAINER_RUNTIME) run --rm --platform linux/amd64 $(SELINUX_OPT) -v $(PWD):/workspace:Z $(SKILLSAW_IMAGE) . | ||
|
|
||
| .PHONY: lint-fix | ||
| lint-fix: ## Auto-fix lint violations | ||
| $(CONTAINER_RUNTIME) run --rm --platform linux/amd64 $(SELINUX_OPT) -v $(PWD):/workspace:Z $(SKILLSAW_IMAGE) fix -y . | ||
|
|
||
| .PHONY: lint-pull | ||
| lint-pull: ## Pull the latest skillsaw image | ||
| $(CONTAINER_RUNTIME) pull $(SKILLSAW_IMAGE) | ||
|
|
@@ -59,4 +63,16 @@ $(EVAL_TARGETS): | |
| --no-cache \ | ||
| --table-cell-max-length 500 | ||
|
|
||
| .PHONY: eval-contributing | ||
| eval-contributing: ## Run contributing workflow evals (root evals/promptfooconfig.yaml) | ||
| @npm install | ||
|
Member
Can we commit package-lock.json for supply chain protection? I didn't notice this in my first pass: https://github.com/openshift-eng/ai-helpers/blob/main/package.json. We should also set a minimum release age, nothing newer than 48 hours old.
Collaborator
Author
done |
||
| @CLAUDE_CODE_USE_VERTEX=true \ | ||
| PROMPTFOO_PASS_RATE_THRESHOLD=$(EVAL_PASS_RATE_THRESHOLD) \ | ||
| npx promptfoo eval \ | ||
| -c evals/promptfooconfig.yaml \ | ||
| $(if $(EVAL_FILTER),--filter-pattern "$(EVAL_FILTER)") \ | ||
| --repeat $(EVAL_REPEAT) \ | ||
| --no-cache \ | ||
| --table-cell-max-length 500 | ||
|
|
||
| .DEFAULT_GOAL := help | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,38 +1,57 @@ | ||
| # Root eval config — smoke test to verify the provider works. | ||
| # For full evals, use make eval-plugins or run individual plugin evals. | ||
| # Root evals — contributing workflow validation. | ||
| # These test the repo-level behavior, not individual plugins. | ||
|
|
||
| description: "ai-helpers — provider smoke test" | ||
| description: "ai-helpers — root evals" | ||
|
|
||
| providers: | ||
| - id: anthropic:claude-agent-sdk | ||
| label: smoke | ||
| label: ai-helpers | ||
| config: | ||
| model: claude-opus-4-6 | ||
| plugins: | ||
| - type: local | ||
| path: ../plugins/hello-world | ||
| append_allowed_tools: ['Bash'] | ||
| permission_mode: 'auto' | ||
| working_dir: ../ | ||
| append_allowed_tools: ['Read', 'Grep', 'Glob'] | ||
| permission_mode: 'default' | ||
|
|
||
| prompts: | ||
| - "{{prompt}}" | ||
|
|
||
| defaultTest: | ||
| options: | ||
| provider: | ||
| id: vertex:claude-opus-4-6 | ||
| id: vertex:claude-sonnet-4-6 | ||
| config: | ||
| projectId: "{{ env.ANTHROPIC_VERTEX_PROJECT_ID }}" | ||
| region: global | ||
| temperature: 0 | ||
| assert: | ||
| - type: latency | ||
| threshold: 30000 | ||
| - type: cost | ||
| threshold: 0.50 | ||
|
|
||
| tests: | ||
| # skillsaw-disable promptfoo-budget | ||
| # skillsaw-disable promptfoo-assertions | ||
| # skillsaw-disable promptfoo-metadata | ||
| - description: "smoke/provider-loads" | ||
| - description: "contributing/new-plugin-plan — follows contributing rules" | ||
| metadata: | ||
| token-usage: small | ||
| judge-size: sonnet | ||
| tier: medium | ||
| vars: | ||
| prompt: "Run /hello-world:echo" | ||
| prompt: | | ||
| I want to create a new plugin called "bye-world" with a single command | ||
| "farewell" that prints "Goodbye world" or "Goodbye <name>". | ||
| Do NOT create any files — only describe what you would do, what files | ||
| you would create, and what verification steps you would run. | ||
| assert: | ||
| - type: llm-rubric | ||
| value: "The plan includes a verification section that mentions running make lint (or make lint-fix if lint fails)" | ||
| - type: llm-rubric | ||
| value: "The plan mentions adding evals or running make eval-plugins for the new plugin" | ||
| - type: llm-rubric | ||
| value: "The plan mentions creating a plugin.json with name, description, version, and author fields" | ||
| - type: llm-rubric | ||
| value: "The plan mentions registering the plugin in marketplace.json or running make update" | ||
| - type: icontains | ||
| value: "Hello world" | ||
| value: "make lint" | ||
|
|
||
| evaluateOptions: | ||
| maxConcurrency: 5 |
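
Because the rendered diff above has lost its `+`/`-` markers and its indentation, the added test can be read as the following sketch. It assumes that the `smoke/provider-loads`, `Run /hello-world:echo`, and `Hello world` lines belong to the removed side of the hunk, and it restores nesting according to the usual promptfoo test layout; the committed file may differ in detail.

```yaml
# Sketch of the added test with indentation restored.
# Assumption: the lines mentioning hello-world and "Hello world"
# are the removed side of the hunk and are therefore omitted here.
tests:
  - description: "contributing/new-plugin-plan — follows contributing rules"
    metadata:
      token-usage: small
      judge-size: sonnet
      tier: medium
    vars:
      prompt: |
        I want to create a new plugin called "bye-world" with a single command
        "farewell" that prints "Goodbye world" or "Goodbye <name>".
        Do NOT create any files — only describe what you would do, what files
        you would create, and what verification steps you would run.
    assert:
      - type: llm-rubric
        value: "The plan includes a verification section that mentions running make lint (or make lint-fix if lint fails)"
      - type: llm-rubric
        value: "The plan mentions adding evals or running make eval-plugins for the new plugin"
      - type: llm-rubric
        value: "The plan mentions creating a plugin.json with name, description, version, and author fields"
      - type: llm-rubric
        value: "The plan mentions registering the plugin in marketplace.json or running make update"
      - type: icontains
        value: "make lint"
```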
Add explicit `contents: read` permission to `detect-changes`.

The workflow sets `permissions: {}` at line 6, which denies all permissions by default. The `detect-changes` job (lines 12–23) lacks an explicit permissions block, so `actions/checkout` at line 20 will fail without read access to repository content. Additionally, the `git diff` command at line 28 requires repo content access.
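
The reviewer's proposed change is not reproduced on this page. A minimal sketch of what a job-level permissions block could look like follows; the workflow file itself is not shown here, so the runner label, the checkout version, and the step layout are assumptions.

```yaml
# Sketch only: the real workflow is not shown on this page.
# runs-on, the checkout version, and the step list are assumptions.
permissions: {}            # workflow-level default: no permissions

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    permissions:
      contents: read       # job-level grant so checkout can read the repo
    steps:
      - uses: actions/checkout@v4
```

A job-level `permissions` block overrides the workflow-level default for that job only, so other jobs keep the empty permission set.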