feat(evals): add Braintrust evals package by barryroodt · Pull Request #1 · barryroodt/refine-skill

barryroodt · 2026-05-27T09:52:17Z

Summary

- New `evals/` package: a separate npm workspace with 5 JS scorers for refine-skill output quality
- Env-knob model swap (`REFINE_EVAL_MODEL`) plus dryrun mode (`REFINE_EVAL_DRYRUN=1`), so you can smoke-test without burning API credits
- `NEXT_STEPS.md` updated with install, free-model validation, fixture-growth, and future CI-gate todos
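
For reviewers who want a feel for the env knobs, here is a minimal sketch of how they might be resolved. Only the variable names `REFINE_EVAL_MODEL` and `REFINE_EVAL_DRYRUN` come from this PR. The helper name `resolveEvalConfig`, its return shape, and the fallback to dryrun when no model is set (my reading of "defaults to local dryrun" in the commit message) are assumptions, not the package's actual code.

```javascript
// Hypothetical helper, not taken from the evals/ package.
// Reads the two env knobs named in the PR summary.
function resolveEvalConfig(env = process.env) {
  // Model to evaluate against; null when the knob is unset or empty.
  const model = env.REFINE_EVAL_MODEL || null;

  // Assumption: dryrun is on when explicitly requested with "1",
  // or when no model is configured at all.
  const dryrun = env.REFINE_EVAL_DRYRUN === "1" || model === null;

  return { model, dryrun };
}

// Example: explicit dryrun wins even when a model is set.
console.log(resolveEvalConfig({ REFINE_EVAL_DRYRUN: "1", REFINE_EVAL_MODEL: "gemini-2.5-flash" }));
```

Passing the environment in as a parameter keeps the helper easy to exercise without mutating `process.env`.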

Test plan

- `cd evals && npm install`
- `REFINE_EVAL_DRYRUN=1 npm run eval:dryrun`: confirm the scorers wire up and run without an API key
- With `BRAINTRUST_API_KEY` and `REFINE_EVAL_MODEL=gemini-2.5-flash` set, `npm run eval`: confirm the push to Braintrust
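
To illustrate what the dryrun step is checking, the sketch below runs scorers over a canned output with no network call. It is a rough approximation under stated assumptions: the scorer names, the `dryrunTask` stub, and the fixture are all hypothetical, and the real package has 5 scorers whose logic is not shown in this PR description. The `{ name, score }` return shape with a score between 0 and 1 follows the usual scorer convention.

```javascript
// Hypothetical scorers; the package's real 5 scorers are not reproduced here.
const scorers = [
  // Scores 1 when the output contains any non-whitespace text.
  ({ output }) => ({ name: "non_empty", score: output.trim().length > 0 ? 1 : 0 }),
  // Scores 1 when the output contains the expected substring.
  ({ output, expected }) => ({
    name: "mentions_expected",
    score: output.includes(expected) ? 1 : 0,
  }),
];

// Stand-in for the model call: returns a canned string, so no API key is needed.
function dryrunTask(input) {
  return `refined: ${input}`;
}

// Runs every scorer against every fixture and collects the results.
function runDryrun(fixtures) {
  return fixtures.map(({ input, expected }) => {
    const output = dryrunTask(input);
    return { input, scores: scorers.map((score) => score({ input, output, expected })) };
  });
}

const results = runDryrun([{ input: "draft skill", expected: "draft skill" }]);
console.log(JSON.stringify(results, null, 2));
```

If every scorer returns a well-formed result here, the wiring is sound and the remaining risk sits in the model call and the Braintrust push.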

🤖 Generated with Claude Code

Separate npm package under evals/ with 5 scorers and dryrun mode.
Env-knob model swap via REFINE_EVAL_MODEL; defaults to local dryrun.
NEXT_STEPS.md updated with install/validation/CI-gate todos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

barryroodt wants to merge 1 commit into main from feat/braintrust-evals
