
Commit ca19f1b

JReinhold and Copilot authored
Evals: Copy from flexible hook-based dirs, move meta-level configs to preview.ts (#92)
* change "expected" eval dir to hook based dirs, move meta-level eval configs to preview.ts
* cleanup
* Update eval/lib/run-hook.ts

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
1 parent dce8c8d commit ca19f1b


28 files changed, +292 -2227 lines


.github/instructions/eval.instructions.md

Lines changed: 64 additions & 21 deletions
````diff
@@ -58,7 +58,8 @@ eval/
 │       ├── components.json # Optional component manifest
 │       ├── mcp.config.json # Optional MCP server config
 │       ├── *.md # Optional additional context
-│       ├── expected/ # Expected output for reference
+│       ├── pre-evaluate/ # Optional files to copy before evaluation
+│       ├── {hook-name}/ # Optional hook directories (see Lifecycle Hooks)
 │       └── experiments/ # Generated experiment runs
 └── templates/
     ├── project/ # Base Vite + React template
````
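The `{hook-name}/` placeholder in the tree stands for any of the lifecycle directories listed under Lifecycle Hooks. A minimal sketch of how a runner could discover which of them an eval folder defines; `HOOK_DIRS` copies the directory names from this commit's table, while `findHookDirs` is a hypothetical helper and not the framework's actual code:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Directory names taken from the "Hook Directories" table, in lifecycle order.
const HOOK_DIRS = [
  'pre-prepare-experiment',
  'post-prepare-experiment',
  'pre-execute-agent',
  'post-execute-agent',
  'pre-evaluate',
  'post-evaluate',
  'pre-save',
  'post-save',
] as const;

type HookDir = (typeof HOOK_DIRS)[number];

// Hypothetical helper: returns the hook directories that exist *as directories*
// inside an eval folder, preserving lifecycle order. Plain files are ignored.
function findHookDirs(evalPath: string): HookDir[] {
  return HOOK_DIRS.filter((name) => {
    const dir = path.join(evalPath, name);
    return fs.existsSync(dir) && fs.statSync(dir).isDirectory();
  });
}
```

For the `evals/200-my-component/` layout in the Lifecycle Hooks section, `findHookDirs` would return `['pre-evaluate']`, since `prompt.md` is a file rather than a hook directory.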
````diff
@@ -156,10 +157,11 @@ node eval.ts --help
 } satisfies Hooks;
 ```
 
-5. **Optional: Create `expected/` directory:**
-   - Add reference implementation in `expected/src/components/`
-   - Add expected stories in `expected/stories/`
-   - Used for comparison during evaluation
+5. **Optional: Create hook directories:**
+   - Create directories named after lifecycle hooks in kebab-case
+   - Files in these directories are copied to `projectPath` at that lifecycle point
+   - Example: `pre-evaluate/stories/MyComponent.stories.ts` copies test stories before evaluation
+   - See [Lifecycle Hooks](#lifecycle-hooks) for the full list of supported directories
 
 ### Viewing Results
 
````
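The kebab-case rule is mechanical: each hook function name used in `hooks.ts` maps to a directory name by inserting a hyphen before every capital letter and lower-casing it. A short sketch of that mapping, assuming hook names are plain camelCase with no acronyms or digits; `hookDirName` is a hypothetical helper, not part of the framework:

```typescript
// Hook function names as they appear in the hooks.ts example.
const HOOK_NAMES = [
  'prePrepareExperiment',
  'postPrepareExperiment',
  'preExecuteAgent',
  'postExecuteAgent',
  'preEvaluate',
  'postEvaluate',
  'preSave',
  'postSave',
];

// Hypothetical helper: "preExecuteAgent" -> "pre-execute-agent".
// Assumes plain camelCase input (no acronyms, digits, or leading capitals).
function hookDirName(hookName: string): string {
  return hookName.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);
}

// Print the function-name -> directory-name mapping for every hook.
for (const name of HOOK_NAMES) {
  console.log(`${name} -> ${hookDirName(name)}/`);
}
```

The eight resulting names are exactly the directories listed in the Hook Directories table.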
````diff
@@ -362,51 +364,91 @@ Each experiment produces comprehensive metrics:
 
 ## Lifecycle Hooks
 
-Evals can define lifecycle hooks in `hooks.ts` to customize behavior:
+Evals can customize behavior at each lifecycle step through two mechanisms:
+
+### Hook Directories
+
+Create directories named after lifecycle hooks (kebab-case) to automatically copy files to `projectPath` at that step:
+
+| Directory | When Contents Are Copied |
+|-----------|-------------------------|
+| `pre-prepare-experiment/` | Before project template is copied |
+| `post-prepare-experiment/` | After dependencies are installed |
+| `pre-execute-agent/` | Before agent starts execution |
+| `post-execute-agent/` | After agent completes |
+| `pre-evaluate/` | Before evaluation runs |
+| `post-evaluate/` | After evaluation completes |
+| `pre-save/` | Before results are saved |
+| `post-save/` | After results are saved |
+
+**Example:** To add test stories that run against agent-generated components:
+```
+evals/200-my-component/
+├── prompt.md
+├── pre-evaluate/
+│   └── stories/
+│       └── MyComponent.stories.ts
+```
+
+The `pre-evaluate/stories/MyComponent.stories.ts` file will be copied to `project/stories/MyComponent.stories.ts` before evaluation runs.
+
+Directories merge with existing content in `projectPath`, and files overwrite if they already exist.
+
+### Hook Functions
+
+For programmatic customization, define hooks in `hooks.ts`:
 
 ```typescript
 import type { Hooks } from '../../types.ts';
 import * as fs from 'node:fs/promises';
 import * as path from 'node:path';
 
 export default {
-  // Before project template is copied
+  // Before project template is copied (after pre-prepare-experiment/ is copied)
   prePrepareExperiment: async (args, log) => {
     log.message('Custom pre-preparation');
   },
 
-  // After dependencies are installed
+  // After dependencies are installed (after post-prepare-experiment/ is copied)
   postPrepareExperiment: async (args, log) => {
-    // Copy fixture files
-    await fs.cp(
-      path.join(args.evalPath, 'fixtures'),
-      path.join(args.projectPath, 'public/fixtures'),
-      { recursive: true }
-    );
+    // Install additional dependencies
+    await addDependency('some-package', { cwd: args.projectPath, silent: true });
   },
 
-  // Before agent starts execution
+  // Before agent starts (after pre-execute-agent/ is copied)
   preExecuteAgent: async (args, log) => {
     log.message('Starting agent');
   },
 
-  // After agent completes
+  // After agent completes (after post-execute-agent/ is copied)
   postExecuteAgent: async (args, log) => {
     log.message('Agent finished');
   },
 
-  // Before evaluation runs
+  // Before evaluation runs (after pre-evaluate/ is copied)
   preEvaluate: async (args, log) => {
     log.start('Custom pre-evaluation');
   },
 
-  // After evaluation completes
+  // After evaluation completes (after post-evaluate/ is copied)
   postEvaluate: async (args, log) => {
     log.success('Custom post-evaluation');
+  },
+
+  // Before results are saved (after pre-save/ is copied)
+  preSave: async (args, log) => {
+    log.message('Saving results');
+  },
+
+  // After results are saved (after post-save/ is copied)
+  postSave: async (args, log) => {
+    log.success('All done');
   }
 } satisfies Hooks;
 ```
 
+**Execution Order:** For each lifecycle step, the framework first copies files from the hook directory (if it exists), then calls the hook function (if defined).
+
 **Logger Interface:**
 
 Both `taskLog` (verbose) and `spinner` (normal) are wrapped in a unified interface:
````
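The **Execution Order** note (copy the hook directory first, then call the hook function) can be sketched as a tiny step runner. This is a hypothetical, simplified version written for illustration, not the framework's implementation: `HookArgs`, `SimpleHooks`, and `runStep` are assumed names, while `evalPath`/`projectPath` and the `message`/`start`/`success` logger methods come from the `hooks.ts` example in this diff. It relies on Node's `fs.cpSync` (Node 16.7+) with `recursive: true`, which merges a directory into an existing destination, and `force: true`, which overwrites files already present there, matching the merge/overwrite behavior described for hook directories.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Simplified stand-ins for the framework's types (assumptions, not the real `Hooks` type).
interface HookArgs {
  evalPath: string;
  projectPath: string;
}
interface Logger {
  message(msg: string): void;
  start(msg: string): void;
  success(msg: string): void;
}
type HookName =
  | 'prePrepareExperiment'
  | 'postPrepareExperiment'
  | 'preExecuteAgent'
  | 'postExecuteAgent'
  | 'preEvaluate'
  | 'postEvaluate'
  | 'preSave'
  | 'postSave';
type HookFn = (args: HookArgs, log: Logger) => void | Promise<void>;
type SimpleHooks = Partial<Record<HookName, HookFn>>;

// camelCase hook name -> kebab-case directory name, e.g. preEvaluate -> pre-evaluate.
const toDirName = (name: string): string =>
  name.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);

// Runs one lifecycle step in the documented order:
// 1. copy <evalPath>/<hook-dir>/ into projectPath, if that directory exists;
// 2. call the matching hook function, if one is defined.
async function runStep(
  step: HookName,
  hooks: SimpleHooks,
  args: HookArgs,
  log: Logger,
): Promise<void> {
  const hookDir = path.join(args.evalPath, toDirName(step));
  if (fs.existsSync(hookDir)) {
    // recursive: merge into the existing projectPath; force: overwrite same-named files.
    fs.cpSync(hookDir, args.projectPath, { recursive: true, force: true });
  }
  await hooks[step]?.(args, log);
}
```

For the `evals/200-my-component/` example, `runStep('preEvaluate', hooks, args, log)` would place `MyComponent.stories.ts` under `project/stories/` before any `preEvaluate` function runs, so that function can rely on the copied files being present.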
````diff
@@ -472,7 +514,7 @@ Each experiment's project includes:
 
 ### Expected Stories
 
-Evals should include `expected/stories/*.stories.ts` files that:
+Evals should include `pre-evaluate/stories/*.stories.ts` files that:
 
 1. Import the component
 2. Define basic stories (e.g., Default)
````
````diff
@@ -614,7 +656,7 @@ When using `--context mcp.config.json`, the framework:
 3. **Check `typecheck-output.txt`**: TypeScript issues
 4. **Inspect `lint-output.txt`**: Code quality problems
 5. **Read `test-results.json`**: Test failures and a11y violations
-6. **Compare with `expected/`**: See reference implementation
+6. **Compare with `pre-evaluate/`**: See reference files copied before evaluation
 
 ### Common Issues
 
````
````diff
@@ -633,7 +675,8 @@ When using `--context mcp.config.json`, the framework:
 
 - The framework is designed for reproducibility - same inputs should give comparable outputs
 - Always check `collect-args.ts` for the canonical list of CLI options
-- Hooks are optional - most evals don't need them
+- Hooks are optional - most evals only need `pre-evaluate/` for test stories
+- Hook directories copy files first, then hook functions run
 - Extra prompts are append-only - they don't replace the main prompt
 - The `CONSTRAINTS_PROMPT` is always appended to prevent package manager usage
 - Agent token counting is approximate - uses client-side tokenizer, not actual API response
````

eval/.storybook/main.ts

Lines changed: 0 additions & 1 deletion
````diff
@@ -15,7 +15,6 @@ function getAbsolutePath(value: string): any {
 const config: StorybookConfig = {
   stories: [
     '../evals/*/experiments/*/project/stories/*.stories.@(js|jsx|mjs|ts|tsx)',
-    '../evals/*/expected/stories/*.stories.@(js|jsx|mjs|ts|tsx)',
     '../templates/result-docs/*.stories.@(js|jsx|mjs|ts|tsx)',
   ],
   addons: [
````

eval/README.md

Lines changed: 0 additions & 2 deletions
````diff
@@ -58,7 +58,6 @@ eval/
 │       ├── mcp.config.json # Optional: MCP server config
 │       ├── extra-prompt-*.md # Optional: additional context
 │       ├── hooks.ts # Optional: lifecycle hooks
-│       ├── expected/ # Expected output for comparison
 │       └── experiments/ # Generated experiment runs
 │           └── {context}-{agent}-{timestamp}/
 │               ├── prompt.md # Full prompt sent to agent
````
````diff
@@ -211,7 +210,6 @@ pnpm storybook
 
 - Use `--verbose` to see detailed agent activity and tool calls
 - Check `full-conversation.js` to debug agent behavior
-- Compare `project/` output with `expected/` directory
 - Use extra prompts to guide agent without modifying main prompt
 - Component manifests work best when agents need library documentation
 
````