Commit db0973e
data: OPE-99 benchmark -- 3-run results (publication quality)
Results from 30 Claude API calls (5 tasks x 3 runs x 2 conditions):
Without AGENTS.md: 27/33 checks passed (81.8%)
With AGENTS.md: 33/33 checks passed (100.0%)
Improvement: failed checks dropped from 6 to 0, a 100% reduction (consistent across all 3 runs)
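The arithmetic behind these figures can be reproduced with a short sketch. The function and condition names below are hypothetical and not taken from the benchmark code; only the counts (5 tasks, 3 runs, 2 conditions, 27/33 and 33/33 checks) come from the results above.

```python
from itertools import product

# Counts taken from the commit message; the names are illustrative assumptions.
TASKS = 5
RUNS = 3
CONDITIONS = ("without_agents_md", "with_agents_md")


def api_call_count(tasks: int, runs: int, conditions: tuple) -> int:
    """Count one API call per (task, run, condition) combination."""
    return len(list(product(range(tasks), range(runs), conditions)))


def pass_rate(passed: int, total: int) -> float:
    """Percentage of checks passed, rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


def violation_reduction(baseline_failed: int, treated_failed: int) -> float:
    """Percentage drop in failed checks relative to the baseline condition."""
    return round(100.0 * (baseline_failed - treated_failed) / baseline_failed, 1)


calls = api_call_count(TASKS, RUNS, CONDITIONS)
without_rate = pass_rate(27, 33)
with_rate = pass_rate(33, 33)
reduction = violation_reduction(33 - 27, 33 - 33)

print(calls, without_rate, with_rate, reduction)
```

Note that the 100% figure is a relative reduction in failed checks (6 down to 0), not a 100-point gain in pass rate; the pass rate itself rises by 18.2 points.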
Two violations found (both deterministic, 3/3 times):
1. install_package: Claude used 'npm install date-fns' without context.
With AGENTS.md ('Package manager: bun -- always use bun install, never npm'):
Claude used 'bun add date-fns'. Package manager is codebase-specific --
Claude cannot know this from training data.
2. frontend_component: Claude used className template literals without context.
With AGENTS.md ('Class merging: use cn() from @/lib/utils'):
Claude used cn() correctly. This is a shadcn/ui convention, not universal.
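As an illustration of how the two failing checks might be automated, here is a minimal sketch using regular expressions over the model's output. The benchmark's actual checker is not shown in this commit, so the patterns and function names are assumptions.

```python
import re

# Hypothetical patterns; the benchmark's real checks may differ.
NPM_INSTALL = re.compile(r"\bnpm\s+(install|i|add)\b")
BUN_ADD = re.compile(r"\bbun\s+(add|install)\b")
TEMPLATE_CLASSNAME = re.compile(r"className=\{`")  # className={`...`}
CN_CALL = re.compile(r"className=\{cn\(")          # className={cn(...)}


def check_package_manager(output: str) -> bool:
    """Pass only if the output uses bun and never falls back to npm."""
    return bool(BUN_ADD.search(output)) and not NPM_INSTALL.search(output)


def check_class_merging(output: str) -> bool:
    """Pass only if classes are merged with cn(), not a template literal."""
    return bool(CN_CALL.search(output)) and not TEMPLATE_CLASSNAME.search(output)


# Outputs modeled on the two violations described above.
print(check_package_manager("npm install date-fns"))  # False
print(check_package_manager("bun add date-fns"))      # True
print(check_class_merging('<div className={`card ${active ? "on" : ""}`} />'))  # False
print(check_class_merging('<div className={cn("card", active && "on")} />'))    # True
```

Pattern matching of this kind is deterministic, which fits the observation that both violations reproduced 3/3 times, but it would miss variants such as a package manager invoked through a script alias.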
Patterns that passed in both conditions (Claude already knew them from training):
- FastAPI Depends() auth pattern (well-known FastAPI convention)
- snake_case Python functions (general Python knowledge)
- HTTPException for errors (documented FastAPI pattern)
- pytest fixtures (standard pytest knowledge)
Conclusion: AGENTS.md matters most for PROJECT-SPECIFIC conventions
that Claude can't know from training data alone. Generic patterns
(FastAPI Depends, snake_case) Claude already knows. Specific tools
(bun, cn()) and local conventions require context.
1 parent 88aa268 commit db0973e
2 files changed
Lines changed: 273 additions & 29 deletions