Commit 131cc28

cameroncooke and codex committed
fix(benchmarks): Validate session defaults during load
Validate Claude UI benchmark sessionDefaults while reading the suite config so unknown keys and invalid value types fail before simulator setup starts.

Co-Authored-By: OpenAI Codex <codex@openai.com>
1 parent ed3417c commit 131cc28

3 files changed

Lines changed: 35 additions & 2 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 
 - Fixed Claude UI benchmark preflight so transient malformed or still-loading UI snapshots no longer crash the harness or finish before app UI is observable.
 - Fixed Claude UI benchmark preflight so configured first-run dismissals require a concrete simulator ID and suite-provided simulator IDs are recorded in command logs.
-- Fixed Claude UI benchmark config handling so invalid `failurePatterns` regexes fail before a suite starts and partial `allowedVariance` overrides preserve defaults for omitted metrics.
+- Fixed Claude UI benchmark config handling so invalid `failurePatterns` regexes and `sessionDefaults` fail before a suite starts and partial `allowedVariance` overrides preserve defaults for omitted metrics.
 - Fixed Claude UI benchmark temporary simulator cleanup so simulators created by the harness are deleted even when post-creation setup fails.
 - Fixed UI action snapshot refreshes so timeout while waiting for a settled post-action snapshot returns a recoverable warning instead of unstable element refs.
 - Fixed Claude UI benchmark suite runs so temporary simulators are applied through an isolated per-run MCP config instead of being overridden by repo or example-project config defaults.

src/benchmarks/claude-ui/__tests__/claude-ui-benchmark.test.ts

Lines changed: 28 additions & 0 deletions
@@ -213,6 +213,34 @@ describe('Claude UI benchmark analysis', () => {
     ).toThrow('weather.yml.failurePatterns[1]: invalid regular expression');
   });
 
+  it('rejects invalid session defaults when loading config', () => {
+    expect(() =>
+      readConfig(
+        {
+          name: 'weather',
+          prompt: 'prompt.md',
+          sessionDefaults: {
+            simulatorTypo: 'iPhone 17 Pro Max',
+          },
+        },
+        'weather.yml',
+      ),
+    ).toThrow("unknown sessionDefaults key 'simulatorTypo'");
+
+    expect(() =>
+      readConfig(
+        {
+          name: 'weather',
+          prompt: 'prompt.md',
+          sessionDefaults: {
+            simulatorId: 42,
+          },
+        },
+        'weather.yml',
+      ),
+    ).toThrow('sessionDefaults.simulatorId must be a string or boolean');
+  });
+
   it('warns by default when tool sequences drift', () => {
     const config: BenchmarkConfig = {
       name: 'weather',

src/benchmarks/claude-ui/config.ts

Lines changed: 6 additions & 1 deletion
@@ -185,7 +185,12 @@ export function readConfig(raw: unknown, source: string): BenchmarkConfig {
     ),
   };
 
-  if (isRecord(raw.sessionDefaults)) config.sessionDefaults = raw.sessionDefaults;
+  if (raw.sessionDefaults !== undefined) {
+    if (!isRecord(raw.sessionDefaults)) {
+      throw new Error(`${source}.sessionDefaults: expected object`);
+    }
+    config.sessionDefaults = validateSessionDefaults(raw.sessionDefaults);
+  }
   config.allowedVariance = readAllowedVariance(raw.allowedVariance, `${source}.allowedVariance`);
 
   if (raw.baseline !== undefined) {
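
The body of `validateSessionDefaults` is not part of this diff. As a rough illustration only, a validator that would produce the two error messages asserted in the new test might look like the sketch below. The function name `validateSessionDefaultsSketch`, the `ALLOWED_SESSION_DEFAULT_KEYS` set, and the `simulatorName` key are assumptions made for the example; only `simulatorId` and the error strings come from the commit.

```typescript
// Hypothetical sketch: the real validator and its key list are not shown in this commit.
type SessionDefaultValue = string | boolean;

// Assumed allow-list. 'simulatorId' appears in the test; 'simulatorName' is invented here.
const ALLOWED_SESSION_DEFAULT_KEYS: ReadonlySet<string> = new Set([
  'simulatorId',
  'simulatorName',
]);

function validateSessionDefaultsSketch(
  raw: Record<string, unknown>,
): Record<string, SessionDefaultValue> {
  const validated: Record<string, SessionDefaultValue> = {};
  for (const key of Object.keys(raw)) {
    // Reject keys outside the allow-list so typos fail at load time.
    if (!ALLOWED_SESSION_DEFAULT_KEYS.has(key)) {
      throw new Error(`unknown sessionDefaults key '${key}'`);
    }
    const value = raw[key];
    // Reject values that are neither strings nor booleans.
    if (typeof value !== 'string' && typeof value !== 'boolean') {
      throw new Error(`sessionDefaults.${key} must be a string or boolean`);
    }
    validated[key] = value;
  }
  return validated;
}
```

Throwing inside the validator, rather than silently dropping a non-record value as the removed line did, is what lets `readConfig` surface a misconfigured suite before any simulator work begins.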
