Expose self-healing benchmark diagnostics by giaphutran12 · Pull Request #57 · tinyfish-io/bigset

giaphutran12 · 2026-05-22T23:54:04Z

Summary

Stacked on #56. Adds the narrow verifier surface the self-healing/Playwright handoff needs before any compiler or app-runtime migration:

collection self-healing benchmark output now includes compact diagnostics
diagnostics summarize self-healing action, artifact kinds, process-trace counts, browser-step counts, and Playwright candidate readiness
benchmark summary.json carries the same high-signal fields per lane result
added a pure helper for reading self-healing artifacts without committing raw run folders
docs now say Agent canaries must prove browser-action provenance through these fields, not only row/evidence quality

This does not generate Playwright scripts, infer browser actions, migrate Meteor's runtime, or make collection the default runtime.
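
For reviewers who want a concrete picture of the compact diagnostics described above, here is a minimal sketch of what a pure summarizing helper could look like. It is not the code in this PR: the function names (`summarizeSelfHealingOutput`, `attachDiagnostics`), the input shape (`action`, `artifacts[].kind`, `processTrace[].type`), the `"browser_step"` and `"playwright_candidate"` values, and the readiness rule are all assumptions made for illustration.

```javascript
// Hypothetical sketch. Field names and the readiness rule are assumptions,
// not the actual schema used by self-healing-output.mjs.

/**
 * Summarize one self-healing run output into compact diagnostics.
 * Pure: no file-system access and no mutation of the input.
 */
function summarizeSelfHealingOutput(output) {
  const artifacts = Array.isArray(output?.artifacts) ? output.artifacts : [];
  const processTrace = Array.isArray(output?.processTrace) ? output.processTrace : [];

  // Unique artifact kinds, sorted so the summary is stable across runs.
  const artifactKinds = [
    ...new Set(artifacts.map((a) => a?.kind).filter(Boolean)),
  ].sort();

  // Count only trace entries explicitly recorded as browser steps.
  // Nothing is inferred from other entry types.
  const browserStepCount = processTrace.filter(
    (entry) => entry?.type === "browser_step"
  ).length;

  return {
    action: output?.action ?? "unknown",
    artifactKinds,
    processTraceCount: processTrace.length,
    browserStepCount,
    // Assumed rule: ready only with a recorded browser step AND a candidate artifact.
    playwrightCandidateReady:
      browserStepCount > 0 && artifactKinds.includes("playwright_candidate"),
  };
}

/** Attach the same diagnostics fields to each lane result of a summary. */
function attachDiagnostics(laneResults) {
  return laneResults.map((lane) => ({
    lane: lane.lane,
    status: lane.status,
    diagnostics: summarizeSelfHealingOutput(lane.selfHealingOutput),
  }));
}

// Usage with made-up data.
const diagnostics = summarizeSelfHealingOutput({
  action: "repair",
  artifacts: [{ kind: "process_trace" }, { kind: "playwright_candidate" }],
  processTrace: [{ type: "browser_step" }, { type: "llm_call" }],
});
console.log(JSON.stringify(diagnostics, null, 2));
```

Keeping the helper pure means it can be unit-tested against small in-memory fixtures, which fits the stated goal of reading self-healing artifacts without committing raw run folders. Counting only explicitly recorded browser steps also stays within the PR's scope of not inferring browser actions.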

Verification

node --test benchmarks/dataset-agent/run-benchmark.test.mjs
node --check benchmarks/dataset-agent/adapters/collection-self-healing-adapter.mjs
node --check benchmarks/dataset-agent/adapters/self-healing-output.mjs
node --check benchmarks/dataset-agent/run-benchmark.mjs
no-key collection adapter smoke with OPENROUTER_API_KEY/TINYFISH_API_KEY unset
npm --prefix backend test
npm --prefix backend run build
git diff --check
make verify-self-healing

coderabbitai · 2026-05-22T23:54:11Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



giaphutran12 · 2026-05-25T20:13:34Z

Closing stale draft cleanup PR; superseded by later BigSet work.

Expose self-healing benchmark diagnostics

c9f8438

giaphutran12 self-assigned this May 22, 2026

giaphutran12 mentioned this pull request May 23, 2026

Gate benchmark runs on Playwright readiness #58

Closed

giaphutran12 closed this May 25, 2026


giaphutran12 wants to merge 1 commit into codex/agent-browser-action-reporting from codex/self-healing-benchmark-diagnostics
