Expose self-healing benchmark diagnostics #57

Closed
giaphutran12 wants to merge 1 commit into codex/agent-browser-action-reporting from codex/self-healing-benchmark-diagnostics

Conversation

@giaphutran12
Collaborator

Summary

Stacked on #56. Adds the narrow verifier surface the self-healing/Playwright handoff needs before any compiler or app-runtime migration:

  • collection self-healing benchmark output now includes compact diagnostics
  • diagnostics summarize self-healing action, artifact kinds, process-trace counts, browser-step counts, and Playwright candidate readiness
  • benchmark summary.json carries the same high-signal fields per lane result
  • added a pure helper for reading self-healing artifacts without committing raw run folders
  • docs now say Agent canaries must prove browser-action provenance through these fields, not only row/evidence quality

This does not generate Playwright scripts, infer browser actions, migrate Meteor's runtime, or make collection the default runtime.
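
As a rough illustration of the compact diagnostics described above, the sketch below reduces in-memory self-healing artifacts to the listed fields. It is not the code from this PR. The function name `summarizeSelfHealingArtifacts`, the artifact shape (`kind`, `processTrace`, `browserSteps`), the output field names, and the readiness rule are all assumptions made for illustration.

```javascript
// Hypothetical artifact shape (not taken from the PR):
//   { kind: string, processTrace?: unknown[], browserSteps?: unknown[] }
// Pure function: it reads only its arguments and touches no run folders.
function summarizeSelfHealingArtifacts(action, artifacts) {
  const kinds = new Set();
  let processTraceCount = 0;
  let browserStepCount = 0;

  for (const artifact of artifacts) {
    kinds.add(artifact.kind);
    // Missing or non-array fields count as zero entries.
    processTraceCount += Array.isArray(artifact.processTrace)
      ? artifact.processTrace.length
      : 0;
    browserStepCount += Array.isArray(artifact.browserSteps)
      ? artifact.browserSteps.length
      : 0;
  }

  return {
    selfHealingAction: action,
    artifactKinds: [...kinds].sort(),
    processTraceCount,
    browserStepCount,
    // Assumed rule, for illustration only: a lane is a Playwright candidate
    // once at least one browser step has been recorded.
    playwrightCandidateReady: browserStepCount > 0,
  };
}

// Example input with placeholder values.
const diagnostics = summarizeSelfHealingArtifacts("retry", [
  { kind: "process-trace", processTrace: [{}, {}] },
  { kind: "browser-steps", browserSteps: [{}] },
]);
console.log(JSON.stringify(diagnostics, null, 2));
```

Because the helper in this sketch takes plain objects rather than file paths, a test can feed it fixtures directly, which fits the goal of not committing raw run folders.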

Verification

  • node --test benchmarks/dataset-agent/run-benchmark.test.mjs
  • node --check benchmarks/dataset-agent/adapters/collection-self-healing-adapter.mjs
  • node --check benchmarks/dataset-agent/adapters/self-healing-output.mjs
  • node --check benchmarks/dataset-agent/run-benchmark.mjs
  • no-key collection adapter smoke test with OPENROUTER_API_KEY and TINYFISH_API_KEY unset
  • npm --prefix backend test
  • npm --prefix backend run build
  • git diff --check
  • make verify-self-healing

@giaphutran12 giaphutran12 self-assigned this May 22, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 01c8ab40-2f67-408a-88ed-85f7ffb73b25

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@giaphutran12
Collaborator Author

Closing stale draft cleanup PR; superseded by later BigSet work.
