
feat: add v0.2.2 benchmark export and repeated-run summary tooling #2

Merged
yeezhouyi merged 3 commits into main from feature/v0.2.2-runtime-characterization
Jun 3, 2026

Conversation

@yeezhouyi
Owner

Summary

This PR adds the v0.2.2 benchmark tooling updates:

  • Added CSV export (--csv)
  • Added JSON export with per-run details and weighted cross-run summary (--json)
  • Added P95/P99/median/std timing metrics for both solve and cycle time
  • Added repeated-run summary output (--summary)
  • Added README benchmark reproduction instructions
  • Updated ROADMAP to mark completed v0.2.2 benchmark tooling tasks
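The timing metrics and the weighted cross-run summary listed above could be computed roughly as follows. This is a minimal sketch, not the code from this PR: the function names `percentile`, `timing_stats`, and `weighted_mean` are hypothetical, the percentile uses linear interpolation, the standard deviation is the population form, and weighting each run by its sample count is an assumption, since the PR does not state what the weights are.

```python
import statistics


def percentile(samples, q):
    """Linear-interpolated percentile of samples, with q in [0, 100]."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("percentile() needs at least one sample")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def timing_stats(samples):
    """Per-run statistics for one timing series (e.g. solve or cycle time)."""
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "std": statistics.pstdev(samples),  # population std; an assumption
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


def weighted_mean(values, weights):
    """Cross-run average; weights are assumed to be per-run sample counts."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    # Two hypothetical runs with different numbers of timing samples (ms).
    runs = [[1.0, 1.2, 1.1, 1.4], [0.9, 1.0]]
    per_run = [timing_stats(r) for r in runs]
    counts = [len(r) for r in runs]
    print(weighted_mean([s["mean"] for s in per_run], counts))
```

Weighting by sample count makes the cross-run mean equal to the mean over all pooled samples, which is one reason a tool might prefer it over a plain average of per-run means.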

Tracking

Refs #1

Native Ubuntu 24.04 benchmark validation remains open.

yeezhouyi and others added 3 commits June 3, 2026 21:46
- --csv flag: exports one row per run with all metrics
- --json flag: exports per-run details + weighted cross-run summary
- cycle_time P95/P99/median/std now computed and displayed
- EXPORT_KEYS defines consistent CSV column order

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- --summary prints cross-run weighted averages with std/min/max
- README section: how to reproduce benchmarks with --csv --json --summary

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
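The commit notes above say that `EXPORT_KEYS` fixes the CSV column order and that the JSON export carries per-run details plus a summary. A minimal sketch of that pattern with the standard library is shown below; the key names, the helper names `runs_to_csv` and `runs_to_json`, and the JSON layout are all hypothetical, and the real tool may differ.

```python
import csv
import io
import json

# Hypothetical column list; the actual EXPORT_KEYS in the repository may differ.
EXPORT_KEYS = [
    "run",
    "solve_time_median_ms",
    "solve_time_p95_ms",
    "cycle_time_median_ms",
    "cycle_time_p95_ms",
]


def runs_to_csv(runs):
    """Render one CSV row per run, with columns in EXPORT_KEYS order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EXPORT_KEYS, lineterminator="\n")
    writer.writeheader()
    for run in runs:
        # Missing metrics become empty cells; keys not in EXPORT_KEYS are dropped.
        writer.writerow({key: run.get(key, "") for key in EXPORT_KEYS})
    return buf.getvalue()


def runs_to_json(runs, summary):
    """Bundle per-run details and a cross-run summary into one JSON document."""
    return json.dumps({"runs": runs, "summary": summary}, indent=2)


if __name__ == "__main__":
    runs = [{"run": 1, "solve_time_median_ms": 2.5, "note": "not exported"}]
    print(runs_to_csv(runs))
    print(runs_to_json(runs, {"solve_time_median_ms": 2.5}))
```

Driving the writer from a single key list keeps the header and every row in the same order across runs, which makes exported files easy to concatenate or diff.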
@yeezhouyi merged commit 25b0929 into main on Jun 3, 2026
1 check passed
