[MAINTENANCE] replace single-sample benchmark comparisons with repeated-sample + benchstat #7317

@coderabbitai

Summary

The perf-regression workflow was disabled in PR #7307 because the current benchmark CI setup produces false regression alerts.

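The root cause is that the workflow relies on single-sample, cached comparisons via github-action-benchmark. The allocs/op metric is highly sensitive to how many iterations the harness happens to run: it is the total allocation count divided by the iteration count, so any one-time work inside the measured region weighs far more at one iteration than at ten: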

  • BenchmarkRunEnumeration/Multiproto showed 1,174,753 allocs/op at -benchtime=1x but only 1,047,282 allocs/op at -benchtime=10x
  • BenchmarkRunEnumeration/Default dropped from ~54.4M allocs/op (-benchtime=1x) to ~25.6M (-benchtime=10x)

A single cached baseline sample can easily be contaminated by setup/teardown work, leading to misleading threshold breaches (e.g. the 2.15x ratio alert triggered for Multiproto - allocs/op).
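To make the iteration-count sensitivity concrete, here is a minimal sketch that fits the two Multiproto figures above to a simple fixed-cost model. The model itself is an assumption, not something measured in this issue: it supposes each run pays a constant number of one-time allocations plus a constant number per iteration. The function names `reportedAllocsPerOp` and `fitFixedCost` are hypothetical.

```go
package main

import "fmt"

// reportedAllocsPerOp models the allocs/op a run would report if it paid
// `setup` one-time allocations plus `perOp` allocations per iteration,
// averaged over n iterations. The fixed-cost model is an assumption.
func reportedAllocsPerOp(setup, perOp float64, n int) float64 {
	return setup/float64(n) + perOp
}

// fitFixedCost solves the model for setup and perOp from two observations
// (n1, a1) and (n2, a2). It requires n1 != n2.
func fitFixedCost(n1 int, a1 float64, n2 int, a2 float64) (setup, perOp float64) {
	setup = (a1 - a2) / (1/float64(n1) - 1/float64(n2))
	perOp = a1 - setup/float64(n1)
	return setup, perOp
}

func main() {
	// Multiproto figures quoted above: -benchtime=1x and -benchtime=10x.
	setup, perOp := fitFixedCost(1, 1174753, 10, 1047282)
	fmt.Printf("one-time allocs ~ %.0f, per-iteration allocs ~ %.0f\n", setup, perOp)

	// Under the model, the reported value keeps drifting as n grows.
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("n=%d -> %.0f allocs/op\n", n, reportedAllocsPerOp(setup, perOp, n))
	}
}
```

Under this model the Multiproto numbers correspond to roughly 141,634 one-time allocations on top of about 1,033,119 per iteration, so a baseline captured at one iteration count is not comparable with a sample captured at another.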

Proposed Approach

Replace the single-sample approach with a statistically sound methodology:

  1. Multiple samples per run: use -count=N (e.g., -count=6 or -count=10) so each CI run produces multiple data points per benchmark.
  2. benchstat for comparison: use golang.org/x/perf/cmd/benchstat to compare the distributions from the base branch vs. the PR branch. benchstat applies a statistical test (the Mann-Whitney U-test by default) to determine whether a difference is significant, which filters out differences that are within run-to-run noise.
  3. Stabilize the benchmark environment: optionally pin CPU affinity / disable frequency scaling on the CI runner, or use a dedicated self-hosted runner for benchmarks to reduce noise.
  4. Configurable threshold: expose a p-value threshold (e.g., alpha=0.05, which benchstat accepts through its -alpha flag) and a minimum effect-size threshold (e.g., delta >= 10%) so only statistically significant and practically meaningful regressions produce alerts.
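The choice of N in item 1 interacts with the alpha in item 4. As a rough guide, and assuming an exact two-sided rank test on two groups of N samples with no ties, the smallest p-value the test can ever report is 2 / C(2N, N). The sketch below tabulates that bound. The helper names `binom` and `minTwoSidedP` are hypothetical, and the bound is a property of rank tests in general rather than a statement about benchstat's internals.

```go
package main

import "fmt"

// binom returns C(n, k) exactly for the small n used here.
func binom(n, k int) uint64 {
	c := uint64(1)
	for i := 1; i <= k; i++ {
		// After step i, c equals C(n-k+i, i), so the division is exact.
		c = c * uint64(n-k+i) / uint64(i)
	}
	return c
}

// minTwoSidedP returns the smallest two-sided p-value an exact rank test
// can produce for two groups of n samples each (assuming no ties).
func minTwoSidedP(n int) float64 {
	return 2 / float64(binom(2*n, n))
}

func main() {
	const alpha = 0.05
	for _, n := range []int{3, 4, 6, 10} {
		p := minTwoSidedP(n)
		fmt.Printf("-count=%d: smallest possible p = %.6f, can fall below alpha: %v\n", n, p, p < alpha)
	}
}
```

With three samples per side the bound is 0.1, so no difference could ever be significant at alpha=0.05; four samples is the minimum, and -count=6 or -count=10 leaves comfortable headroom.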

Example Workflow Sketch

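- name: Run benchmarks (PR branch)
  run: |
    # Without pipefail, a failing `go test` would be masked by `tee`.
    set -o pipefail
    go test -run='^$' -bench=BenchmarkRunEnumeration -benchmem -count=6 ./... | tee bench-new.txt

- name: Run benchmarks (base branch)
  run: |
    set -o pipefail
    # Fetch the base branch explicitly: a shallow checkout of the PR may not contain it.
    git fetch --depth=1 origin ${{ github.base_ref }}
    git checkout FETCH_HEAD
    go test -run='^$' -bench=BenchmarkRunEnumeration -benchmem -count=6 ./... | tee bench-old.txt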

- name: Compare with benchstat
  run: |
    go install golang.org/x/perf/cmd/benchstat@latest
    benchstat bench-old.txt bench-new.txt
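The workflow sketch stops at printing a comparison. A pass/fail decision would still need the rule from item 4 of the proposed approach. The following is a minimal, self-contained sketch of such a gate, not benchstat's implementation: it computes an exact two-sided permutation p-value for the Mann-Whitney U statistic and combines it with a relative change in medians. All function names and the sample values are hypothetical, and the base median is assumed to be non-zero.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying the caller's slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// doubledU returns twice the Mann-Whitney U statistic of a versus b:
// 2 for each pair with x > y and 1 for each tie, so the value stays integral.
func doubledU(a, b []float64) int {
	u := 0
	for _, x := range a {
		for _, y := range b {
			if x > y {
				u += 2
			} else if x == y {
				u++
			}
		}
	}
	return u
}

func absInt(v int) int {
	if v < 0 {
		return -v
	}
	return v
}

// exactPValue returns the two-sided permutation p-value of the U statistic,
// enumerating every split of the pooled samples into groups of the original sizes.
func exactPValue(base, cur []float64) float64 {
	pooled := append(append([]float64(nil), base...), cur...)
	n, k := len(pooled), len(base)
	center := len(base) * len(cur) // expected doubled U under the null hypothesis
	observed := absInt(doubledU(base, cur) - center)
	extreme, total := 0, 0
	pick := make([]bool, n)
	var walk func(start, chosen int)
	walk = func(start, chosen int) {
		if chosen == k {
			var a, b []float64
			for i, v := range pooled {
				if pick[i] {
					a = append(a, v)
				} else {
					b = append(b, v)
				}
			}
			total++
			if absInt(doubledU(a, b)-center) >= observed {
				extreme++
			}
			return
		}
		for i := start; i <= n-(k-chosen); i++ {
			pick[i] = true
			walk(i+1, chosen+1)
			pick[i] = false
		}
	}
	walk(0, 0)
	return float64(extreme) / float64(total)
}

// isRegression flags a regression only when the increase is at least minDelta
// in relative size and statistically significant (p < alpha).
func isRegression(base, cur []float64, alpha, minDelta float64) bool {
	delta := (median(cur) - median(base)) / median(base)
	return delta >= minDelta && exactPValue(base, cur) < alpha
}

func main() {
	// Hypothetical allocs/op samples from six runs per branch.
	base := []float64{100, 101, 102, 103, 104, 105}
	cur := []float64{120, 121, 122, 123, 124, 125}
	fmt.Println("p-value:", exactPValue(base, cur))
	fmt.Println("regression:", isRegression(base, cur, 0.05, 0.10))
}
```

Enumerating every split is only practical for small sample counts such as the -count values suggested here. In CI, running benchstat and post-processing its output is likely the simpler route.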
