
Conversation


@poodlewars poodlewars commented Dec 29, 2025

Give a summary of any ASV benchmarks that fail to run (e.g. they time out).

Test run: https://github.com/man-group/ArcticDB/actions/runs/20578479281

Generated an example summary (https://github.com/man-group/ArcticDB/actions/runs/20578479281) with a placeholder benchmark that times out: 69a37af

Example runtime reporting, using:

```shell
python build_tooling/transform_asv_results.py --mode=analyze --arcticdb_client_override="s3://s3.eu-west-1.amazonaws.com:arcticdb-ci-benchmark-results?aws_auth=true&path_prefix=asv_results" --hash=abaaa08b
```

### Time spent outside of benchmarks (excluding build)

| Step                                          |   Duration (s) |
|:----------------------------------------------|---------------:|
| <setup_cache version_chain:42>                |     585.851    |
| <setup_cache list_versions:44>                |     418.164    |
...
| <setup_cache real_list_operations:58>         |       3.15759  |
| <setup_cache finalize_staged_data:100>        |       0.980301 |

### Time spent in benchmarks

| test_name                                                                                    |   Duration (s) |
|:---------------------------------------------------------------------------------------------|---------------:|
| list_versions.ListVersions.time_list_versions                                                |     291.06     |
| real_finalize_staged_data.AWSFinalizeStagedData.time_finalize_staged_data                    |     272.44     |
| real_finalize_staged_data.AWSFinalizeStagedData.peakmem_finalize_staged_data                 |     270.84     |
| real_batch_functions.AWSBatchBasicFunctions.time_write_batch                                 |     221.73     |
...
| list_snapshots.SnaphotFunctions.peakmem_snapshots_no_metadata_list                           |       0.028348 |
| finalize_staged_data.FinalizeStagedDataWiderDataframeX3.peakmem_finalize_staged_data         |       0.010657 |
| finalize_staged_data.FinalizeStagedDataWiderDataframeX3.time_finalize_staged_data            |       0.008269 |

### Summary

* **Total time outside benchmarks (mins):** 28.18
* **Total time running benchmarks (mins):** 91.27
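The summary figures are just the table columns summed and converted to minutes. A minimal sketch of that arithmetic, using only the duration rows visible in the truncated first table above (so the result is necessarily smaller than the full 28.18 minutes):

```python
# Illustrative arithmetic only: these are the four "outside benchmarks"
# durations visible above; the truncated rows are omitted, so this total
# is smaller than the 28.18 minutes reported for the full table.
visible_outside_secs = [585.851, 418.164, 3.15759, 0.980301]

total_secs = sum(visible_outside_secs)
total_mins = round(total_secs / 60, 2)
print(total_mins)  # 16.8
```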

Test runs after PR feedback:

With a "dummy" failure:

@poodlewars poodlewars added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. labels Dec 29, 2025
@poodlewars poodlewars force-pushed the aseaton/asv/failure-reporting branch from 3e42dbe to 8b2d959 Compare December 29, 2025 17:49
@poodlewars poodlewars marked this pull request as ready for review December 29, 2025 17:51
@poodlewars poodlewars changed the title Aseaton/asv/failure reporting Clearer reporting for ASV failures Dec 29, 2025
@poodlewars poodlewars changed the title Clearer reporting for ASV failures Report ASV information to the workflow summary page Dec 29, 2025
```python
# Results are stored in a dictionary; failed ones are null
for bench_name, result in data.get('results', {}).items():
    if result[0] is None:
        failures.append(bench_name)
```
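A self-contained sketch of the check above (not the PR's actual script; the benchmark names are invented). In ASV's results JSON, the first entry of each results row holds the benchmark value, and a failed run stores null there:

```python
import json

# Hypothetical ASV-style results payload: one passing benchmark with a
# numeric value, one failed benchmark whose value slot is null.
payload = json.loads("""
{
  "results": {
    "suite.TimeSuite.time_ok":     [[0.12], [], "abc123", 0, 1.0],
    "suite.TimeSuite.time_broken": [null,   [], "def456", 0, 1.0]
  }
}
""")

failures = []
for bench_name, result in payload.get("results", {}).items():
    if result[0] is None:
        failures.append(bench_name)

print(failures)  # ['suite.TimeSuite.time_broken']
```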
Collaborator

If a benchmark regresses, will it also be None? I.e. the most common use case, where a benchmark ran for 100 ms before and now runs for 170 ms but is still below the timeout.

It would be good to run a manual test to verify this.

Collaborator Author

This field is the benchmark value (e.g. time to run, memory used). It's None if and only if the benchmark failed to execute. The regression reporting is done separately, using these files as input data.

Collaborator Author

Here's some testing from a scratch project I created; the commit 8616e7 caused a big regression:

```json
{"commit_hash": "8616e7d7e7d60b18701e33c9340204cd8270e0de", "env_name": "virtualenv-py3.12", "date": 1767102219000, "params": {"arch": "x86_64", "cpu": "13th Gen Intel(R) Core(TM) i7-13700H", "machine": "alex-seaton-XPS-15-9530", "num_cpu": "20", "os": "Linux 6.5.0-27-generic", "ram": "65500168", "python": "3.12"}, "python": "3.12", "requirements": {}, "env_vars": {}, "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"], "results": {"benchmarks.TimeSuite.time_thing": [[2.0001261205034098], [], "aaaead7941820640c847bb804717c259f673d9431554c621753ad8f8966d93ce", 1767102241523, 24.034, [2.0001], [2.0002], [2.0001], [2.0002], [1], [10]]}, "durations": {}, "version": 2}
```

`.asv/results/alex-seaton-XPS-15-9530/8616e7d7-virtualenv-py3.12.json`
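The positional rows under "results" in that file are keyed by the top-level "result_columns" list. A hedged sketch of decoding one row with zip (column names and values taken from the file above, shortened to the first five columns; real files may omit trailing columns, which zip tolerates):

```python
# Pair each positional value in a results row with its column name.
result_columns = ["result", "params", "version", "started_at", "duration"]
row = [[2.0001261205034098], [], "aaaead79", 1767102241523, 24.034]

fields = dict(zip(result_columns, row))

# A regressed-but-passing benchmark still has a numeric value under
# "result"; only a failed run stores None there.
print(fields["result"], fields["duration"])  # [2.0001261205034098] 24.034
```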

```yaml
python -m asv run --show-stderr --bench $SUITE ${{ inputs.commit }}^!
continue-on-error: true # custom failure reporting below

- name: ASV Failure Report
```
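The continue-on-error pattern here lets the run step complete even when a benchmark fails, so a later step can report the failures. A hypothetical GitHub Actions fragment showing the shape (the step names and the reporting script's name are assumptions, not the PR's exact workflow):

```yaml
- name: Benchmark
  run: python -m asv run --show-stderr --bench $SUITE ${{ inputs.commit }}^!
  continue-on-error: true  # custom failure reporting below

- name: ASV Failure Report
  # Placeholder body: the real step summarises failed benchmarks and fails
  # the job, so continue-on-error does not mask broken benchmarks.
  run: python build_tooling/report_asv_failures.py  # hypothetical script name
```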
Collaborator

@IvoDD IvoDD Dec 30, 2025


Shouldn't we run this "ASV Failure Report" after the benchmark against master as well? That way we can notice we broke a test (e.g. a timeout) in the PR, not only after it is merged to master.

I realise this introduces a new class of failures from my previous comment.

@poodlewars poodlewars force-pushed the aseaton/asv/failure-reporting branch from 11a74c9 to 31cbf51 Compare December 31, 2025 09:09
```diff
 git config --global --add safe.directory .
-python -m asv run -v --show-stderr --bench $SUITE ${{ inputs.commit }}^!
+python -m asv run --show-stderr --durations all --bench $SUITE ${{ inputs.commit }}^!
```
Collaborator Author

@poodlewars poodlewars Dec 31, 2025


My custom timing script seems to underestimate how long some benchmarks take to run (although the ordering looks correct, which is still useful). I've since discovered ASV's built-in `--durations all`, which will be good to compare against.

I've removed the `-v` as otherwise the logs are just too hard to comprehend.

@poodlewars poodlewars merged commit 11b15c8 into master Jan 6, 2026
218 checks passed
@poodlewars poodlewars deleted the aseaton/asv/failure-reporting branch January 6, 2026 12:12