
Conversation


@poodlewars poodlewars commented Dec 29, 2025

Give a summary of any ASV benchmarks that fail to run (e.g. they time out).

Test run: https://github.com/man-group/ArcticDB/actions/runs/20578479281

Generated an example summary (https://github.com/man-group/ArcticDB/actions/runs/20578479281) with a placeholder benchmark that times out: 69a37af

Example runtime reporting, using:

```shell
python build_tooling/transform_asv_results.py --mode=analyze --arcticdb_client_override="s3://s3.eu-west-1.amazonaws.com:arcticdb-ci-benchmark-results?aws_auth=true&path_prefix=asv_results" --hash=abaaa08b
```

### Time spent outside of benchmarks (excluding build)

| Step                                          |   Duration (s) |
|:----------------------------------------------|---------------:|
| <setup_cache version_chain:42>                |     585.851    |
| <setup_cache list_versions:44>                |     418.164    |
...
| <setup_cache real_list_operations:58>         |       3.15759  |
| <setup_cache finalize_staged_data:100>        |       0.980301 |

### Time spent in benchmarks

| test_name                                                                                    |   Duration (s) |
|:---------------------------------------------------------------------------------------------|---------------:|
| list_versions.ListVersions.time_list_versions                                                |     291.06     |
| real_finalize_staged_data.AWSFinalizeStagedData.time_finalize_staged_data                    |     272.44     |
| real_finalize_staged_data.AWSFinalizeStagedData.peakmem_finalize_staged_data                 |     270.84     |
| real_batch_functions.AWSBatchBasicFunctions.time_write_batch                                 |     221.73     |
...
| list_snapshots.SnaphotFunctions.peakmem_snapshots_no_metadata_list                           |       0.028348 |
| finalize_staged_data.FinalizeStagedDataWiderDataframeX3.peakmem_finalize_staged_data         |       0.010657 |
| finalize_staged_data.FinalizeStagedDataWiderDataframeX3.time_finalize_staged_data            |       0.008269 |

### Summary

* **Total time outside benchmarks (mins):** 28.18
* **Total time running benchmarks (mins):** 91.27
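The summary figures are just the table columns summed and converted to minutes. A minimal sketch of that arithmetic, using only the duration rows visible in the truncated first table above (so the result is necessarily smaller than the full 28.18 minutes):

```python
# Illustrative arithmetic only: these are the four "outside benchmarks"
# durations visible above; the truncated rows are omitted, so this total
# is smaller than the 28.18 minutes reported for the full table.
visible_outside_secs = [585.851, 418.164, 3.15759, 0.980301]

total_secs = sum(visible_outside_secs)
total_mins = round(total_secs / 60, 2)
print(total_mins)  # 16.8
```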

Test runs after PR feedback:

With a "dummy" failure:

@poodlewars poodlewars added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. labels Dec 29, 2025
@poodlewars poodlewars force-pushed the aseaton/asv/failure-reporting branch from 3e42dbe to 8b2d959 Compare December 29, 2025 17:49
@poodlewars poodlewars marked this pull request as ready for review December 29, 2025 17:51
@poodlewars poodlewars changed the title Aseaton/asv/failure reporting Clearer reporting for ASV failures Dec 29, 2025
@poodlewars poodlewars changed the title Clearer reporting for ASV failures Report ASV information to the workflow summary page Dec 29, 2025
```python
# Results are stored in a dictionary; failed ones are null
for bench_name, result in data.get('results', {}).items():
    if result[0] is None:
        failures.append(bench_name)
```
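A self-contained sketch of the check above (not the PR's actual script; the benchmark names are invented). In ASV's results JSON, the first entry of each results row holds the benchmark value, and a failed run stores null there:

```python
import json

# Hypothetical ASV-style results payload: one passing benchmark with a
# numeric value, one failed benchmark whose value slot is null.
payload = json.loads("""
{
  "results": {
    "suite.TimeSuite.time_ok":     [[0.12], [], "abc123", 0, 1.0],
    "suite.TimeSuite.time_broken": [null,   [], "def456", 0, 1.0]
  }
}
""")

failures = []
for bench_name, result in payload.get("results", {}).items():
    if result[0] is None:
        failures.append(bench_name)

print(failures)  # ['suite.TimeSuite.time_broken']
```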
Collaborator

If a benchmark regresses, will it also be None? I.e. the most common use case, where a benchmark ran for 100 ms before and now runs for 170 ms but is still below the timeout.

It would be good to run a manual test to verify this.

Collaborator Author

This field is the benchmark value (e.g. time to run, memory used). It's None if and only if the benchmark failed to execute. The regression reporting is done separately, using these files as input data.

Collaborator Author

Here's some testing from a scratch project I created; the commit 8616e7 caused a big regression:

```json
{"commit_hash": "8616e7d7e7d60b18701e33c9340204cd8270e0de", "env_name": "virtualenv-py3.12", "date": 1767102219000, "params": {"arch": "x86_64", "cpu": "13th Gen Intel(R) Core(TM) i7-13700H", "machine": "alex-seaton-XPS-15-9530", "num_cpu": "20", "os": "Linux 6.5.0-27-generic", "ram": "65500168", "python": "3.12"}, "python": "3.12", "requirements": {}, "env_vars": {}, "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"], "results": {"benchmarks.TimeSuite.time_thing": [[2.0001261205034098], [], "aaaead7941820640c847bb804717c259f673d9431554c621753ad8f8966d93ce", 1767102241523, 24.034, [2.0001], [2.0002], [2.0001], [2.0002], [1], [10]]}, "durations": {}, "version": 2}
```

`.asv/results/alex-seaton-XPS-15-9530/8616e7d7-virtualenv-py3.12.json`
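The positional rows under "results" in that file are keyed by the top-level "result_columns" list. A hedged sketch of decoding one row with zip (column names and values taken from the file above, shortened to the first five columns; real files may omit trailing columns, which zip tolerates):

```python
# Pair each positional value in a results row with its column name.
result_columns = ["result", "params", "version", "started_at", "duration"]
row = [[2.0001261205034098], [], "aaaead79", 1767102241523, 24.034]

fields = dict(zip(result_columns, row))

# A regressed-but-passing benchmark still has a numeric value under
# "result"; only a failed run stores None there.
print(fields["result"], fields["duration"])  # [2.0001261205034098] 24.034
```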

```yaml
python -m asv run --show-stderr --bench $SUITE ${{ inputs.commit }}^!
continue-on-error: true # custom failure reporting below

- name: ASV Failure Report
```
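The continue-on-error pattern here lets the run step complete even when a benchmark fails, so a later step can report the failures. A hypothetical GitHub Actions fragment showing the shape (the step names and the reporting script's name are assumptions, not the PR's exact workflow):

```yaml
- name: Benchmark
  run: python -m asv run --show-stderr --bench $SUITE ${{ inputs.commit }}^!
  continue-on-error: true  # custom failure reporting below

- name: ASV Failure Report
  # Placeholder body: the real step summarises failed benchmarks and fails
  # the job, so continue-on-error does not mask broken benchmarks.
  run: python build_tooling/report_asv_failures.py  # hypothetical script name
```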
Collaborator

@IvoDD IvoDD Dec 30, 2025


Shouldn't we run this "ASV Failure Report" after the benchmark against master as well? That way we can notice we broke a test (e.g. a timeout) in the PR, not only after it is merged to master.

I realise this introduces a new class of failures from my previous comment.

@poodlewars poodlewars force-pushed the aseaton/asv/failure-reporting branch from 11a74c9 to 31cbf51 Compare December 31, 2025 09:09
```diff
 git config --global --add safe.directory .
-python -m asv run -v --show-stderr --bench $SUITE ${{ inputs.commit }}^!
+python -m asv run --show-stderr --durations all --bench $SUITE ${{ inputs.commit }}^!
```
Collaborator Author

@poodlewars poodlewars Dec 31, 2025


My custom timing script seems to underestimate how long some benchmarks take to run (although the ordering looks correct, which is still useful). I've since discovered ASV's built-in `--durations all`, which will be good to compare against.

I've removed the `-v` as otherwise the logs are just too hard to comprehend.

@poodlewars poodlewars merged commit 11b15c8 into master Jan 6, 2026
218 checks passed
@poodlewars poodlewars deleted the aseaton/asv/failure-reporting branch January 6, 2026 12:12