Report ASV information to the workflow summary page #2829
Conversation
Force-pushed from 3e42dbe to 8b2d959
```python
# Results are stored in a dictionary; failed ones are null
for bench_name, result in data.get('results', {}).items():
    if result[0] is None:
        failures.append(bench_name)
```
If a benchmark regresses, will it also be None? I.e. the most common use case, where a benchmark previously ran in 100 ms and now runs in 170 ms but is still below the timeout.
Would be good to run a manual test to verify this.
This field is the benchmark value (e.g. time to run, memory used). It's None if and only if the benchmark failed to execute. The regression reporting is done separately, using these files as input data.
Here's some testing from a scratch project I created; the commit 8616e7 caused a big regression:
```json
{
  "commit_hash": "8616e7d7e7d60b18701e33c9340204cd8270e0de",
  "env_name": "virtualenv-py3.12",
  "date": 1767102219000,
  "params": {"arch": "x86_64", "cpu": "13th Gen Intel(R) Core(TM) i7-13700H", "machine": "alex-seaton-XPS-15-9530", "num_cpu": "20", "os": "Linux 6.5.0-27-generic", "ram": "65500168", "python": "3.12"},
  "python": "3.12",
  "requirements": {},
  "env_vars": {},
  "result_columns": ["result", "params", "version", "started_at", "duration", "stats_ci_99_a", "stats_ci_99_b", "stats_q_25", "stats_q_75", "stats_number", "stats_repeat", "samples", "profile"],
  "results": {"benchmarks.TimeSuite.time_thing": [[2.0001261205034098], [], "aaaead7941820640c847bb804717c259f673d9431554c621753ad8f8966d93ce", 1767102241523, 24.034, [2.0001], [2.0002], [2.0001], [2.0002], [1], [10]]},
  "durations": {},
  "version": 2
}
```
.asv/results/alex-seaton-XPS-15-9530/8616e7d7-virtualenv-py3.12.json
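For reference, here's how those rows decode: each entry in "results" lines up against "result_columns". A minimal sketch, assuming only the layout in the sample above (the file path and field handling are illustrative, not from this PR):

```python
import json

# Decode a results file by zipping each row against "result_columns".
with open(".asv/results/alex-seaton-XPS-15-9530/8616e7d7-virtualenv-py3.12.json") as f:
    data = json.load(f)

columns = data["result_columns"]
for bench_name, row in data["results"].items():
    fields = dict(zip(columns, row))  # rows may omit trailing columns
    result = fields.get("result")
    # Per the comment above, the value is None if and only if the run failed.
    print(bench_name, "FAILED" if result is None or result[0] is None else result[0])
```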
```yaml
    python -m asv run --show-stderr --bench $SUITE ${{ inputs.commit }}^!
  continue-on-error: true  # custom failure reporting below

- name: ASV Failure Report
```
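For context on how a step like this reaches the summary page: GitHub Actions exposes a GITHUB_STEP_SUMMARY environment variable pointing at a file, and any markdown appended to it is rendered on the workflow run's summary page. A minimal sketch of such a reporting step (the failure list here is a placeholder; the real step derives it from the results files):

```python
import os

# Placeholder failure list; the actual step computes this from the ASV results.
failures = ["benchmarks.TimeSuite.time_thing"]

# Markdown appended to the GITHUB_STEP_SUMMARY file is rendered on the
# workflow run's summary page.
with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as summary:
    if failures:
        summary.write("## ASV Failure Report\n\n")
        for name in failures:
            summary.write(f"- `{name}` failed to run (e.g. timed out)\n")
```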
Shouldn't we run this "ASV Failure Report" after the "Benchmark against master" step as well? That way we'd notice that we broke a test (e.g. a timeout) in the PR itself, not only after merging to master.
I realise this introduces a new class of failures beyond my previous comment.
Force-pushed from 11a74c9 to 31cbf51
```diff
  git config --global --add safe.directory .
- python -m asv run -v --show-stderr --bench $SUITE ${{ inputs.commit }}^!
+ python -m asv run --show-stderr --durations all --bench $SUITE ${{ inputs.commit }}^!
```
My custom timing script seems to underestimate how long some benchmarks take to run (although the ordering looks correct, which is still useful). I've since discovered ASV's built-in --durations all, which will be good to compare against.
I've removed the -v as otherwise the logs are just too hard to comprehend.
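As a cross-check on either timing source, the results file itself records a wall-clock duration per benchmark (the "duration" column in the sample above, 24.034 s there). A minimal sketch that ranks benchmarks by it, assuming the same file layout:

```python
import json

with open(".asv/results/alex-seaton-XPS-15-9530/8616e7d7-virtualenv-py3.12.json") as f:
    data = json.load(f)

idx = data["result_columns"].index("duration")
# Sort benchmarks by their recorded wall-clock duration, longest first.
timings = {name: row[idx] for name, row in data["results"].items() if len(row) > idx}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seconds:9.3f}s  {name}")
```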
Give a summary of any ASV benchmarks that fail to run (e.g. they time out).
Test run: https://github.com/man-group/ArcticDB/actions/runs/20578479281
Example summary: https://github.com/man-group/ArcticDB/actions/runs/20578479281, generated with a placeholder timing-out benchmark (69a37af).
Example runtime reporting, using:

```
python build_tooling/transform_asv_results.py --mode=analyze --arcticdb_client_override="s3://s3.eu-west-1.amazonaws.com:arcticdb-ci-benchmark-results?aws_auth=true&path_prefix=asv_results" --hash=abaaa08b
```

Test runs after PR feedback:
With a "dummy" failure:
Without the "dummy" failure:
https://github.com/man-group/ArcticDB/actions/runs/20615918441