Releases: pytorch/test-infra
v20251002-162215
[autorevert] Inject synthetic PENDING events for pending workflows in…
v20251002-150750
[autorevert] Fix pacing query logic (#7274)
`any` has unexpected semantics in ClickHouse (CH): it returns the [first value encountered](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/any), not a logical OR over the group.
Because of this bug, pacing failed in rare cases where a commit had multiple events and only some of them matched the condition.
For example, when the first event falls out of the window and a second event is added, the condition evaluates to two rows, 0 and 1, and `any` returns either one depending on the nondeterministic row order.
One correct way (among many) to check whether any value is true is to use `countIf`.
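To make the difference concrete, here is a minimal Python model of the two aggregates. It is only an illustration: `ch_any` and `count_if_positive` are hypothetical names, and the model assumes a non-empty group and treats "first value encountered" as the first element of the list.

```python
from typing import Iterable, List


def ch_any(values: List[int]) -> int:
    """Toy model of ClickHouse `any`: return the first value encountered."""
    return values[0]


def count_if_positive(values: Iterable[int]) -> bool:
    """Toy model of `countIf(cond) > 0`: true if at least one row matches."""
    return sum(1 for v in values if v) > 0


# Per-row condition results for two events on one commit:
# one event outside the window (0) and one inside it (1).
for order in ([0, 1], [1, 0]):
    print(order, "any ->", ch_any(order), "countIf > 0 ->", count_if_positive(order))
```

In this model `ch_any` changes with the row order, while the count-based check does not.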
Testing:
```
SELECT
    (countIf(failed = 0 AND ts > now() - toIntervalSecond(5200)) > 0) AS has_success_within_window,
    any(failed = 0 AND ts > now() - toIntervalSecond(5200)) AS has_success_within_window_old
FROM misc.autorevert_events_v2
WHERE repo = 'pytorch/pytorch'
    AND action = 'restart'
    AND dry_run = 0
    AND commit_sha = 'b5c4f46bb9ede8dc6adf11975c93b9f285d9ed67'
```
result:
```
"has_success_within_window","has_success_within_window_old"
"1","0"
```
more testing:
```
python -m pytorch_auto_revert --dry-run autorevert-checker Lint trunk pull inductor rocm rocm-mi300 --hours 18 --hud-html
```
v20251001-182920
[autorevert] Add 'linux-aarch64' to default workflows (#7268)
See the list of viable strict workflows:
https://github.com/pytorch/pytorch/pull/164374/files
Testing:
```
HOURS=18 python -m pytorch_auto_revert --dry-run
2025-10-01 11:19:05,293 INFO [root] [v2] Start: workflows=Lint,trunk,pull,inductor,linux-aarch64 hours=18 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650 bisection=unlimited
2025-10-01 11:19:05,293 INFO [root] [v2] Run timestamp (CH log ts) = 2025-10-01T18:19:05.293306+00:00
2025-10-01 11:19:05,294 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching commits in time range: repo=pytorch/pytorch lookback=18h
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Commits fetched: 47 commits in 0.76s
2025-10-01 11:19:06,055 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor,linux-aarch64 commits=47 lookback=18h
2025-10-01 11:20:14,477 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 7058 rows in 68.42s
2025-10-01 11:20:14,539 INFO [root] [v2] Extracted 1 signals
2025-10-01 11:20:14,539 INFO [root] [v2][signal] wf=inductor key=inductor-test / test outcome=Ineligible(reason=<IneligibleReason.FLAKY: 'flaky'>, message='signal is flaky (mixed outcomes on same commit)')
2025-10-01 11:20:14,539 INFO [root] [v2] Candidate action groups: 0
2025-10-01 11:20:14,539 INFO [root] [v2] Executed action groups: 0
2025-10-01 11:20:15,101 INFO [root] [v2] State logged
```
v20251001-181055
[autorevert] Implement autobisect functionality (#7238)
Testing on the periodic workflow (on top of
https://github.com/pytorch/test-infra/pull/7248):
```
python -m pytorch_auto_revert autorevert-checker periodic --hours 128 --bisection-limit 2 --hud-html
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --bisection-limit 2 --hud-html
```
[2025-09-29T22-00-27.941916-00-00.html](https://github.com/user-attachments/files/22607006/2025-09-29T22-00-27.941916-00-00.html)
[2025-09-29T22-03-58.012711-00-00.html](https://github.com/user-attachments/files/22607013/2025-09-29T22-03-58.012711-00-00.html)
----
Algorithm:
- Goal: Cover the “unknown” span between failure and success partitions
by scheduling at most N new restarts, sampling widely via iterative
bisection.
- Intuition: Always split the largest unknown gap; choose its midpoint;
repeat until the budget is exhausted.
Inputs/Output
- Input covered: boolean list over the unknown region
  - True = already covered/separator (e.g., pending), False = uncovered candidate.
- Input limit: optional int; total target coverage for this run.
  - Budget allowed = max(0, limit − sum(covered)); None = unlimited.
- Output: boolean list of equal length; True marks indices to newly cover (schedule now).
Procedure
- If limit is None: return NOT covered (select all uncovered).
- Else:
  - Build contiguous uncovered gaps (sequences of False) separated by True entries.
  - Push each gap into a max-heap keyed by (-length, lo, hi) using Gap(lo, hi):
    - length = hi − lo + 1
    - heap_key = (-length, lo, hi) for deterministic tie-breaking.
  - While allowed > 0 and heap not empty:
    - Pop largest gap g; pick mid = floor((g.lo + g.hi)/2); select mid; allowed -= 1.
    - Push back sub-gaps [g.lo, mid-1] and [mid+1, g.hi] if non-empty.
  - Return the selection mask.
Properties
- Deterministic ties (equal-length gaps) prefer lower lo.
- Already-covered (pending) entries both reduce the budget and split
gaps, pacing new work naturally.
- If limit ≤ current_covered → allowed = 0 → no new selections.
- Complexity: O(A log G), where A = number of picks (≤ allowed), G =
initial number of gaps.
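The procedure above can be sketched in a few lines of Python. This is a minimal illustration written from the description, not the code from the PR: the function name `plan_gap_cover` is hypothetical, and plain tuples stand in for the `Gap(lo, hi)` type. Python's `heapq` is a min-heap, so the negated length in the key gives the max-heap behaviour.

```python
import heapq
from typing import List, Optional, Tuple


def plan_gap_cover(covered: List[bool], limit: Optional[int]) -> List[bool]:
    """Return a mask of indices to newly cover (hypothetical helper name)."""
    if limit is None:
        # Unlimited budget: select every uncovered index.
        return [not c for c in covered]

    n = len(covered)
    allowed = max(0, limit - sum(covered))
    selected = [False] * n

    # Collect contiguous runs of False as inclusive (lo, hi) gaps.
    heap: List[Tuple[int, int, int]] = []
    i = 0
    while i < n:
        if covered[i]:
            i += 1
            continue
        lo = i
        while i < n and not covered[i]:
            i += 1
        hi = i - 1
        heapq.heappush(heap, (-(hi - lo + 1), lo, hi))

    while allowed > 0 and heap:
        # Largest gap first; ties go to the lower lo.
        _, lo, hi = heapq.heappop(heap)
        mid = (lo + hi) // 2
        selected[mid] = True
        allowed -= 1
        # Push back the non-empty halves on either side of mid.
        if lo <= mid - 1:
            heapq.heappush(heap, (-(mid - lo), lo, mid - 1))
        if mid + 1 <= hi:
            heapq.heappush(heap, (-(hi - mid), mid + 1, hi))

    return selected


print(plan_gap_cover([False] * 7, limit=3))
print(plan_gap_cover([False, False, True, False, False, False], limit=2))
```

In the second call the already-covered entry consumes one unit of the limit, so only one new index is selected, in the larger of the two gaps.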
Integration in signal processing
- PartitionedCommits.cover_gap_unknown_commits:
  - Builds covered mask for the unknown partition: pending=True (separator), missing=False (candidate).
  - Calls the planner; maps selected indices back to commit SHAs to restart.
- process_valid_autorevert_pattern(bisection_limit=...):
  - Applies gap-cover selections, then independently applies failure-/success-side restarts based on infra and threshold heuristics.
---------
Co-authored-by: Copilot
v20251001-180704
[AUTOREVERT] Makefile targets pointing to canary (#7267)
Setting the makefile targets to point to `pytorch/pytorch-canary` as an example.
v20251001-163637
[autorevert] add job & hud links to the autorevert message and debug…
v20250930-222836
[autorevert] exclude unstable jobs (#7260)
v20250930-134331
[AUTOREVERT] [BUGFIX] fixing typo in variable name preventing revert …
v20250930-125800
[autorevert] correctly fetch and build the gaps in the signal (#7248)
1. Fixed commits-without-jobs issue
   - Problem: Commits with no workflow jobs (e.g., periodic workflow) were excluded from signal extraction
   - Solution:
     - Added fetch_commits_in_time_range() to query push table directly
     - Modified job query to filter by explicit list of head_shas instead of JOIN
     - Changed ORDER BY to use sha dimension first (preserves grouping; actual order doesn't matter as internally extractors now iterate over the list of commits passed explicitly)
2. Added mandatory timestamp field to SignalCommit
   - Changes:
     - SignalCommit.__init__(head_sha, timestamp, events) - timestamp is now mandatory
     - Signal extraction populates timestamps from push table
     - HUD state logger uses commit timestamp instead of computing from event times
     - Updated 36 test constructor calls
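The effect of both changes can be illustrated with a small sketch. It assumes simplified shapes: `SignalCommit` here is a stand-in dataclass rather than the real class, `build_signal_commits` is a hypothetical helper, and events are plain strings. The point is that iterating over the explicit commit list, instead of over job rows, keeps commits that have no jobs and gives every commit its push timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class SignalCommit:
    """Simplified stand-in for the real class; all three fields are required."""
    head_sha: str
    timestamp: datetime
    events: List[str]


def build_signal_commits(
    commits: List[Tuple[str, datetime]],  # (sha, push timestamp), hypothetical shape
    job_rows: List[Tuple[str, str]],      # (head_sha, event), hypothetical shape
) -> List[SignalCommit]:
    # Group job rows by commit SHA; the order of job rows does not matter.
    events_by_sha: Dict[str, List[str]] = {}
    for sha, event in job_rows:
        events_by_sha.setdefault(sha, []).append(event)
    # Iterate over the explicit commit list so job-less commits are kept.
    return [SignalCommit(sha, ts, events_by_sha.get(sha, [])) for sha, ts in commits]


ts = datetime(2025, 9, 29, tzinfo=timezone.utc)
commits = [("aaa", ts), ("bbb", ts), ("ccc", ts)]
job_rows = [("ccc", "failure"), ("aaa", "success")]
for commit in build_signal_commits(commits, job_rows):
    print(commit.head_sha, commit.events)
```

Here commit `bbb` has no jobs but still appears in the result with an empty event list, which is the gap the fix is meant to preserve.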
### Testing
Before:
[2025-09-29T19-29-47.670686-00-00.html](https://github.com/user-attachments/files/22606856/2025-09-29T19-29-47.670686-00-00.html)
After:
[2025-09-29T21-38-10.190584-00-00.html](https://github.com/user-attachments/files/22606859/2025-09-29T21-38-10.190584-00-00.html)
v20250929-230908
[autorevert] fix indentation in `fetch_tests_for_job_ids` (#7250)
Accidentally noticed another bug introduced by https://github.com/pytorch/test-infra/pull/7241 when testing locally on large lookback windows:
```
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --bisection-limit 2 --hud-html
2025-09-29 15:56:16,356 INFO [root] [v2] Start: workflows=periodic hours=256 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650
2025-09-29 15:56:16,356 INFO [root] [v2] Run timestamp (CH log ts) = 2025-09-29T22:56:16.356213+00:00
2025-09-29 15:56:16,356 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching commits in time range: repo=pytorch/pytorch lookback=256h
2025-09-29 15:56:16,909 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Commits fetched: 419 commits in 0.55s
2025-09-29 15:56:16,909 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=periodic commits=419 lookback=256h
2025-09-29 15:56:56,850 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 2848 rows in 39.94s
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching tests for 1077 job_ids (453 failed jobs) in batches
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 1/2 (size=1024)
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] existing rows: 0
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 2/2 (size=53)
2025-09-29 15:56:56,859 INFO [pytorch_auto_revert.signal_extraction_datasource] existing rows: 0
2025-09-29 15:56:57,718 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Tests fetched: 265 rows for 1077 job_ids in 0.86s
```
Notice that no tests are read in the first batch!
After this fix:
```
python -m pytorch_auto_revert --dry-run autorevert-checker periodic --hours 256 --hud-html
2025-09-29 16:03:06,896 INFO [root] [v2] Start: workflows=periodic hours=256 repo=pytorch/pytorch restart_action=log revert_action=log notify_issue_number=163650
2025-09-29 16:03:06,896 INFO [root] [v2] Run timestamp (CH log ts) = 2025-09-29T23:03:06.896595+00:00
2025-09-29 16:03:06,897 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching jobs: repo=pytorch/pytorch workflows=periodic lookback=256h
2025-09-29 16:03:49,456 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Jobs fetched: 2887 rows in 42.56s
2025-09-29 16:03:49,466 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Fetching tests for 1113 job_ids (454 failed jobs) in batches
2025-09-29 16:03:49,466 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 1/2 (size=1024)
2025-09-29 16:03:51,753 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Test batch 2/2 (size=89)
2025-09-29 16:03:53,056 INFO [pytorch_auto_revert.signal_extraction_datasource] [extract] Tests fetched: 5002 rows for 1113 job_ids in 3.59s
2025-09-29 16:03:53,122 INFO [root] [v2] Extracted 144 signals
```
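The release note does not show the faulty code, so the following is only a sketch of the batching pattern that the logs suggest, with hypothetical names (`iter_batches`, `fetch_all`, `fetch_batch`). It shows why indentation matters in such a loop: the fetch and the accumulation must both sit inside the loop body, otherwise a dedented statement would run only once, after the last batch.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(job_ids: Sequence[int], batch_size: int = 1024) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size job ids."""
    for start in range(0, len(job_ids), batch_size):
        yield job_ids[start:start + batch_size]


def fetch_all(
    job_ids: Sequence[int],
    fetch_batch: Callable[[Sequence[int]], List[dict]],  # hypothetical per-batch query
    batch_size: int = 1024,
) -> List[dict]:
    rows: List[dict] = []
    for batch in iter_batches(job_ids, batch_size):
        # Both the fetch and the accumulation are inside the loop,
        # so every batch contributes its rows to the result.
        rows.extend(fetch_batch(batch))
    return rows


job_ids = list(range(1077))  # same count as in the first log above
print([len(b) for b in iter_batches(job_ids)])
print(len(fetch_all(job_ids, lambda batch: [{"job_id": j} for j in batch])))
```

With 1077 job ids and a batch size of 1024, the batches have sizes 1024 and 53, matching the batch sizes reported in the first log.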