Releases · pytorch/test-infra

19 Sep 17:29

v20250919-172720

c6532cb

v20250919-172720

[ALI] adds AUTH_GH_ORG and AUTH_GH_REPO to set/restrict authenticatio…

Assets 13

18 Sep 22:33

github-actions

v20250918-223207-custom

f9474e3

v20250918-223207-custom

[AMI] Update pytest-rerunfailures to 10.3 (#7189)

Also, don't try to install it twice.

10.3 is the version that has been used for testing for quite some time; see
https://github.com/pytorch/test-infra/blob/be01a40157c36cd5a48391fdf44a7bc3ebd4c7e3/aws/ami/windows/scripts/Installers/Install-Pip-Dependencies.ps1#L16

Using 10.2 results in:
```
 INTERNALERROR> pluggy._manager.PluginValidationError: unknown hook 'pytest_configure_node'
```

Assets 13

17 Sep 23:31

github-actions

v20250917-232928

d8451ae

v20250917-232928

[autorevert] Breaks down dry_run logic for revert and restart (#7179)

*PLEASE NOTE*
This PR is intended to sit on top of
https://github.com/pytorch/test-infra/pull/7179;
consider merging that one before reviewing this one to make reviewing easier.

Breaking down the dry_run logic for revert and restart is required so we
can continue to work safely towards improving the autorevert.

Not adding the logic is not the best option, as I hope we'll be able to
run it locally a few times and iterate on that before publishing.
Relying on code commenting and other not-so-great approaches is not ideal.
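
As a rough illustration of what separate dry-run switches for revert and restart could look like, here is a minimal sketch. The names `DryRunConfig` and `execute` are hypothetical and are not taken from the autorevert code; the real implementation may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DryRunConfig:
    """Hypothetical container: one dry-run switch per kind of side effect."""

    revert: bool = True  # default to the safe setting
    restart: bool = True


def execute(action_type: str, commit_sha: str, cfg: DryRunConfig) -> str:
    """Return a description of what was (or would have been) done."""
    if action_type not in ("revert", "restart"):
        raise ValueError(f"unknown action type: {action_type}")
    # Pick the switch that matches the requested action.
    dry = cfg.revert if action_type == "revert" else cfg.restart
    if dry:
        return f"[dry-run] would {action_type} {commit_sha}"
    # A real implementation would perform the side effect here.
    return f"{action_type} executed for {commit_sha}"
```

With this shape, restarts can be exercised for real during a local run while reverts stay in dry-run mode, for example `execute("restart", sha, DryRunConfig(revert=True, restart=False))`.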

---------

Co-authored-by: Ivan Zaitsev <[email protected]>

Assets 13

17 Sep 16:44

github-actions

v20250917-164214-custom

65d8de8

v20250917-164214-custom

[autorevert] Remove legacy entry point and fix bugs on ci entry (#7178)

Main entry point had some strange references to past autorevert code
that could not be reached.

There is no point in using the old version of the code, so to start
the cleanup I am fixing bugs in the entry point and removing the references
to the old code.

Assets 13

17 Sep 16:15

github-actions

v20250917-161407

65d8de8

v20250917-161407

[autorevert] Remove legacy entry point and fix bugs on ci entry (#7178)

Main entry point had some strange references to past autorevert code
that could not be reached.

There is no point in using the old version of the code, so to start
the cleanup I am fixing bugs in the entry point and removing the references
to the old code.

Assets 13

17 Sep 00:51

github-actions

v20250917-004931

03bf20c

v20250917-004931

[autorevert] implement actions layer and logging (#7169)

This pull request:
* introduces a final "Signal Actions" layer (responsible for executing
side effects of processed Signals, like restarts and reverts)
* changes the main entry point for the PyTorch auto-revert Lambda to use
the new signals-based autorevert flow by default.
* adds two ClickHouse (CH) tables for observability:
  * `autorevert_events_v2`
  * `autorevert_state`


See [the
spec](https://github.com/pytorch/test-infra/blob/ff2645443aafb0209d7f546302a5c09d8243cb31/aws/lambda/pytorch-auto-revert/SIGNAL_ACTIONS.md)
for more details.



### Testing

Tested locally (only restart & state logging):


```
HOURS=18 WORKFLOWS=Lint,trunk,pull,inductor python -m pytorch_auto_revert
INFO:root:[v2] Start: workflows=Lint,trunk,pull,inductor hours=18 repo=pytorch/pytorch dry_run=False
INFO:root:[v2] Run timestamp (CH log ts) = 2025-09-16T15:51:18.656175
INFO:pytorch_auto_revert.signal_extraction_datasource:[extract] Fetching jobs: repo=pytorch/pytorch workflows=Lint,trunk,pull,inductor lookback=18h
INFO:pytorch_auto_revert.signal_extraction_datasource:[extract] Jobs fetched: 6738 rows in 45.70s
INFO:pytorch_auto_revert.signal_extraction_datasource:[extract] Fetching tests for 414 job_ids (20 failed jobs) in batches
INFO:pytorch_auto_revert.signal_extraction_datasource:[extract] Test batch 1/1 (size=414)
INFO:pytorch_auto_revert.signal_extraction_datasource:[extract] Tests fetched: 231 rows for 414 job_ids in 1.95s
INFO:root:[v2] Extracted 19 signals
INFO:root:[v2][signal] wf=trunk key=inductor/test_cudagraph_trees.py::test_graph_partition outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=test_transformers.py::test_fused_sdp_priority_order_use_compile_False_cuda outcome=Ineligible(reason=<IneligibleReason.NO_SUCCESSES: 'no_successes'>, message='no successful commits present in window')
INFO:root:[v2][signal] wf=trunk key=export/test_hop.py::test_retrace_export_local_map_hop_simple_cuda_float32 outcome=Ineligible(reason=<IneligibleReason.NO_SUCCESSES: 'no_successes'>, message='no successful commits present in window')
INFO:root:[v2][signal] wf=trunk key=inductor/test_cudagraph_trees_expandable_segments.py::test_forward_backward_not_called_backend_inductor outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=export/test_hop.py::test_pre_dispatch_export_local_map_hop_simple_cuda_float32 outcome=Ineligible(reason=<IneligibleReason.NO_SUCCESSES: 'no_successes'>, message='no successful commits present in window')
INFO:root:[v2][signal] wf=trunk key=export/test_hop.py::test_serialize_export_local_map_hop_simple_cuda_float32 outcome=Ineligible(reason=<IneligibleReason.NO_SUCCESSES: 'no_successes'>, message='no successful commits present in window')
INFO:root:[v2][signal] wf=trunk key=export/test_hop.py::test_aot_export_local_map_hop_simple_cuda_float32 outcome=Ineligible(reason=<IneligibleReason.NO_SUCCESSES: 'no_successes'>, message='no successful commits present in window')
INFO:root:[v2][signal] wf=trunk key=inductor/test_cudagraph_trees_expandable_segments.py::test_graph_partition outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=inductor/test_cudagraph_trees.py::test_forward_backward_not_called_backend_inductor outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=distributed/tensor/debug/test_debug_mode.py::test_debug_mode_backward outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=Lint key=lintrunner-noclang / linux-job outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=pull key=linux-jammy-py3.10-clang12 / test outcome=Ineligible(reason=<IneligibleReason.FLAKY: 'flaky'>, message='signal is flaky (mixed outcomes on same commit)')
INFO:root:[v2][signal] wf=trunk key=win-vs2022-cpu-py3 / test outcome=Ineligible(reason=<IneligibleReason.FLAKY: 'flaky'>, message='signal is flaky (mixed outcomes on same commit)')
INFO:root:[v2][signal] wf=inductor key=unit-test / inductor-test / test outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=win-vs2022-cpu-py3 / build outcome=RestartCommits(commit_shas={'814338826e0b5cd065f8278c4b9487f13e16a5c7'})
INFO:root:[v2][signal] wf=inductor key=inductor-cpu-test / test outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=trunk key=win-vs2022-cuda12.6-py3 / build outcome=RestartCommits(commit_shas={'814338826e0b5cd065f8278c4b9487f13e16a5c7'})
INFO:root:[v2][signal] wf=inductor key=unit-test / inductor-cpu-build / build outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2][signal] wf=pull key=linux-jammy-py3.13-clang12 / test outcome=Ineligible(reason=<IneligibleReason.FIXED: 'fixed'>, message='signal appears recovered at head')
INFO:root:[v2] Candidate action groups: 1
INFO:root:[v2][action] preparing to execute ActionGroup(type='restart', commit_sha='814338826e0b5cd065f8278c4b9487f13e16a5c7', workflow_target='trunk', sources=[SignalMetadata(workflow_name='trunk', key='win-vs2022-cpu-py3 / build'), SignalMetadata(workflow_name='trunk', key='win-vs2022-cuda12.6-py3 / build')])
INFO:root:[v2][action] restart: skipping pacing (delta_sec=-24852)
INFO:root:[v2] Executed action groups: 0
INFO:root:[v2] State logged
```
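
The log above shows two `RestartCommits` outcomes for the same commit and workflow being merged into a single `ActionGroup`. A minimal sketch of that kind of grouping follows. The field names mirror what the log prints; the function `group_restarts` and the grouping key (commit SHA plus workflow name) are assumptions made for illustration, not the Lambda's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SignalMetadata:
    workflow_name: str
    key: str


@dataclass(frozen=True)
class RestartCommits:
    commit_shas: FrozenSet[str]


@dataclass(frozen=True)
class ActionGroup:
    type: str
    commit_sha: str
    workflow_target: str
    sources: Tuple[SignalMetadata, ...]


def group_restarts(
    outcomes: List[Tuple[SignalMetadata, object]],
) -> List[ActionGroup]:
    """Merge restart outcomes that target the same (commit, workflow)."""
    buckets: Dict[Tuple[str, str], List[SignalMetadata]] = defaultdict(list)
    for meta, outcome in outcomes:
        if not isinstance(outcome, RestartCommits):
            continue  # ineligible signals produce no action
        for sha in sorted(outcome.commit_shas):
            buckets[(sha, meta.workflow_name)].append(meta)
    return [
        ActionGroup("restart", sha, wf, tuple(sources))
        for (sha, wf), sources in buckets.items()
    ]
```

Grouping this way means a commit is restarted once per workflow, no matter how many signals asked for it.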

Assets 13

16 Sep 13:17

github-actions

v20250916-131543

1ab651d

v20250916-131543

[autorevert] filter out 'mem_leak_check' and 'rerun_disabled_tests' w…

Assets 13

16 Sep 13:16

github-actions

v20250916-131426

85948fd

v20250916-131426

[autorevert] non nullable dates & dedup (#7167)

**Signal event deduplication and timestamp handling:**

* Added a deduplication step in `SignalExtractor` to remove duplicate
signal events within commits, based on identical `(started_at,
wf_run_id)` pairs. This addresses issues with "rerun failed" jobs in
GitHub workflows that reuse the same underlying job (but report it
with different job ids)

* For test-track signals, extract start_date from the specific job that
hosted the test (when available)

* Changed all job and signal timestamp fields (`started_at`,
`created_at`) to be non-optional and default
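
The deduplication step described above can be sketched roughly as follows. `SignalEvent` and `dedup_events` are hypothetical stand-ins, and keeping the first occurrence is an assumption; only the `(started_at, wf_run_id)` key comes from the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class SignalEvent:
    name: str
    started_at: datetime  # non-optional, as described above
    wf_run_id: int


def dedup_events(events: Iterable[SignalEvent]) -> List[SignalEvent]:
    """Drop events whose (started_at, wf_run_id) pair was already seen."""
    seen = set()
    unique: List[SignalEvent] = []
    for ev in events:
        ident = (ev.started_at, ev.wf_run_id)
        if ident in seen:
            continue  # same underlying job reported under another job id
        seen.add(ident)
        unique.append(ev)
    return unique
```

In this sketch the function would be applied to the events of one commit at a time, since the description scopes deduplication to events within a commit.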

Assets 13

16 Sep 09:30

github-actions

v20250916-092913

9ae4838

v20250916-092913

Improve the time series api + add policy for regression (#7156)

For API:
- Add model filters during query
- Add format options: table and raw
- Add API hook method for frontend

Add listCommits API

For regression lambda:
- Add regression policy for compilation latency (if the new value > 1.05 ×
baseline, consider it a regression)
- Change the data format to match the API
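
The compilation latency policy above boils down to a single threshold check. A minimal sketch, assuming a hypothetical helper named `is_latency_regression`; only the 1.05 factor comes from the description.

```python
def is_latency_regression(
    new_value: float, baseline: float, threshold: float = 1.05
) -> bool:
    """Flag a regression when the new value exceeds threshold x baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return new_value > threshold * baseline
```

For example, with a baseline of 100 a new value of 106 would be flagged, while 104 would not.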

Assets 13

15 Sep 17:56

github-actions

v20250915-175459

3836ad9

v20250915-175459

fix makefile lint & typo (#7166)

Assets 13
