Closed
Commits
467 commits
efd7fd5
Consistently use c10_ovrsource in arvr mode everywhere (#164128)
ezyang Sep 29, 2025
c332d58
[testing] upload test stats: Add info to the invoking file summary an…
clee2000 Sep 29, 2025
50d418f
Replace setup.py bdist_wheel with python -m build --wheel (#156712)
zklaus Sep 29, 2025
84dc54a
Revert "Helper to augment graph with additional deps (#163959)"
pytorchmergebot Sep 29, 2025
b28e4f1
Revert "refactor bucketing (#163754)"
pytorchmergebot Sep 29, 2025
0f619c1
Revert "[inductor] do comm compute overlap at aten fx level (#163215)"
pytorchmergebot Sep 29, 2025
170e030
Bump protobuf from 5.29.4 to 5.29.5 in /.ci/docker (#156157)
dependabot[bot] Sep 29, 2025
d58f7c3
[Easy] Add pointwise tag to fma (#164149)
eellison Sep 29, 2025
704cd77
[PP] Customize pipeline's submod name (#164037)
kwen2501 Sep 28, 2025
cee4e36
[BE] remove manylinuxcxx11-abi-builder:cpu-cxx11-abi docker image (#1…
atalman Sep 30, 2025
da003d7
[3/N] Import Callable from collections.abc in torch/distributed (#164…
cyyever Sep 30, 2025
089f913
Install `fmtlib` headers. (#164139)
ysiraichi Sep 30, 2025
474d075
[dynamic shapes] unbacked-safe slicing (#161414)
pianpwk Sep 30, 2025
4cf2900
CUDACachingHostAllocatorImpl skip event query during capture (#164001)
jeffdaily Sep 30, 2025
3b4ad4a
[AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 (…
nWEIdia Sep 30, 2025
b7419b9
[ROCm][CI] Upgrade ROCm to 7.0 (#163140)
jeffdaily Sep 30, 2025
55840fb
[CMake] Fix `USE_FBGEMM_GENAI` option (#164165)
malfet Sep 29, 2025
0b0ed6f
[doc] Add AOTInductor intermediate debug printer OSS user manual (#16…
YUNQIUGUO Sep 30, 2025
ca19815
Revert "Enable outer reductions in fbcode (#163884)"
pytorchmergebot Sep 30, 2025
85012fe
Remove unnecessary list comprehensions (#164103)
cyyever Sep 30, 2025
9f27b0c
[CI] Push `viable/strict/${time}` tags (#164183)
malfet Sep 29, 2025
a293206
Fix invalid f-strings (#164112)
cyyever Sep 30, 2025
c39357b
[torchfuzz] Make scalar and tensor distribution configurable (#164034)
bobrenjc93 Sep 29, 2025
0d7994c
[inductor] do comm compute overlap at aten fx level (#163215)
eellison Sep 29, 2025
0b2fdc3
refactor bucketing (#163754)
eellison Sep 29, 2025
92108f4
Helper to augment graph with additional deps (#163959)
eellison Sep 29, 2025
7d59e37
Add Comm-Compute Preserving Bucketer (#163960)
eellison Sep 29, 2025
ace8935
better error handling for rrelu when lower or upper range is infinite…
vishalgoyal316 Sep 30, 2025
bbf6816
[dynamo] Special path for cloning of torch dispatch tensors (#164081)
anijain2305 Sep 29, 2025
7afcb03
Back out "Revert D81959389" (#163905)
yyetim Sep 30, 2025
5274753
[dynamo][device_mesh] Support mesh_dim_names (#164200)
anijain2305 Sep 30, 2025
6e5b424
[DTensor][Export] Supporting exporting a model with DTensor params/in…
SherlockNoMad Sep 30, 2025
7f4c3e7
distributed/serialization: support zero sized tensors (#164198)
d4l3k Sep 30, 2025
1310d6a
Add functions to setup PrivateUse1 as a python backend device. (#157859)
qihqi Sep 30, 2025
ace6c76
[inductor] Small refactor of CachingAutotuner (#162406)
kundaMwiza Sep 30, 2025
7f29c47
Fix cdist export compute mode validation (#161724)
ahkush Sep 30, 2025
77354e2
[OpenReg] Add AMP Integration guide for accelerators (#162050)
zeshengzong Sep 30, 2025
410ed30
Revert "Add functions to setup PrivateUse1 as a python backend device…
pytorchmergebot Sep 30, 2025
46ec066
Remove unused PyIntXXX, THPUtils_newReal_BOOL, THPQXXX macros (#164056)
cyyever Sep 30, 2025
71b4fad
Revert "Add less warps config to inner reductions (#162447)"
pytorchmergebot Sep 30, 2025
79fcfd4
Revert "[CI] Push `viable/strict/${time}` tags (#164183)"
pytorchmergebot Sep 30, 2025
0fb89b8
Revert "Consistently use c10_ovrsource in arvr mode everywhere (#1641…
pytorchmergebot Sep 30, 2025
edd9e07
[BE] Remove not existing mnist mirror (#164238)
atalman Sep 30, 2025
5c020be
Update LPPool docs to clarify ceil_mode padding semantics when ceil_m…
Jonahcb Sep 30, 2025
e88cca0
Update Sphinx theme (#164147)
svekars Sep 30, 2025
66abba8
[CUDA][Expandable Segments] Follow-up cleanups for even more expandab…
eqy Sep 30, 2025
96330f4
[testing] Add upload for test status during test stat uploads (#164189)
clee2000 Sep 30, 2025
1412a4a
[precompile] Add option to disable guard check on aot-compiled functi…
zhxchen17 Sep 30, 2025
3564cd2
Fix TestExportOpInfo (#164184)
SherlockNoMad Sep 29, 2025
7edd18f
[Inductor-FX] Generalize FloorDiv conversion to handle more complex l…
blaine-rister Sep 30, 2025
906fe7b
[ROCm][CI] no longer build almalinux image for ROCm 6.3 (#164201)
jeffdaily Sep 30, 2025
7d7ae4d
[submodule] upgrade cutlass version to 4.2.1 and completely resolved …
henrylhtsang Sep 30, 2025
9378696
Exporting aten.sdpa with cuda under fake mode on a cuda-less machine …
yiming0416 Sep 30, 2025
84e1cd7
[inductor] fx comm overlap: align runtime estimations across dist ran…
IvanKobzarev Sep 30, 2025
4b8fe79
[dynamo] format cpython_defs.c (#161838)
williamwen42 Sep 25, 2025
763ab2a
[dynamo, 3.14] compile actual code in C dynamo (#161555)
williamwen42 Sep 25, 2025
09c7741
[dynamo, 3.14] Python dynamo changes to get basic programs working (#…
williamwen42 Sep 25, 2025
7cbc011
[dynamo, 3.14] support some bytecodes, fix CALL_FUNCTION_EX (#163009)
williamwen42 Sep 26, 2025
1c9987f
[dynamo, 3.14] fix context managers (#163109)
williamwen42 Sep 26, 2025
44677ad
[dynamo, 3.14] support LOAD_CONST on slice, codegen LOAD_CONST slice …
williamwen42 Sep 26, 2025
008b0a9
[dynamo, 3.14] fix inactive ctx handling in resume functions (#163191)
williamwen42 Sep 26, 2025
9278b18
[dynamo, 3.14] fix WITH_EXCEPT_START (#163292)
williamwen42 Sep 26, 2025
d4b785a
[dynamo, 3.14] fix stack ref copy error (#163796)
williamwen42 Sep 26, 2025
4ead8eb
[dynamo, 3.14] fix BUILD_TUPLE with 0 args (#163818)
williamwen42 Sep 26, 2025
0657de9
[dynamo, 3.14] support LOAD_COMMON_CONSTANT (#163919)
williamwen42 Sep 26, 2025
9ce31e4
[3.14] make unbacked_sym[int/float]_counter integers (#163920)
williamwen42 Sep 26, 2025
2600f8b
[dynamo, 3.14] fix tracing typing.Union (#164004)
williamwen42 Sep 26, 2025
5ed4672
[dynamo, 3.14] fix _detect_and_normalize_assert_statement for 3.14 (#…
williamwen42 Sep 26, 2025
1cf1b91
[inductor][templates] Template hooks should be finalised inside a ker…
kundaMwiza Sep 30, 2025
719b64e
Fix TMA transpose logic to handle 1D shapes + string differences (#16…
njriasan Sep 30, 2025
d615f6b
[inductor] use hint_override in kernel benchmark args (#164207)
pianpwk Sep 30, 2025
a707042
fix: inductor non_blocking test - warmup events to make test pass whe…
v0i0 Sep 29, 2025
cc5d74c
Revert "[BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistra…
pytorchmergebot Sep 30, 2025
d2c5f23
Fix the shape check inside gnll loss (#147522)
KohakuBlueleaf Sep 30, 2025
60f0a35
Update persons of interest for XLA. The previous one is out of date. …
qihqi Sep 30, 2025
ffc645c
half support for fused_moving_avg_obs_fake_quant() op (#164175)
jeffdaily Sep 30, 2025
e30f01b
[1/N] Simplify "in" operation for containers of a single item (#164224)
cyyever Sep 30, 2025
5a93f00
[CI] Delete binary smoke workflows (#164260)
malfet Sep 30, 2025
1cce6ef
Fix silent incorrectness for bmm/baddmm out_dtype overload (#164095)
PaulZhang12 Sep 30, 2025
9e63139
Missing lambda in torch._check (#164225)
xadupre Sep 30, 2025
1ce9563
[FSDP][Replicate] tests replicate gradient accumulation and 1f1b micr…
anshul-si Sep 25, 2025
d3bdf8c
[FSDP][Replicate] tests replicate with custom forward method (#162851)
anshul-si Sep 25, 2025
01dd2c2
[FSDP][Replicate] tests replicate is composable with tp (#162853)
anshul-si Sep 25, 2025
99e28ff
[FSDP][Replicate] tests replicate core functionality with mixed preci…
anshul-si Sep 25, 2025
adc11a7
[export] avoid checks during tracing of export verification (#164219)
anijain2305 Sep 30, 2025
ae4fd4e
[FSDP2] support AC(FSDP) for torchtitan's MOE (#164009)
weifengpy Sep 29, 2025
2810977
[FSDP][Replicate] tests replicate type casting behavior and edge case…
anshul-si Sep 25, 2025
1f1de20
[c10d][BE][ez] Update tensor ptr inside nccl.cpp (#164276)
fduwjj Sep 30, 2025
bec6541
[CUDA][CUDAGraph] Reduce capture overhead in CUDA Graph memory reuse …
eee4017 Sep 30, 2025
60a4961
[DTensor] Allow redistribute to Partial if src matches (#164253)
SherlockNoMad Sep 30, 2025
ff71536
[vllm hash update] update the pinned vllm hash (#164190)
pytorchupdatebot Sep 30, 2025
7f3dc45
Migrate DeviceType to torch/headeronly (#163999)
janeyx99 Sep 30, 2025
ad7e3c9
[ROCm][CD] librocroller.so missing from ROCm 7 wheel (#164244)
jeffdaily Oct 1, 2025
c4bbc64
[PyTorch CCA] Add an API to get expandable segment sizes (#163771)
banitag1 Oct 1, 2025
28c1d2f
[aoti] AOTI mingw cross compilation (#163188)
yushangdi Oct 1, 2025
7a91199
Split scaled-mm tests into separate file (#164266)
slayton58 Sep 30, 2025
8df3f2f
Revert new-test part of #163829 (#164259)
slayton58 Sep 30, 2025
5b1c39f
Add smoke tests to verify that stable ABI FA3 wheel runs w/ newer tor…
janeyx99 Sep 30, 2025
abfcce5
[torchfuzz] remove erroneous can_produce check (#164209)
bobrenjc93 Sep 30, 2025
1f3995c
[torchfuzz] raise if Operator abstract method is not implemented (#16…
bobrenjc93 Sep 30, 2025
10a005e
[torchfuzz] add layout operators (#164210)
bobrenjc93 Sep 30, 2025
e0f1185
skip non memory deps in memory estimator (#164294)
eellison Sep 30, 2025
c66d18d
[dynamo][sac] Support functools partial context_fn for sac (#164308)
anijain2305 Sep 30, 2025
3787a5a
[export] Explicitly passing requires_grad to nn.Parameter() in deseri…
yiming0416 Oct 1, 2025
2a5ce2f
Add algorithm in header (#164295)
ahkush Oct 1, 2025
531f3bf
Adding check for square matrix for input tensor in matrix_exp backwar…
mansiag05 Oct 1, 2025
5919974
[BE][Easy]: Add prims common TypeGuard (#164263)
Skylion007 Oct 1, 2025
fa90090
Use dataclass features in two classes (#164221)
cyyever Oct 1, 2025
8bb71c0
Skip symmetric memory tests calling `_scaled_mm` on CCC < 8.9 (#164251)
Flamefire Oct 1, 2025
bd0907d
[BE][CI] Unify requirments (#163396)
malfet Oct 1, 2025
11ccb95
[PyTorch Pinned Allocator] Pinned memory stats and perf fixes around …
banitag1 Oct 1, 2025
6d4dfa0
[CI] Push `viable/strict/${time}` tags (#164183)
malfet Oct 1, 2025
9ddfc59
[BE] Delete stale non-ephemeral runners workarounds (#164285)
malfet Sep 30, 2025
96c3b9e
[dynamo] Use strings instead of modules for fqn info tracking (#164272)
anijain2305 Oct 1, 2025
cc8b14d
[2/N] Simplify "in" operation for containers of a single item (#164323)
cyyever Oct 1, 2025
590224f
Improve repeat op to a single copy (#163842)
haifeng-jin Oct 1, 2025
12d4cb0
Suppress `FutureWarning`s in `torch.distributed.algorithms.ddp_comm_h…
xuantengh Oct 1, 2025
eca6ac2
[BE][Easy] update CUDA and ROCm sources in nightly tool (#162324)
XuehaiPan Sep 30, 2025
17ab994
[Easy] Add notes for setting up dev venv with specific Python version…
XuehaiPan Sep 30, 2025
9fd53a2
Register MTIA kernel for all_all_out (#164293)
trirpi Oct 1, 2025
4dab208
Adds Issue#153109 as a test for CUDAPluggableAllocator (#163575)
syed-ahmed Oct 1, 2025
ed90040
Releases multicast object before releasing mapped buffers in CUDASymm…
syed-ahmed Oct 1, 2025
ac1bc51
[dynamo] do not pop from framelocals dict in Python 3.10 (#164316)
williamwen42 Sep 30, 2025
d9c80ef
Build and Install Arm Compute Library in manylinux docker image (#159…
robert-hardwick Sep 30, 2025
69fa26d
Triton 3.5.x pin update (#164268)
atalman Oct 1, 2025
70d1043
Fix non-TMA loads in grouped MM Triton kernel (#163895)
alexsamardzic Sep 30, 2025
e901866
Add a RECORD_FUNCTION for Python fallback so it shows in profile (#16…
ezyang Sep 29, 2025
31681bc
[PyTorch] Pull ARM's box-cox (#164152)
Nicoshev Oct 1, 2025
07d896f
Revert "CUDACachingHostAllocatorImpl skip event query during capture …
pytorchmergebot Oct 1, 2025
b103378
Use TMA loads always for Triton grouped MM kernel (#164256)
alexsamardzic Sep 30, 2025
2610746
Revert nccl upgrade back to 2.27.5 (#164352)
albanD Oct 1, 2025
36a37b8
Revert "[PP] Customize pipeline's submod name (#164037)"
pytorchmergebot Oct 1, 2025
59a86cb
Revert "[fx] Allow customization of submod name in split graph (#1640…
pytorchmergebot Oct 1, 2025
20edc5b
Revert "Add num_store to inductor_meta and use it to scale persistent…
pytorchmergebot Oct 1, 2025
5f868ca
[fx] Allow customization of submod name in split graph (#164035)
kwen2501 Sep 28, 2025
e419dc6
[PP] Customize pipeline's submod name (#164037)
kwen2501 Sep 28, 2025
f7ab8a2
[1/N] Fix ruff warnings (#164333)
cyyever Oct 1, 2025
80ed522
[export] support unbacked stack (#163867)
ColinPeppler Oct 1, 2025
1288c6d
Enable keep-going for trunk tags (#164307)
izaitsevfb Oct 1, 2025
3dab36b
[FSDP][Replicate] created ReplicateModule and changed replicate to us…
anshul-si Sep 30, 2025
69c5c08
Revert "[dynamo, 3.14] fix _detect_and_normalize_assert_statement for…
pytorchmergebot Oct 1, 2025
76ddbc2
Add option to FakeProcessGroup to raise error if comms are invoked. (…
ezyang Sep 29, 2025
ebd0707
[SymmMem] Add get_nbi the nonblocking version (#163540)
kwen2501 Sep 30, 2025
3ffaab3
[Replicate][Pipeline Parallelism] integration of new replicate functi…
anshul-si Sep 30, 2025
8dfc8ef
[export] Preserve nn_module_stack for aliased nn modules (#164311)
anijain2305 Oct 1, 2025
f63d16c
Make viable/strict updatable again (#164374)
malfet Oct 1, 2025
9357c31
[inductor] Fix constant shape for float constants (#164241)
isuruf Oct 1, 2025
8c590ca
[inductor] add a runtime assert for triton shapes (#164242)
isuruf Oct 1, 2025
315ffdc
[4/N] Apply ruff UP035 rule to python code (#164206)
cyyever Oct 1, 2025
7304b9e
[ROCm] fix carveout feature (#164303)
jeffdaily Oct 1, 2025
e5c0e6b
[testing] Better short job name during upload additional stats (#164287)
clee2000 Oct 1, 2025
7320f44
Skip windows unittest in fbcode (#164363)
yushangdi Oct 1, 2025
773c676
[CD][CUDA13][NCCL] Fix nccl version typo for cu13 (#164383)
nWEIdia Oct 1, 2025
b5c4f46
Add functions to setup PrivateUse1 as a python backend device. (#157859)
qihqi Oct 1, 2025
6eb8d96
Enable torch.nn.functional.batch_norm in test_export_opinfo (#164261)
yiming0416 Oct 1, 2025
9065364
Add xfailing test case for inplace mutation of local DTensor (#164355)
ezyang Oct 1, 2025
566ea4e
Work Around exposing statically linked libstdc++ CXX11 ABI strong sym…
atalman Oct 1, 2025
1a5d023
Add B200 to Operator Microbenchmark CI (#164288)
jainapurva Oct 1, 2025
ffda8e5
[inductor] log kernel autotuning result to a csv (#164191)
shunting314 Sep 30, 2025
a10207e
Revert "[DCP] Decrease checkpoint background process Gloo pg init tim…
pytorchmergebot Oct 2, 2025
723ba21
Speed up FP precision lookup (#164044)
lakshayg Oct 2, 2025
53860ef
Better error handling in torch/csrc/jit/codegen/* (#163948)
licy666 Oct 2, 2025
8b29c59
[CI][CUDA] Fix distributed tests for b200 (#164345)
Aidyn-A Oct 2, 2025
349e9e9
[cutass backend] remove cutlass presets (#164380)
henrylhtsang Oct 1, 2025
3e03dea
C++-accessible Placements via pybind11 (#163030)
swolchok Oct 1, 2025
5dbae1e
Fix unbacked replacement where LHS is purely backed expr and RHS is u…
ColinPeppler Oct 1, 2025
1443786
[torchfuzz] make fuzzer deterministic (#164397)
bobrenjc93 Oct 1, 2025
0fbe3f1
[torchfuzz] add matmuls (#164284)
bobrenjc93 Oct 1, 2025
39b31a6
[torchfuzz] keep track of operator stats (#164334)
bobrenjc93 Oct 1, 2025
702f6e7
[MTIA] Enable deserialization for FP8 checkpoint loading (#163559)
PatriceVignola Oct 2, 2025
14791ea
[inductor] teach bisector to look at pre_grad passes (#164250)
avikchaudhuri Oct 2, 2025
93e833d
[inductor] separate preamble from main work in compile_fx (#164169)
avikchaudhuri Oct 2, 2025
3924f78
unbacked reshape_copy (#164336)
laithsakka Oct 1, 2025
bcafea5
[vision hash update] update the pinned vision hash (#154694)
pytorchupdatebot Oct 2, 2025
a43c4c3
[5/N] Apply ruff UP035 rule (#164423)
cyyever Oct 2, 2025
27eb36d
DebugMode add ignore_compile_internals (#164205)
SherlockNoMad Oct 2, 2025
9697a7c
Better path handling for nightly setup tool (#164215)
XuehaiPan Sep 30, 2025
6bb586e
[PyTorch / Sigrid GPU] Fixes in pinned stats collection and add new O…
banitag1 Oct 2, 2025
00f0365
[torchfuzz] add test suite of fuzzer repros that we xfail (#164430)
bobrenjc93 Oct 2, 2025
2c2e126
[inductor] Handle patterns where input/output nodes are the same (#16…
angelayi Oct 2, 2025
0e5773b
[dynamo][export] Do not graph break on torch.autograd._profiler_enabl…
anijain2305 Oct 2, 2025
cfd46d1
Fix SAC + Flex issue (#164421)
drisspg Oct 2, 2025
39c340e
Add failing bitwise equivalence UT for aot_eager on rms_norm (#164280)
ezyang Oct 1, 2025
bac0f28
Add methods to access data and unpack_hook on SavedVariable (#164358)
soulitzer Oct 1, 2025
7cfecd7
Revert "Improve repeat op to a single copy (#163842)"
pytorchmergebot Oct 2, 2025
b098514
[DeviceMesh] Simplifying internal bookkeeping with CuTe layout (#163213)
fduwjj Oct 1, 2025
c632952
Revert "Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527)"
pytorchmergebot Oct 2, 2025
ac7b4e7
Stop parsing command line arguments every time common_utils is import…
AnthonyBarbier Oct 2, 2025
235b995
Make sure Windows CUDA 12.8 build follow same arches as Linux builds …
atalman Oct 2, 2025
3918959
Revert "Stop parsing command line arguments every time common_utils i…
pytorchmergebot Oct 2, 2025
f4cf756
Add CUDA release architecture matrix (#164471)
atalman Oct 2, 2025
0319556
Revert "[vision hash update] update the pinned vision hash (#154694)"
pytorchmergebot Oct 2, 2025
b9e73e6
Add provenance to inductor IR nodes created after graph.run (#164255)
yushangdi Oct 2, 2025
6bb021c
Revert "Use TMA loads always for Triton grouped MM kernel (#164256)"
pytorchmergebot Oct 2, 2025
e6d4b26
Update torch.rst (#164408)
parthava-adabala Oct 2, 2025
f6f7676
Revert "C++-accessible Placements via pybind11 (#163030)"
pytorchmergebot Oct 2, 2025
bf717ce
[AOTI win] Add ABI stable method for updating constant buffer (#163819)
yushangdi Oct 2, 2025
c6a6c80
Add Aidyn-A to CUDA codeowners (#164436)
Aidyn-A Oct 2, 2025
6a31f42
Fix NestedTensor max/min operations for integer dtypes. (#162273)
adabeyta Oct 2, 2025
4661200
[RELAND v2] Close some sources of fake tensors (#164372)
tugsbayasgalan Oct 2, 2025
22b1710
Use posix_fallocate() to reserve disk space for shared memory (#161910)
wenjianhn Oct 2, 2025
33b17bc
Remove old CUDA version checks (#164199)
cyyever Oct 2, 2025
c45d56d
typo corrected in ivalue.cpp's comment (#164485)
RajeshvShiyal Oct 2, 2025
8c54101
add tensor subclass printing support in fx/graph.py (#164403)
bobrenjc93 Oct 2, 2025
5f775bd
Fix THP_PyObject_VirtualFree return type (#163763)
guangyey Oct 2, 2025
115af42
Fix readibility checks in TIDY and apply them (#164475)
cyyever Oct 2, 2025
6b79701
[ROCm][CI] fix test_cudnn_convolution_relu_cuda (#164466)
jeffdaily Oct 2, 2025
5f18f24
Add initial suppressions for pyrefly (#164177)
maggiemoss Oct 2, 2025
2a7c486
Revert "Speed up FP precision lookup (#164044)"
pytorchmergebot Oct 2, 2025
cc71ab8
[DTensor] raise error if the local_tensor argument passed to DTensor.…
XilunWu Oct 2, 2025
6389658
Fix type hints in PrepareModuleInput and PrepareModuleInputOutput (#1…
RohitRathore1 Oct 2, 2025
a8edccf
[inductor] fix TestTemplateRender in select_algorithm (#164158)
isuruf Sep 30, 2025
f465ea6
[inductor] require shape in TritonCSEVariable (#162275)
isuruf Sep 30, 2025
a34797e
Revert "Add provenance to inductor IR nodes created after graph.run (…
pytorchmergebot Oct 2, 2025
ece5e0f
Fake process group Direct construction error (#163665)
ahkush Oct 2, 2025
bdc0a42
Stop parsing command line arguments every time common_utils is import…
AnthonyBarbier Oct 2, 2025
22e219d
Revert "[DeviceMesh] Simplifying internal bookkeeping with CuTe layou…
pytorchmergebot Oct 2, 2025
15c8bdc
Fix FloorDiv should not generate non integer rationals (due to sympy …
laithsakka Oct 2, 2025
43848b7
Improved support for autotuning in wrapper_fxir (#164132)
nandesuka Oct 2, 2025
dca7398
Support setting grad_dtype on leaf tensors (#162815)
soulitzer Oct 2, 2025
c7e30ae
MX: Remove redundant PLATFORM_SUPPORTS_MX_GEMM constant (#164320)
jagadish-amd Oct 2, 2025
95a0532
Fix vllm build issue (#164361)
yangw-dev Oct 2, 2025
f7082e9
[cuBLAS] update cuBLAS determinism docs, remove workspace requirement…
eqy Oct 3, 2025
18e1848
[6/N] Apply ruff UP035 rule (#164438)
cyyever Oct 3, 2025
86474ce
Update mask dtype (#164472)
eellison Oct 2, 2025
ef50c6e
[MPS] Add backward pass for `embedding_bag` (#163931)
kurtamohler Oct 2, 2025
4691fe6
remove unnecessary registration (#164481)
eellison Oct 2, 2025
91c4db7
fix flex attention eager: dont round down scores to low-precision (cl…
v0i0 Oct 2, 2025
d1cbb74
multimem reduce (#164517)
kwen2501 Oct 2, 2025
1051c1d
Add pyrefly suppressions 2/n (#164513)
maggiemoss Oct 3, 2025
6c209bf
[cutlass-4][take 2] upgrade to cutlass 4.2.1 (#164159)
henrylhtsang Oct 3, 2025
2a760dc
[DeviceMesh] Simplifying internal bookkeeping with CuTe layout (#163213)
fduwjj Oct 3, 2025
7617b11
[torchfuzz] Support EagerVsFullGraphDynamicCompileWithNumericsCheck (…
bobrenjc93 Oct 2, 2025
ddf8de2
Add Rocm to Operator Microbenchmark CI (#164173)
jainapurva Oct 3, 2025
eccf561
Move call to output generated code in inductor (#161615)
CWOA Oct 3, 2025
6c3c941
config for dcache + unit tests (#164512)
nmacchioni Oct 3, 2025
aed6624
[vllm hash update] update the pinned vllm hash (#164319)
pytorchupdatebot Oct 3, 2025
5743d73
Use torch.testing.test_close instead of torch.testing.test_allclose (…
cyyever Oct 3, 2025
5bb8f04
[torchfuzz] add nn functional ops (#164434)
bobrenjc93 Oct 2, 2025
3db2164
[torchfuzz] add norm operators (#164514)
bobrenjc93 Oct 3, 2025
e40fe63
Pin conda version for Docker builds (#164575)
atalman Oct 3, 2025
5656d45
forward fix #164481 (#164578)
jeffdaily Oct 3, 2025
fa5306b
Support partial _DynamoCacheEntries when not all backends available (…
jamesjwu Oct 2, 2025
3288fbf
Change default device to current acclerator (#164399)
drisspg Oct 2, 2025
2a11ce2
Support calling torch.compile inside non-strict export (#164171)
tugsbayasgalan Oct 2, 2025
5b0b4cd
[dtensor] avoid shape recompilations on DTensorSpec (#163820)
pianpwk Oct 3, 2025
3d9d41c
Remove old workaround in launch_logcumsumexp_cuda_kernel (#164567)
cyyever Oct 3, 2025
f39789c
[PyTorch Pinned Allocator] Add support of reserved pinned memory segm…
banitag1 Oct 3, 2025
319298e
Merge remote-tracking branch 'upstream/main' into rocm7.1_internal_te…
github-actions[bot] Oct 3, 2025
6782327
Fix merge conflicts
pragupta Oct 3, 2025
2 changes: 2 additions & 0 deletions .ci/aarch64_linux/aarch64_ci_build.sh
@@ -15,6 +15,8 @@ fi
# Compress the fatbin with -compress-mode=size for CUDA 13
if [[ "$DESIRED_CUDA" == *"13"* ]]; then
export TORCH_NVCC_FLAGS="-compress-mode=size"
# Bundle ptxas into the cu13 wheel, see https://github.com/pytorch/pytorch/issues/163801
export BUILD_BUNDLE_PTXAS=1
fi

SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
57 changes: 4 additions & 53 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -13,49 +13,6 @@ def list_dir(path: str) -> list[str]:
return check_output(["ls", "-1", path]).decode().split("\n")


def build_ArmComputeLibrary() -> None:
"""
Using ArmComputeLibrary for aarch64 PyTorch
"""
print("Building Arm Compute Library")
acl_build_flags = [
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
acl_install_dir = "/acl"
acl_checkout_dir = os.getenv("ACL_SOURCE_DIR", "ComputeLibrary")
if os.path.isdir(acl_install_dir):
shutil.rmtree(acl_install_dir)
if not os.path.isdir(acl_checkout_dir) or not len(os.listdir(acl_checkout_dir)):
check_call(
[
"git",
"clone",
"https://github.com/ARM-software/ComputeLibrary.git",
"-b",
"v25.02",
"--depth",
"1",
"--shallow-submodules",
]
)

check_call(
["scons", "Werror=1", f"-j{os.cpu_count()}"] + acl_build_flags,
cwd=acl_checkout_dir,
)
for d in ["arm_compute", "include", "utils", "support", "src", "build"]:
shutil.copytree(f"{acl_checkout_dir}/{d}", f"{acl_install_dir}/{d}")


def replace_tag(filename) -> None:
with open(filename) as f:
lines = f.readlines()
@@ -356,23 +313,17 @@ def parse_arguments():
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1 : branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "

if enable_mkldnn:
build_ArmComputeLibrary()
print("build pytorch with mkldnn+acl backend")
build_vars += (
"USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
"ACL_ROOT_DIR=/acl "
"LD_LIBRARY_PATH=/pytorch/build/lib:/acl/build:$LD_LIBRARY_PATH "
"ACL_INCLUDE_DIR=/acl/build "
"ACL_LIBRARY=/acl/build "
)
build_vars += "USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
build_vars += "ACL_ROOT_DIR=/acl "
if enable_cuda:
build_vars += "BLAS=NVPL "
else:
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/OpenBLAS "
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/opt/OpenBLAS "
else:
print("build pytorch without mkldnn backend")

os.system(f"cd /pytorch; {build_vars} python3 setup.py bdist_wheel")
os.system(f"cd /pytorch; {build_vars} python3 -m build --wheel --no-isolation")
if enable_cuda:
print("Updating Cuda Dependency")
filename = os.listdir("/pytorch/dist/")
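The recurring replacement in these build scripts — `python3 setup.py bdist_wheel` becoming `python3 -m build --wheel --no-isolation` — can be sketched as below. `wheel_cmd` is an illustrative helper, not a function in the CI scripts.

```shell
# Sketch of the wheel-build invocation change in this PR.
wheel_cmd() {
    # Old (deprecated direct setup.py entry point):
    #   python3 setup.py bdist_wheel
    # New (PEP 517 front-end; --no-isolation reuses the build dependencies
    # already installed in the CI environment instead of a fresh venv):
    echo "python3 -m build --wheel --no-isolation"
}
wheel_cmd
```

The `--no-isolation` flag matters in CI images, where the toolchain (cmake, ninja, numpy, etc.) is pre-provisioned and an isolated build environment would have to re-resolve it.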
60 changes: 15 additions & 45 deletions .ci/aarch64_linux/build_aarch64_wheel.py
@@ -299,40 +299,6 @@ def install_condaforge_python(host: RemoteHost, python_version="3.8") -> None:
)


def build_OpenBLAS(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building OpenBLAS")
host.run_cmd(
f"git clone https://github.com/xianyi/OpenBLAS -b v0.3.28 {git_clone_flags}"
)
make_flags = "NUM_THREADS=64 USE_OPENMP=1 NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=ARMV8"
host.run_cmd(
f"pushd OpenBLAS && make {make_flags} -j8 && sudo make {make_flags} install && popd && rm -rf OpenBLAS"
)


def build_ArmComputeLibrary(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building Arm Compute Library")
acl_build_flags = " ".join(
[
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
)
host.run_cmd(
f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v25.02 {git_clone_flags}"
)

host.run_cmd(f"cd ComputeLibrary && scons Werror=1 -j8 {acl_build_flags}")


def embed_libgomp(host: RemoteHost, use_conda, wheel_name) -> None:
host.run_cmd("pip3 install auditwheel")
host.run_cmd(
@@ -442,7 +408,7 @@ def build_torchvision(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd vision && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd vision && {build_vars} python3 -m build --wheel --no-isolation")
vision_wheel_name = host.list_dir("vision/dist")[0]
embed_libgomp(host, use_conda, os.path.join("vision", "dist", vision_wheel_name))

@@ -497,7 +463,7 @@ def build_torchdata(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd data && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd data && {build_vars} python3 -m build --wheel --no-isolation")
wheel_name = host.list_dir("data/dist")[0]
embed_libgomp(host, use_conda, os.path.join("data", "dist", wheel_name))

@@ -553,7 +519,7 @@ def build_torchtext(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd text && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd text && {build_vars} python3 -m build --wheel --no-isolation")
wheel_name = host.list_dir("text/dist")[0]
embed_libgomp(host, use_conda, os.path.join("text", "dist", wheel_name))

@@ -614,7 +580,7 @@ def build_torchaudio(
host.run_cmd(
f"cd audio && export FFMPEG_ROOT=$(pwd)/third_party/ffmpeg && export USE_FFMPEG=1 \
&& ./packaging/ffmpeg/build.sh \
&& {build_vars} python3 setup.py bdist_wheel"
&& {build_vars} python3 -m build --wheel --no-isolation"
)

wheel_name = host.list_dir("audio/dist")[0]
@@ -700,7 +666,6 @@ def start_build(
configure_system(
host, compiler=compiler, use_conda=use_conda, python_version=python_version
)
build_OpenBLAS(host, git_clone_flags)

if host.using_docker():
print("Move libgfortant.a into a standard location")
@@ -723,10 +688,12 @@
f"git clone --recurse-submodules -b {branch} https://github.com/pytorch/pytorch {git_clone_flags}"
)

host.run_cmd("pytorch/.ci/docker/common/install_openblas.sh")

print("Building PyTorch wheel")
build_opts = ""
if pytorch_build_number is not None:
build_opts += f" --build-number {pytorch_build_number}"
build_opts += f" -C--build-option=--build-number={pytorch_build_number}"
# Breakpad build fails on aarch64
build_vars = "USE_BREAKPAD=0 "
if branch == "nightly":
@@ -743,15 +710,18 @@
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
if enable_mkldnn:
build_ArmComputeLibrary(host, git_clone_flags)
host.run_cmd("pytorch/.ci/docker/common/install_acl.sh")
print("build pytorch with mkldnn+acl backend")
build_vars += " USE_MKLDNN=ON USE_MKLDNN_ACL=ON"
build_vars += " BLAS=OpenBLAS"
build_vars += " OpenBLAS_HOME=/opt/OpenBLAS"
build_vars += " ACL_ROOT_DIR=/acl"
host.run_cmd(
f"cd $HOME/pytorch && export ACL_ROOT_DIR=$HOME/ComputeLibrary && {build_vars} python3 setup.py bdist_wheel{build_opts}"
f"cd $HOME/pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
)
print("Repair the wheel")
pytorch_wheel_name = host.list_dir("pytorch/dist")[0]
ld_library_path = "$HOME/acl/build:$HOME/pytorch/build/lib"
ld_library_path = "/acl/build:$HOME/pytorch/build/lib"
host.run_cmd(
f"export LD_LIBRARY_PATH={ld_library_path} && auditwheel repair $HOME/pytorch/dist/{pytorch_wheel_name}"
)
@@ -763,7 +733,7 @@ def start_build(
else:
print("build pytorch without mkldnn backend")
host.run_cmd(
f"cd pytorch && {build_vars} python3 setup.py bdist_wheel{build_opts}"
f"cd pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
)

print("Deleting build folder")
@@ -907,7 +877,7 @@ def terminate_instances(instance_type: str) -> None:
def parse_arguments():
from argparse import ArgumentParser

parser = ArgumentParser("Builid and test AARCH64 wheels using EC2")
parser = ArgumentParser("Build and test AARCH64 wheels using EC2")
parser.add_argument("--key-name", type=str)
parser.add_argument("--debug", action="store_true")
parser.add_argument("--build-only", action="store_true")
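The `build_opts` change above reflects how `python -m build` forwards setup.py arguments: options like `--build-number` must go through the `-C`/`--config-setting` escape hatch rather than being appended directly. A minimal runnable sketch (the build number is an example value; the real script takes it from its CLI):

```shell
# How the PEP 517 front-end receives a setuptools --build-number.
build_opts=""
pytorch_build_number=1    # illustrative; parsed from arguments in the script
if [ -n "$pytorch_build_number" ]; then
    # -C--build-option=... passes the flag through to setup.py bdist_wheel
    build_opts="$build_opts -C--build-option=--build-number=$pytorch_build_number"
fi
echo "python3 -m build --wheel --no-isolation$build_opts"
```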
3 changes: 2 additions & 1 deletion .ci/docker/almalinux/Dockerfile
@@ -69,7 +69,8 @@ RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0

FROM ${ROCM_IMAGE} as rocm
ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
ENV MKLROOT /opt/intel
6 changes: 6 additions & 0 deletions .ci/docker/almalinux/build.sh
@@ -36,6 +36,12 @@ case ${DOCKER_TAG_PREFIX} in
;;
rocm*)
BASE_TARGET=rocm
+PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
+# add gfx950 conditionally starting in ROCm 7.0
+if [[ "$ROCM_VERSION" == *"7.0"* ]]; then
+    PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950"
+fi
+EXTRA_BUILD_ARGS="${EXTRA_BUILD_ARGS} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
echo "ERROR: Unknown docker tag ${DOCKER_TAG_PREFIX}"
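The gfx950 gating added in this hunk can be exercised standalone. The sketch below reproduces the same glob match and parameter expansion; the `ROCM_VERSION` value is just an example (the real script derives it from the docker tag):

```shell
#!/bin/bash
# Sketch of the arch-list gating in .ci/docker/almalinux/build.sh:
# gfx950 is appended only when the ROCm version matches 7.0.
ROCM_VERSION="7.0"  # example value for illustration

PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
if [[ "$ROCM_VERSION" == *"7.0"* ]]; then
    PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950"
fi

# The list is then forwarded to `docker build` as a build-arg, which the
# Dockerfile's new `ARG PYTORCH_ROCM_ARCH` / `ENV` pair picks up.
echo "--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
```

The matching `ARG`/`ENV` change in the almalinux Dockerfile above is what lets each build pass its own arch list instead of hardcoding one in the image.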
28 changes: 4 additions & 24 deletions .ci/docker/build.sh
@@ -84,8 +84,8 @@ fi
_UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
_UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
if [[ "$image" == *rocm* ]]; then
-_UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6
-_UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d
+_UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e
+_UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77
fi

tag=$(echo $image | awk -F':' '{print $2}')
@@ -175,28 +175,17 @@ case "$tag" in
fi
GCC_VERSION=11
VISION=yes
-ROCM_VERSION=6.4
+ROCM_VERSION=7.0
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
+PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950"
if [[ $tag =~ "benchmarks" ]]; then
INDUCTOR_BENCHMARKS=yes
fi
;;
-pytorch-linux-noble-rocm-alpha-py3)
-ANACONDA_PYTHON_VERSION=3.12
-GCC_VERSION=11
-VISION=yes
-ROCM_VERSION=7.0
-NINJA_VERSION=1.9.0
-TRITON=yes
-KATEX=yes
-UCX_COMMIT=${_UCX_COMMIT}
-UCC_COMMIT=${_UCC_COMMIT}
-PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950"
-;;
pytorch-linux-jammy-xpu-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
@@ -456,12 +445,3 @@ elif [ "$HAS_TRITON" = "yes" ]; then
echo "expecting triton to not be installed, but it is"
exit 1
fi

-# Sanity check cmake version. Executorch reinstalls cmake and I'm not sure if
-# they support 4.0.0 yet, so exclude them from this check.
-CMAKE_VERSION=$(drun cmake --version)
-if [[ "$EXECUTORCH" != *yes* && "$CMAKE_VERSION" != *4.* ]]; then
-  echo "CMake version is not 4.0.0:"
-  drun cmake --version
-  exit 1
-fi
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/nccl-cu12.txt
@@ -1 +1 @@
-v2.27.5-1
+v2.27.5-1
27 changes: 19 additions & 8 deletions .ci/docker/common/install_acl.sh
100644 → 100755
@@ -1,16 +1,27 @@
-set -euo pipefail
+#!/bin/bash
+# Script used only in CD pipeline

-readonly version=v25.02
-readonly src_host=https://github.com/ARM-software
-readonly src_repo=ComputeLibrary
+set -eux

-# Clone ACL
-[[ ! -d ${src_repo} ]] && git clone ${src_host}/${src_repo}.git
-cd ${src_repo}
+ACL_VERSION=${ACL_VERSION:-"v25.02"}
+ACL_INSTALL_DIR="/acl"

-git checkout $version
+# Clone ACL
+git clone https://github.com/ARM-software/ComputeLibrary.git -b "${ACL_VERSION}" --depth 1 --shallow-submodules
+
+ACL_CHECKOUT_DIR="ComputeLibrary"
 # Build with scons
+pushd $ACL_CHECKOUT_DIR
 scons -j8 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 \
 os=linux arch=armv8a build=native multi_isa=1 \
 fixed_format_kernels=1 openmp=1 cppthreads=0
+popd
+
+# Install ACL
+sudo mkdir -p ${ACL_INSTALL_DIR}
+for d in arm_compute include utils support src build
+do
+sudo cp -r ${ACL_CHECKOUT_DIR}/${d} ${ACL_INSTALL_DIR}/${d}
+done
+
+rm -rf $ACL_CHECKOUT_DIR
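The `${ACL_VERSION:-"v25.02"}` default in the rewritten script lets callers pin a different ACL tag through the environment. A minimal sketch of that expansion (the `v99.99` override is a made-up tag for illustration):

```shell
#!/bin/bash
# ${VAR:-default} substitutes the default only when VAR is unset or empty,
# so an invocation like `ACL_VERSION=v99.99 ./install_acl.sh` overrides the pin.
unset ACL_VERSION
ACL_VERSION=${ACL_VERSION:-"v25.02"}
echo "$ACL_VERSION"   # the pinned default, v25.02

ACL_VERSION="v99.99"  # simulate a caller override (hypothetical tag)
ACL_VERSION=${ACL_VERSION:-"v25.02"}
echo "$ACL_VERSION"   # the override survives, v99.99
```

The same pattern appears in install_openblas.sh below with `OPENBLAS_VERSION`.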
12 changes: 8 additions & 4 deletions .ci/docker/common/install_openblas.sh
100644 → 100755
@@ -3,8 +3,10 @@

set -ex

-cd /
-git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION:-v0.3.30}" --depth 1 --shallow-submodules
+OPENBLAS_VERSION=${OPENBLAS_VERSION:-"v0.3.30"}
+
+# Clone OpenBLAS
+git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION}" --depth 1 --shallow-submodules

OPENBLAS_CHECKOUT_DIR="OpenBLAS"
OPENBLAS_BUILD_FLAGS="
@@ -17,5 +19,7 @@ CFLAGS=-O3
BUILD_BFLOAT16=1
"

-make -j8 ${OPENBLAS_BUILD_FLAGS} -C ${OPENBLAS_CHECKOUT_DIR}
-make -j8 ${OPENBLAS_BUILD_FLAGS} install -C ${OPENBLAS_CHECKOUT_DIR}
+make -j8 ${OPENBLAS_BUILD_FLAGS} -C $OPENBLAS_CHECKOUT_DIR
+sudo make install -C $OPENBLAS_CHECKOUT_DIR
+
+rm -rf $OPENBLAS_CHECKOUT_DIR
6 changes: 0 additions & 6 deletions .ci/docker/common/install_rocm.sh
@@ -42,12 +42,6 @@ EOF
rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"

-# Special case for ROCM_VERSION == 7.0
-if [[ $(ver "$ROCM_VERSION") -eq $(ver 7.0) ]]; then
-    rocm_baseurl="https://repo.radeon.com/rocm/apt/7.0_alpha2"
-    amdgpu_baseurl="https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu"
-fi

# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
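The deleted block compared versions numerically with a `ver` helper. A common implementation of that helper (an assumption here; the actual definition lives earlier in install_rocm.sh, outside this hunk) pads each dotted component into a fixed-width integer so plain `-gt`/`-eq` comparisons work:

```shell
#!/bin/bash
# Pad each dotted component to a fixed width so versions compare correctly
# as integers, e.g. 6.4.2 -> "  6004002000" and 7.0 -> "  7000000000";
# missing components are padded as zeros by printf.
ver() {
    printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ')
}

if [ "$(ver 7.0)" -gt "$(ver 6.4.2)" ]; then
    echo "7.0 is newer than 6.4.2"
fi
```

With the alpha repositories gone, `ROCM_VERSION=7.0` now resolves to the regular `repo.radeon.com/rocm/apt/7.0` URL built a few lines earlier.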
4 changes: 2 additions & 2 deletions .ci/docker/common/install_rocm_magma.sh
@@ -12,8 +12,8 @@ function do_install() {

rocm_version_nodot=${rocm_version//./}

-# Version 2.7.2 + ROCm related updates
-MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
+# https://github.com/icl-utk-edu/magma/pull/65
+MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"

rocm_dir="/opt/rocm"
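The magma archive name in this hunk is derived with bash pattern substitution; a sketch using the commit pinned above:

```shell
#!/bin/bash
# ${rocm_version//./} deletes every '.' from the value (global pattern
# substitution), so "7.0" becomes "70" for the archive name.
rocm_version="7.0"
rocm_version_nodot=${rocm_version//./}

MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
echo "$magma_archive"
# magma-rocm70-d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec-1.tar.bz2
```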