Commit 5c2584a

iupaikov-amd authored and jeffdaily committed
[ROCm] Enable inductor GEMM lowering for gfx11 (pytorch#141687)
This check doesn't make sense for some of the AMD gpus since they have the right amount of CUs but multi_processor_count returns WGPs on RDNA while still performing adequately. A lot of tests fail on modern archs due to this check defaulting them to not using the GEMMs backend. Pull Request resolved: pytorch#141687 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet Co-authored-by: Jeff Daily <[email protected]>
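To make the mismatch concrete, here is a small arithmetic sketch. The figures are assumptions for illustration (a gfx11 part in the Navi 31 class, e.g. RX 7900 XTX, commonly described as 96 CUs): RDNA pairs two Compute Units into one Workgroup Processor, and multi_processor_count reports WGPs, so such a GPU lands under the SM threshold tuned for NVIDIA hardware.

```python
# Illustration with assumed figures for a gfx11 (RDNA3) part such as Navi 31.
# RDNA groups two Compute Units (CUs) into one Workgroup Processor (WGP),
# and torch's multi_processor_count reports WGPs on RDNA, not CUs.
cus = 96              # assumed physical compute units
wgps = cus // 2       # what multi_processor_count would return on RDNA
min_sms = 68          # threshold tuned for NVIDIA SMs (RTX 3080)

# The WGP count fails the SM threshold even though the GPU is capable:
print(wgps, wgps >= min_sms)  # 48 False
```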
1 parent 1f3d889 commit 5c2584a

File tree

1 file changed: +11 −1 lines changed

torch/_inductor/utils.py

Lines changed: 11 additions & 1 deletion
@@ -1114,8 +1114,18 @@ def _new_line(self, line: str) -> DelayReplaceLine:
 
 @functools.lru_cache(None)
 def is_big_gpu(index) -> bool:
+    prop = torch.cuda.get_device_properties(index)
+
+    # SM logic is not relevant to ROCm gpus
+    # Arbitrarily skipping the older models
+    if torch.version.hip:
+        if prop.major < 9 or prop.major == 10:
+            log.warning("GPU arch does not support max_autotune_gemm mode usage")
+            return False
+        return True
+
     min_sms = 68  # 3080
-    avail_sms = torch.cuda.get_device_properties(index).multi_processor_count
+    avail_sms = prop.multi_processor_count
     if avail_sms < min_sms:
         log.warning(
             "Not enough SMs to use max_autotune_gemm mode",
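The patched control flow can be sketched standalone, without a GPU or torch installed. This is not the actual PyTorch function: `DeviceProps` is a hypothetical stand-in for the object returned by `torch.cuda.get_device_properties`, and the `is_hip` flag replaces the `torch.version.hip` check, so the branch logic can be exercised directly.

```python
from dataclasses import dataclass

@dataclass
class DeviceProps:
    """Hypothetical stand-in for torch.cuda.get_device_properties(index)."""
    major: int                  # arch major version (e.g. 9 for gfx9, 11 for gfx11)
    multi_processor_count: int  # SMs on CUDA; WGPs on RDNA

def is_big_gpu_sketch(prop: DeviceProps, is_hip: bool) -> bool:
    """Mirror of the patched logic: on ROCm, gate on arch major only;
    on CUDA, keep the SM-count threshold."""
    if is_hip:
        # Older archs (major < 9) and gfx10 stay disabled;
        # gfx9 and gfx11 pass regardless of multi_processor_count.
        if prop.major < 9 or prop.major == 10:
            return False
        return True
    min_sms = 68  # RTX 3080
    return prop.multi_processor_count >= min_sms

# gfx11 reports WGPs, which would fail the SM threshold, but now passes:
print(is_big_gpu_sketch(DeviceProps(major=11, multi_processor_count=48), is_hip=True))   # True
print(is_big_gpu_sketch(DeviceProps(major=10, multi_processor_count=40), is_hip=True))   # False
print(is_big_gpu_sketch(DeviceProps(major=8, multi_processor_count=68), is_hip=False))   # True
```

Note the design choice in the patch itself: once `torch.version.hip` is true, the function returns before the SM-count comparison is ever reached, so the NVIDIA-tuned threshold can no longer misclassify RDNA parts.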

0 commit comments
