[Issue]: tune_gemm, rocm 7.0: Unable to run on gfx950 #888

@matthiasdiener

Problem Description

When trying to run tune_gemm on a gfx950 GPU with ROCm 7.0, I get the following error:

# ./tune_gemm.py --gemm_size_file t.yaml
Tuning 1 gemm sizes starts at: 2025-10-06 20:43:27.298307
SIZE: 8192 8192 8192 NN nConfigs: 1152 multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 184, in extract_kernel_time
    first_value = df['DurationNs'].iloc[0]
                  ~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1752, in _getitem_axis
    self._validate_integer(key, axis)
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1685, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 725, in <module>
    sys.exit(main())
             ^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 655, in main
    minTime, bestConfig, compile_time, profile_time, post_time = tune_gemm_config(
                                                                 ^^^^^^^^^^^^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 264, in tune_gemm_config
    config, myTime = task.get()
                     ^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
IndexError: single positional indexer is out-of-bounds
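
For readers following the traceback: the failing line raises `IndexError` rather than `KeyError`, which suggests the `DurationNs` column was present but the DataFrame had zero rows. The sketch below reproduces that error on an empty frame and shows one possible guard. The helper name `first_duration_ns` is hypothetical and is not part of tune_gemm.py. Why the profiler produced no rows for some configs is not established here.

```python
import pandas as pd


def first_duration_ns(df: pd.DataFrame):
    """Return the first 'DurationNs' value, or None if the frame has no rows.

    Hypothetical helper for illustration; not the actual tune_gemm code.
    """
    if df.empty:
        return None
    return df["DurationNs"].iloc[0]


# A frame with the expected column but zero rows, as if no kernel
# timings were recorded for this config (an assumption).
empty = pd.DataFrame({"DurationNs": pd.Series([], dtype="int64")})

# Unguarded access fails the same way as in the traceback.
try:
    empty["DurationNs"].iloc[0]
    reproduced = False
except IndexError as exc:
    reproduced = "out-of-bounds" in str(exc)

# Guarded access returns None for the empty frame and the first value otherwise.
guarded = first_duration_ns(empty)
normal = first_duration_ns(pd.DataFrame({"DurationNs": [1234, 5678]}))
```

A caller could treat a `None` result as a failed config and skip it, so that one bad config does not abort the whole tuning run. Whether that is the right fix depends on why the profile came back empty.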

It does run correctly with the following workaround applied, which removes kpack = 2 from the tuning space:

diff --git a/python/perf-kernels/tools/tune_gemm/tune_gemm.py b/python/perf-kernels/tools/tune_gemm/tune_gemm.py
index 7d51575e4..4ae63f04e 100755
--- a/python/perf-kernels/tools/tune_gemm/tune_gemm.py
+++ b/python/perf-kernels/tools/tune_gemm/tune_gemm.py
@@ -64,7 +64,7 @@ def get_full_tuning_space():
     num_stage_range = [2]
     waves_per_eu_range = [0]
     matrix_instr_nonkdim_range = [16, 32]
-    kpack_range = [1, 2]
+    kpack_range = [1]
     schedule_hints = ["none"]

     space = itertools.product(block_mn_range, block_mn_range, block_k_range, num_warps_range, group_m_range,
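
To illustrate what the workaround changes, here is a minimal sketch using only the ranges visible in the diff context. The block, warp, and group ranges are left out because the issue does not show their values, so the counts below describe this partial product only, not the full tuning space. The function name `partial_space` is made up for the example.

```python
import itertools

# Ranges copied from the diff context above. The block/warp/group ranges
# are omitted because the issue does not show their values.
num_stage_range = [2]
waves_per_eu_range = [0]
matrix_instr_nonkdim_range = [16, 32]
schedule_hints = ["none"]


def partial_space(kpack_range):
    """Cartesian product over the ranges shown in the diff (hypothetical helper)."""
    return list(
        itertools.product(
            num_stage_range,
            waves_per_eu_range,
            matrix_instr_nonkdim_range,
            kpack_range,
            schedule_hints,
        )
    )


before = partial_space([1, 2])  # original kpack_range
after = partial_space([1])      # workaround kpack_range

print(len(before), len(after))
```

Since `kpack_range` is one factor of the product, cutting it from two values to one halves the number of unpruned candidates and removes every config with kpack = 2.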

Operating System

Ubuntu 24.04.3 LTS

CPU

AMD Ryzen Threadripper PRO 7985WX 64-Cores

GPU

AMD Instinct MI350X

ROCm Version

7.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
