[Bug]: rocblas_gemm_ex returns rocblas_status_internal_error in specific order #1529

Open
et16kr opened this issue Dec 16, 2024 · 3 comments
et16kr commented Dec 16, 2024

Describe the bug

  • When executed in a certain order, rocblas_gemm_ex and rocblas_gemm_strided_batched_ex return rocblas_status_internal_error.
  • Each call succeeds when run on its own or in a different order.
  • Besides the solution_ids in the reproducible example below, other solution_ids also failed under similar conditions.

To Reproduce

  • install ROCm:
wget https://repo.radeon.com/amdgpu-install/6.2/ubuntu/jammy/amdgpu-install_6.2.60200-1_all.deb
apt-get install -y ./amdgpu-install_6.2.60200-1_all.deb
amdgpu-install --usecase=rocm,rocmdev,rocmdevtools,opencl,openclsdk,hip,hiplibsdk,openmpsdk,mllib,mlsdk -y
  • rocblas version: 4.2.0.54f305c1-dirty (/opt/rocm/include/rocblas/internal/rocblas-version.h)
  • compile
hipcc -o example example.cpp -lrocblas
  • run
./example
  • reproducible code (example.cpp)
#include <iostream>
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>

void run_gemm(rocblas_handle handle, ulong solution_id,
	      int B, int M, int N, int K, bool trans_a, bool trans_b,
              float alpha = 1.0f, float beta = 0.0f) {

  auto transA = trans_a ? rocblas_operation_transpose : rocblas_operation_none;
  auto transB = trans_b ? rocblas_operation_transpose : rocblas_operation_none;

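  // Device buffers. They are sized with sizeof(float) although the element
  // type passed to rocBLAS below is bf16 (2 bytes), so they are over-allocated.
  // Their contents are left uninitialized; only the returned status is checked.
  float *dA, *dB, *dC;
  hipMalloc(&dA, B * M * K * sizeof(float));
  hipMalloc(&dB, B * N * K * sizeof(float));
  hipMalloc(&dC, B * M * N * sizeof(float));

  auto dtype = rocblas_datatype_bf16_r;
  // A positive solution_id requests that specific solution; 0 uses the default selection.
  rocblas_gemm_algo algo = solution_id > 0 ? rocblas_gemm_algo_solution_index : rocblas_gemm_algo_standard;

  // Column-major leading dimensions.
  rocblas_int lda = trans_a ? K : M;
  rocblas_int ldb = trans_b ? N : K;
  rocblas_int ldc = M;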

  rocblas_status status;

  if ( B > 1 ) {
    rocblas_stride strideA = K * M;
    rocblas_stride strideB = N * K;
    rocblas_stride strideC = M * N;
    status = rocblas_gemm_strided_batched_ex(handle,
					     transA,
					     transB,
					     M, N, K,
					     &alpha,
					     dA, dtype, lda, strideA,
					     dB, dtype, ldb, strideB,
					     &beta,
					     dC, dtype, ldc, strideC,
					     dC, dtype, ldc, strideC,
					     B,
					     rocblas_datatype_f32_r,
					     algo, solution_id, rocblas_gemm_flags_none);
  } else {
    status = rocblas_gemm_ex(handle,
			     transA,
			     transB,
			     M, N, K,
			     &alpha,
			     dA, dtype, lda,
			     dB, dtype, ldb,
			     &beta,
			     dC, dtype, ldc,
			     dC, dtype, ldc,
			     rocblas_datatype_f32_r,
			     algo, solution_id, rocblas_gemm_flags_none);
  }

  std::cout << "rocblas_gemm_ex solution_id: " << solution_id;
  if (status != rocblas_status_success) {
    std::cerr << " failed - " << rocblas_status_to_string(status) << std::endl;
  } else {
    std::cout << " succeeded" << std::endl;
  }

  hipFree(dA);
  hipFree(dB);
  hipFree(dC);
}

int main() {
  rocblas_handle handle;
  rocblas_create_handle(&handle);

  run_gemm(handle, 685, 4, 1024, 64, 1, false, false);
  run_gemm(handle, 621285612, 32, 128, 1024, 1024, false, false);
  run_gemm(handle, 621285596, 1, 36864, 4096, 4608, false, false);

  rocblas_destroy_handle(handle);

  return 0;
}
  • The success or failure depends on the order of gemm calls
  • success
$ ./example
rocblas_gemm_ex solution_id: 621285612 succeeded
rocblas_gemm_ex solution_id: 621285596 succeeded
rocblas_gemm_ex solution_id: 685 succeeded
  • failure
$ ./example
rocblas_gemm_ex solution_id: 685 succeeded
rocblas_gemm_ex solution_id: 621285612 failed - rocblas_status_internal_error
rocblas_gemm_ex solution_id: 621285596 failed - rocblas_status_internal_error
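
As a side note on the reproducer above: it sizes its device buffers with sizeof(float) while passing rocblas_datatype_bf16_r, which over-allocates but should otherwise be harmless. The sketch below shows one way to compute the leading dimensions and byte counts for the same column-major convention using 64-bit arithmetic. `GemmLayout` and `gemm_layout` are hypothetical helper names, not part of rocBLAS or of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Leading dimensions and buffer sizes for one batched, column-major GEMM,
// following the same conventions as run_gemm() in example.cpp.
// Hypothetical helper: not part of rocBLAS.
struct GemmLayout {
  std::int64_t lda, ldb, ldc;          // leading dimensions, in elements
  std::size_t bytesA, bytesB, bytesC;  // bytes for the whole batch
};

// elem_size is the size of one stored element: 2 for bf16, 4 for float.
inline GemmLayout gemm_layout(std::int64_t batch, std::int64_t m,
                              std::int64_t n, std::int64_t k,
                              bool trans_a, bool trans_b,
                              std::size_t elem_size) {
  assert(batch > 0 && m > 0 && n > 0 && k > 0 && elem_size > 0);
  GemmLayout out{};
  out.lda = trans_a ? k : m;  // A is m x k, or k x m when transposed
  out.ldb = trans_b ? n : k;  // B is k x n, or n x k when transposed
  out.ldc = m;                // C is always m x n
  // Element counts are computed in 64 bits before scaling by elem_size.
  out.bytesA = static_cast<std::size_t>(batch * m * k) * elem_size;
  out.bytesB = static_cast<std::size_t>(batch * n * k) * elem_size;
  out.bytesC = static_cast<std::size_t>(batch * m * n) * elem_size;
  return out;
}
```

For the third call in the reproducer, `gemm_layout(1, 36864, 4096, 4608, false, false, 2)` gives the sizes a bf16 allocation would need; the result could then be passed to hipMalloc in place of the hand-written products.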

Expected behavior

  • It should succeed regardless of the gemm call order.
$ ./example
rocblas_gemm_ex solution_id: 685 succeeded
rocblas_gemm_ex solution_id: 621285612 succeeded
rocblas_gemm_ex solution_id: 621285596 succeeded
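
To check the "regardless of order" expectation systematically, every ordering of the three calls can be tried and the per-call outcome recorded. The following is a minimal sketch, not part of the original report: `try_all_orders`, `reset`, and `run_case` are hypothetical names, and it assumes `run_case` wraps something like run_gemm() changed to return whether the status was rocblas_status_success. Whether `reset` should recreate the handle or start a fresh process is left open, since the report does not say where the order-dependent state lives.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Outcome of running all cases in one particular order.
struct OrderResult {
  std::vector<int> order;  // case indices in execution order
  std::vector<bool> ok;    // ok[i] is the outcome of the i-th call in `order`
};

// Tries every permutation of case indices 0..num_cases-1.
// `reset` runs before each permutation (for example, to recreate the handle);
// `run_case(i)` runs case i and returns true on success.
inline std::vector<OrderResult> try_all_orders(
    int num_cases,
    const std::function<void()>& reset,
    const std::function<bool(int)>& run_case) {
  assert(num_cases > 0);
  std::vector<int> order(num_cases);
  std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., num_cases-1
  std::vector<OrderResult> results;
  do {
    reset();
    OrderResult r{order, {}};
    for (int idx : order) r.ok.push_back(run_case(idx));
    results.push_back(std::move(r));
  } while (std::next_permutation(order.begin(), order.end()));
  return results;
}
```

With three cases this produces six orderings in lexicographic order; any entry whose `ok` vector contains false identifies an ordering that reproduces the failure.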

Log-files

There are no logs.
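
rocBLAS can produce its own logs through the ROCBLAS_LAYER environment variable, a bit mask in which 1 enables trace logging, 2 enables bench logging, and 4 enables profile logging. The sketch below sets it from inside the program; `enable_rocblas_logging` is a hypothetical helper name, and it assumes the variable is read when the handle is created, so it would need to be called before rocblas_create_handle().

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sets ROCBLAS_LAYER for the current process (POSIX setenv).
// layer_mask is a bit mask: 1 = trace, 2 = bench, 4 = profile.
// Assumption: rocBLAS reads the variable at handle creation, so call this
// before rocblas_create_handle(). Returns true if the variable was set.
inline bool enable_rocblas_logging(int layer_mask) {
  assert(layer_mask >= 0);
  const std::string value = std::to_string(layer_mask);
  return setenv("ROCBLAS_LAYER", value.c_str(), /*overwrite=*/1) == 0;
}
```

Calling `enable_rocblas_logging(3)` at the top of main() in example.cpp would request both trace and bench output, which could then be attached to the report.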

Environment

Hardware description
CPU AMD EPYC 9474F 48-Core Processor
GPU AMD Instinct MI300X
Software version
rocm-core 6.2.0.60200-66~22.04
rocblas 4.2.0.60200-66~22.04

Attach environment.txt
environment.txt

Additional context

  • Each solution_id was selected as the candidate with the shortest measured duration from the list returned by functions such as rocblas_gemm_ex_get_solutions(), rocblas_gemm_strided_batched_ex_get_solutions(), or rocblas_gemm_ex_get_solutions_by_type().
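
The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the reporter's actual code: `SolutionPick`, `pick_fastest_solution`, and `time_solution` are hypothetical names, the candidate list is assumed to come from one of the get_solutions functions, and `time_solution` is assumed to wrap a timed GEMM call that reports failure when the status is not rocblas_status_success.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Result of the selection: `found` is false when no candidate succeeded.
struct SolutionPick {
  bool found;
  std::int32_t id;
  double ms;  // measured duration of the chosen candidate, in milliseconds
};

// Picks the candidate with the shortest measured duration.
// time_solution(id, ms) runs the GEMM with the given solution index, stores
// the measured time in `ms`, and returns false if the call failed; failed
// candidates are skipped rather than compared.
inline SolutionPick pick_fastest_solution(
    const std::vector<std::int32_t>& candidates,
    const std::function<bool(std::int32_t, double&)>& time_solution) {
  assert(static_cast<bool>(time_solution));
  SolutionPick best{false, 0, 0.0};
  for (std::int32_t id : candidates) {
    double ms = 0.0;
    if (!time_solution(id, ms)) continue;  // ignore candidates that error out
    if (!best.found || ms < best.ms) best = SolutionPick{true, id, ms};
  }
  return best;
}
```

Skipping failed candidates during tuning does not protect against the order dependence reported here, because a solution that passes while being timed in one sequence can still fail when called later in a different sequence.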
@ppanchad-amd

Hi @et16kr. Internal ticket has been created to investigate your issue. Thanks!

@NaveenElumalaiAMD
Contributor

Hi @et16kr, thanks for reporting the issue. Could you also provide the ROCBLAS_TENSILE_COMMIT_ID (below the ROCBLAS_VERSION_TWEAK) in /opt/rocm/include/rocblas/internal/rocblas-version.h?

@et16kr
Author

et16kr commented Dec 20, 2024

> Hi @et16kr, thanks for reporting the issue. Could you also provide the ROCBLAS_TENSILE_COMMIT_ID (below the ROCBLAS_VERSION_TWEAK) in /opt/rocm/include/rocblas/internal/rocblas-version.h?

  • tensile commit id
    • "54f305c18f0d509466557e106ac4b1d7e42c85a5" , "dbc2062dced66e4cbee8e0591d76e0a1588a4c70"
/opt/rocm/include/rocblas/internal$ cat rocblas-version.h | grep TENSILE
#define ROCBLAS_TENSILE_COMMIT_ID   "54f305c18f0d509466557e106ac4b1d7e42c85a5" , "dbc2062dced66e4cbee8e0591d76e0a1588a4c70"
