cuGetProcAddress not implement #3

chaunceyjiang · 2024-03-05T03:45:40Z

cuGetProcAddress not implement

I ran into the error above. I also tried to implement the function myself, but could not get it to work.

If you have some spare time, could you help implement this function? Thank you very much.

server

// cuda.cc

void Cuda::DispatchCuGetProcAddress(CudaRequest* request, RenderResponse* response) {
  CL_ASSERT(request->param_count == 5);
  auto symbol = (const char*)CUDA_REQUEST_EXTEND(request);
  CL_LOG("params1: %s", symbol);
  CL_LOG("params2: %d", (int)request->params[2]);
  CL_LOG("params3: %llu", (unsigned long long)request->params[3]);

  // Reply layout: two 64-bit slots (function address, query status), matching
  // what Render::HandleResponse reads on the client.
  response->header.size = 2 * sizeof(uint64_t);
  response->data.resize(response->header.size);
  auto slots = (uint64_t*)response->data.data();

  // cuGetProcAddress writes the resolved address through its void** argument.
  void* fn = nullptr;
  CUdriverProcAddressQueryResult status = CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND;
  response->header.result = cuGetProcAddress(symbol, &fn, (int)request->params[2], (cuuint64_t)request->params[3], &status);

  // Note: fn is an address in the server process and is not callable on the client.
  slots[0] = (uint64_t)(uintptr_t)fn;
  slots[1] = (uint64_t)status;

  CL_LOG("-----");
  CL_LOG("ret1: %p", fn);
  CL_LOG("ret2: %d", (int)status);
}

---
void Cuda::Dispatch(WorkerItem* item) {
  auto request = (CudaRequest*)item->request.data.data();
  CL_LOG("call api=%s param_count=%d", GetCudaFunctionName(request->api_index), request->param_count);
  
  CL_ASSERT(request->version == version_);
  Render::Dispatch(item);

  switch (request->api_index) {
    case CUGETPROCADDRESS:
      DispatchCuGetProcAddress(request, &item->response);
      break;
...
...

client

// render.cpp

CUresult Render::PrepareRequest(RenderRequest* request) {
  auto cuda = (CudaRequest*)request->datas[0].data();

  switch (cuda->api_index) {
  case CUGETPROCADDRESS: {
      auto symbol = (char*)cuda->params[0];
      if (symbol) {
          std::string_view name_sv(symbol, strlen(symbol) + 1);
          request->header.size += (uint32_t)name_sv.size();
          request->datas.emplace_back(std::move(name_sv));
      }
      break;
  }


// -----

CUresult Render::HandleResponse(RenderRequest* request, RenderResponse* response) {
  auto cuda = (CudaRequest*)request->datas[0].data();
  auto result = (CUresult)response->header.result;

  if (result != CUDA_SUCCESS) {
    CL_ERROR("handle error for api=%s result=%d", GetCudaFunctionName(cuda->api_index), result);
    goto end;
  }

  switch (cuda->api_index) {
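    case CUGETPROCADDRESS: {
      // Braces give the locals their own scope, so other case labels and
      // `goto end` do not jump past their initialization.
      auto ret = (uint64_t*)response->data.data();
      auto fn = (void*)ret[0];  // address in the server process
      auto status = (CUdriverProcAddressQueryResult)ret[1];
      *(void**)cuda->params[1] = fn;
      if (cuda->params[4]) {
        *(CUdriverProcAddressQueryResult*)cuda->params[4] = status;
      }
      break;
    }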

chaunceyjiang · 2024-03-05T07:59:09Z

My problem is this: the server returns a function pointer to the client (the value that cuGetProcAddress writes through its void** output parameter). That address is only valid inside the server process, so the client cannot call it.
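To make the problem concrete, here is a minimal sketch of such a reply crossing the wire. It assumes the layout that the client code above reads (two 64-bit slots); ProcAddressReply, EncodeReply, and DecodeReply are hypothetical names and are not part of Clink or CUDA. The address survives the round trip only as a number. It identifies code in the server process, so the client can log or compare it but must not call through it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-width reply: both fields are 64-bit so that client and
// server agree on the layout regardless of sizeof(void*) or sizeof(enum).
struct ProcAddressReply {
  uint64_t server_address;  // address in the SERVER process; opaque to the client
  uint64_t query_status;    // numeric value of the query status
};

// Server side: copy the reply into a byte buffer.
std::vector<unsigned char> EncodeReply(const ProcAddressReply& reply) {
  std::vector<unsigned char> bytes(sizeof(reply));
  std::memcpy(bytes.data(), &reply, sizeof(reply));
  return bytes;
}

// Client side: read the reply back. Returns false if the buffer is too short.
bool DecodeReply(const std::vector<unsigned char>& bytes, ProcAddressReply* out) {
  assert(out != nullptr);
  if (bytes.size() < sizeof(ProcAddressReply)) return false;
  std::memcpy(out, bytes.data(), sizeof(ProcAddressReply));
  return true;
}
```

A 12-byte buffer (a pointer plus a 4-byte enum) would be rejected by DecodeReply, which is why both sides should agree on one fixed width.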

chaunceyjiang · 2024-03-05T10:53:17Z

https://developer.nvidia.com/blog/exploring-the-new-features-of-cuda-11-3/

CUDA 11.3 also introduces a new driver and runtime API to query memory addresses for driver API functions. Previously, there was no direct way to obtain function pointers to the CUDA driver symbols. To do so, you had to call into dlopen, dlsym, or GetProcAddress. This feature implements a new driver API, cuGetProcAddress, and the corresponding new runtime API cudaGetDriverEntryPoint.

nooodles2023 · 2024-05-29T10:02:21Z

Sorry for my late reply.
If you want to implement cuGetProcAddress, do it on the local (client) side; there is no need to pass the call to the server.
For example, when you hook cuGetProcAddress, inspect its parameters when it is called and return the address of the corresponding function that you have implemented in Clink.
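The local approach described above could look roughly like the sketch below. It uses stand-in types so that it compiles without cuda.h; HookStatus, LocalHookTable, ResolveLocalSymbol, and the two Hooked* stubs are hypothetical names that do not come from Clink or the CUDA headers. The idea is only that the hooked cuGetProcAddress looks the symbol name up in a client-side table and hands back the address of the client's own wrapper, so no server address ever reaches the application.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for CUdriverProcAddressQueryResult (assumption, not the real enum).
enum class HookStatus { kFound, kSymbolNotFound };

// Hypothetical client-side wrappers. In a real hook these would forward the
// call to the remote server; here they only return placeholder values.
int HookedCuInit(unsigned int /*flags*/) { return 0; }
int HookedCuDriverGetVersion(int* version) {
  assert(version != nullptr);
  *version = 12000;  // placeholder value for the sketch
  return 0;
}

// Table from driver symbol name to the address of the local wrapper.
const std::unordered_map<std::string, void*>& LocalHookTable() {
  static const std::unordered_map<std::string, void*> table = {
      {"cuInit", reinterpret_cast<void*>(&HookedCuInit)},
      {"cuDriverGetVersion", reinterpret_cast<void*>(&HookedCuDriverGetVersion)},
  };
  return table;
}

// What a hooked cuGetProcAddress could do: resolve the name locally and write
// the wrapper's address to *pfn. No request is sent to the server.
HookStatus ResolveLocalSymbol(const char* symbol, void** pfn) {
  assert(pfn != nullptr);
  *pfn = nullptr;
  if (symbol == nullptr) return HookStatus::kSymbolNotFound;
  auto it = LocalHookTable().find(symbol);
  if (it == LocalHookTable().end()) return HookStatus::kSymbolNotFound;
  *pfn = it->second;
  return HookStatus::kFound;
}
```

A caller would cast the returned void* back to the expected function type before calling it. A full implementation would also need to honor the cudaVersion and flags arguments, which this sketch ignores.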

nooodles2023 · 2024-05-29T10:12:58Z

I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses.
It is set to launch soon. Please stay tuned for updates and announcements.

chaunceyjiang · 2024-05-29T10:18:24Z

> I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses.

Amazing!! I will keep an eye on it. Also, are you going to make it open source? If possible, I would like to contribute as well.
