cuGetProcAddress not implement #3

Open
chaunceyjiang opened this issue Mar 5, 2024 · 5 comments


chaunceyjiang commented Mar 5, 2024

cuGetProcAddress not implement

I ran into the error above. I tried to implement the function myself, but I have not been able to get it working.

If you have some spare time, could you help implement this function? Thank you very much.

server

// cuda.cc


void Cuda::DispatchCuGetProcAddress(CudaRequest* request, RenderResponse* response) {
  CL_ASSERT(request->param_count == 5);
  // The symbol name travels in the extended payload appended to the request.
  auto symbol = (const char*)CUDA_REQUEST_EXTEND(request);
  CL_LOG("params1 :%s", symbol);
  CL_LOG("params2 :%d", (int)request->params[2]);
  CL_LOG("params3 :%llu", (unsigned long long)request->params[3]);

  // Response layout: [void* fn][CUdriverProcAddressQueryResult status]
  response->header.size = sizeof(void*) + sizeof(CUdriverProcAddressQueryResult);
  response->data.resize(response->header.size);

  void** fn = (void**)response->data.data();
  // The query result is stored immediately after the function pointer.
  CUdriverProcAddressQueryResult* ret = (CUdriverProcAddressQueryResult*)(fn + 1);

  // *fn receives an address that is only valid inside the server process.
  response->header.result = cuGetProcAddress(symbol, fn, (int)request->params[2], (cuuint64_t)request->params[3], ret);
  CL_LOG("-----");
  CL_LOG("ret1: %p", (void*)fn);
  CL_LOG("ret1: %p", *fn);
  CL_LOG("ret2: %d", (int)*ret);
}




---
void Cuda::Dispatch(WorkerItem* item) {
  auto request = (CudaRequest*)item->request.data.data();
  CL_LOG("call api=%s param_count=%d", GetCudaFunctionName(request->api_index), request->param_count);
  
  CL_ASSERT(request->version == version_);
  Render::Dispatch(item);

  switch (request->api_index) {
    case CUGETPROCADDRESS:
        DispatchCuGetProcAddress(request, &item->response);
        break;
...
...

client

// render.cpp

CUresult Render::PrepareRequest(RenderRequest* request) {
  auto cuda = (CudaRequest*)request->datas[0].data();

  switch (cuda->api_index) {
  case CUGETPROCADDRESS: {
      auto symbol = (char*)cuda->params[0];
      if (symbol) {
          std::string_view name_sv(symbol, strlen(symbol) + 1);
          request->header.size += (uint32_t)name_sv.size();
          request->datas.emplace_back(std::move(name_sv));
      }
      break;
  }


// -----

CUresult Render::HandleResponse(RenderRequest* request, RenderResponse* response) {
  auto cuda = (CudaRequest*)request->datas[0].data();
  auto result = (CUresult)response->header.result;

  if (result != CUDA_SUCCESS) {
    CL_ERROR("handle error for api=%s result=%d", GetCudaFunctionName(cuda->api_index), result);
    goto end;
  }

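  switch (cuda->api_index) {
    case CUGETPROCADDRESS: {
      // Response layout from the server: [void* fn][CUdriverProcAddressQueryResult status].
      // Copy each field by its own size instead of reading two uint64_t values,
      // because the status field is smaller than 8 bytes.
      auto data = (const uint8_t*)response->data.data();
      void* fn = nullptr;
      CUdriverProcAddressQueryResult status;
      memcpy(&fn, data, sizeof(fn));
      memcpy(&status, data + sizeof(fn), sizeof(status));
      // fn is an address in the server process; it is not callable on the client.
      *(void**)cuda->params[1] = fn;
      if (cuda->params[4]) {
        *(CUdriverProcAddressQueryResult*)cuda->params[4] = status;
      }
      break;
    }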

@chaunceyjiang
Author

My problem is this: the server hands the client the function pointer that cuGetProcAddress wrote into its output parameter (a void** on the server pointing to the actual function address). That address belongs to the server process, so the client cannot call it.

@chaunceyjiang
Author

https://developer.nvidia.com/blog/exploring-the-new-features-of-cuda-11-3/

CUDA 11.3 also introduces a new driver and runtime API to query memory addresses for driver API functions. Previously, there was no direct way to obtain function pointers to the CUDA driver symbols. To do so, you had to call into dlopen, dlsym, or GetProcAddress. This feature implements a new driver API, cuGetProcAddress, and the corresponding new runtime API cudaGetDriverEntryPoint.

@nooodles2023
Collaborator

Sorry for the late reply.
If you want to implement cuGetProcAddress, do it on the local (client) side; there is no need to forward it to the server.
For example, when you hook cuGetProcAddress, inspect the parameters it is called with and return the address of the matching function that you have already implemented in clink.
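
A rough sketch of that idea, in case it helps. It resolves the symbol name against a table of client-side hooks instead of asking the server. Everything here is hypothetical and not taken from clink: the `sketch` namespace, the `Hooked*` functions, `HookTable`, and the stand-in enums (used so the snippet compiles without `<cuda.h>`). Returning a not-found error code for unknown symbols is also an assumption; check the driver documentation for the behavior of your CUDA version, and note that version suffixes such as `_v2` are ignored here.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

namespace sketch {

// Minimal stand-ins for the types declared in <cuda.h> (assumption: a real
// build would use CUresult and CUdriverProcAddressQueryResult instead).
enum Result { kSuccess = 0, kErrorInvalidValue = 1, kErrorNotFound = 500 };
enum QueryResult { kSymbolFound = 0, kSymbolNotFound = 1 };

// Hypothetical client-side hooks. In a remoting layer these would serialize
// the call and send it to the server; here they return fixed values.
Result HookedCuInit(unsigned int /*flags*/) { return kSuccess; }

Result HookedCuDriverGetVersion(int* version) {
  if (!version) return kErrorInvalidValue;
  *version = 12000;
  return kSuccess;
}

// Maps driver symbol names to the addresses of the local hooks.
const std::unordered_map<std::string, void*>& HookTable() {
  static const std::unordered_map<std::string, void*> table = {
      {"cuInit", reinterpret_cast<void*>(&HookedCuInit)},
      {"cuDriverGetVersion", reinterpret_cast<void*>(&HookedCuDriverGetVersion)},
  };
  return table;
}

// Local replacement for cuGetProcAddress. It never contacts the server, so
// the returned pointer is always valid in the client process.
Result HookedCuGetProcAddress(const char* symbol, void** pfn, int /*cudaVersion*/,
                              std::uint64_t /*flags*/, QueryResult* status) {
  if (!symbol || !pfn) return kErrorInvalidValue;
  const auto& table = HookTable();
  auto it = table.find(symbol);
  if (it == table.end()) {
    *pfn = nullptr;
    if (status) *status = kSymbolNotFound;
    return kErrorNotFound;
  }
  *pfn = it->second;
  if (status) *status = kSymbolFound;
  return kSuccess;
}

}  // namespace sketch
```

With this shape, the application receives a pointer to the hook, and every later call through that pointer goes through the same remoting path as a directly linked call would.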

@nooodles2023
Collaborator

I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses.
It is set to launch soon. Please stay tuned for updates and announcements.

@chaunceyjiang
Author

> I have completed a new project related to remote CUDA. It is much faster than Clink. I patched the kernel function parameters for cuLaunchKernel and implemented a new protocol to transfer CUDA requests and responses.

Amazing!! I will keep an eye on it. Also, are you going to make it open source? If possible, I would like to contribute as well.
