GPU-Aware Neighbor Collectives with Locality-Aware Aggregation #32

@bienz2

Description

Locality-aware neighbor collectives create a separate buffer for each level of the aggregated message. These are currently hardcoded to be allocated on the CPU with malloc here:

data->buffer = (char*)malloc(data->size_msgs*data->datatype_size*sizeof(char));

A few issues:

  • These buffers should probably live in the request object rather than in the comm_pkg.
  • The buffers should be allocated on the GPU when the original sendbuf/recvbuf are on the GPU.
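A minimal sketch of the first point, assuming the motivation is that a buffer stored in a shared package would be reused by every request built from that package. The names pkg_sketch_t, request_sketch_t, request_init, and request_free are hypothetical stand-ins, not the library's actual types; only size_msgs and datatype_size are taken from the allocation line above. This sketch stays on the host and does not address GPU placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared package: it keeps sizes only. */
typedef struct {
    int size_msgs;       /* number of elements staged at this level */
    int datatype_size;   /* bytes per element */
} pkg_sketch_t;

/* Hypothetical stand-in for a request: it owns its staging buffer. */
typedef struct {
    const pkg_sketch_t* pkg;
    char* buffer;
} request_sketch_t;

/* Allocate the staging buffer when the request is created.
 * Returns 0 on success and -1 if the allocation fails. */
static int request_init(request_sketch_t* req, const pkg_sketch_t* pkg)
{
    assert(req != NULL && pkg != NULL);
    size_t bytes = (size_t)pkg->size_msgs * (size_t)pkg->datatype_size;
    req->pkg = pkg;
    req->buffer = bytes ? (char*)malloc(bytes) : NULL;
    return (bytes == 0 || req->buffer != NULL) ? 0 : -1;
}

/* Release the buffer together with the request. */
static void request_free(request_sketch_t* req)
{
    assert(req != NULL);
    free(req->buffer);
    req->buffer = NULL;
}
```

With this layout, two requests initialized from the same package each get their own staging memory, so one request's packing cannot overwrite the other's.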

TODO:

  1. Dynamically check whether sendbuf/recvbuf are on the device or host (see get_mem_types):
    void get_mem_types(const void* sendbuf, const void* recvbuf,
  2. If sendbuf is on the device, call gpuMalloc for the local_S buffers, the local_L send buffer, and the global send buffer
  3. If recvbuf is on the device, call gpuMalloc for the local_R buffers, the local_L recv buffer, and the global recv buffer
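The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the library's code: mem_kind_t, mem_kind_of, staging_alloc, and staging_free are hypothetical names, USE_CUDA is a hypothetical build flag, and cudaMalloc/cudaFree are used directly where the library would presumably use its gpuMalloc wrapper. Without USE_CUDA the sketch compiles as plain C and treats every pointer as host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef USE_CUDA              /* hypothetical build flag for this sketch */
#include <cuda_runtime.h>
#endif

/* Hypothetical tag recording where a buffer lives. */
typedef enum { MEM_HOST = 0, MEM_DEVICE = 1 } mem_kind_t;

/* Step 1: classify a user pointer (sendbuf or recvbuf). */
static mem_kind_t mem_kind_of(const void* ptr)
{
#ifdef USE_CUDA
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        /* Older runtimes report plain host pointers as an error. */
        cudaGetLastError();  /* clear the sticky error state */
        return MEM_HOST;
    }
    /* Managed and registered host memory are treated as host here. */
    return attr.type == cudaMemoryTypeDevice ? MEM_DEVICE : MEM_HOST;
#else
    (void)ptr;
    return MEM_HOST;
#endif
}

/* Steps 2 and 3: allocate a staging buffer where the user buffer lives.
 * Returns NULL for a zero-byte request or on allocation failure. */
static void* staging_alloc(size_t count, size_t datatype_size, mem_kind_t kind)
{
    size_t bytes = count * datatype_size;
    if (bytes == 0) return NULL;
#ifdef USE_CUDA
    if (kind == MEM_DEVICE) {
        void* p = NULL;
        return cudaMalloc(&p, bytes) == cudaSuccess ? p : NULL;
    }
#endif
    assert(kind == MEM_HOST);  /* device buffers need a GPU build */
    return malloc(bytes);
}

/* Free with the call that matches the allocation. */
static void staging_free(void* ptr, mem_kind_t kind)
{
#ifdef USE_CUDA
    if (kind == MEM_DEVICE) { cudaFree(ptr); return; }
#endif
    (void)kind;
    free(ptr);
}
```

In this sketch the send-side buffers (local_S, local_L send, global send) would be allocated with the kind returned by mem_kind_of(sendbuf), and the receive-side buffers (local_R, local_L recv, global recv) with the kind returned by mem_kind_of(recvbuf). Storing the kind next to each pointer lets the matching free call be chosen later.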
