GPU-Aware Neighbor Collectives with Locality-Aware Aggregation

Locality-aware neighbor collectives create a separate buffer for each level of the aggregated message. These buffers are currently hardcoded to be allocated on the CPU with malloc: https://github.com/mpi-advance/locality_aware/blob/bcf92c22b45be3342fb27eb66274e0c10d2ca48f/src/communicator/comm_data.c#L49

A few issues:
- These buffers should probably live in the request object rather than in the comm_pkg
- The buffers should be allocated on the GPU when the original sendbuf/recvbuf are on the GPU

TODO:
1. Dynamically check whether sendbuf/recvbuf are on the device or the host (https://github.com/mpi-advance/locality_aware/blob/bcf92c22b45be3342fb27eb66274e0c10d2ca48f/src/utils/utils.cpp#L138)
2. If sendbuf is on the device, call gpuMalloc for the local_S buffers, the local_L send buffer, and the global send buffer
3. If recvbuf is on the device, call gpuMalloc for the local_R buffers, the local_L recv buffer, and the global recv buffer
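
A rough sketch of how the TODO items could fit together, written in plain C so it builds without a GPU toolchain. Everything named here (`mem_space_t`, `allocator_t`, `side_buffers_t`, `select_allocator`, `alloc_side_buffers`, `free_side_buffers`) is hypothetical and not part of the existing code. The device allocator is a stub that wraps malloc; a real build would presumably call gpuMalloc and the matching device free there, and the `mem_space_t` value would come from the pointer check in step 1. The struct is meant to be owned by the request object, one instance per direction, and sizes are taken as byte counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tag for where a user buffer lives (result of the step 1 check). */
typedef enum { MEM_HOST, MEM_DEVICE } mem_space_t;

/* Pair of allocate/free routines for one memory space. */
typedef struct {
    void* (*alloc)(size_t bytes);
    void (*release)(void* ptr);
} allocator_t;

static void* host_alloc(size_t bytes) { return malloc(bytes); }
static void host_release(void* ptr) { free(ptr); }

/* Stand-ins so the sketch builds without a GPU toolchain. A real build
   would call gpuMalloc and the matching device free here instead. */
static int device_alloc_calls = 0;
static void* device_alloc_stub(size_t bytes) { device_alloc_calls++; return malloc(bytes); }
static void device_release_stub(void* ptr) { free(ptr); }

/* Pick the allocator that matches the memory space of the user buffer. */
static allocator_t select_allocator(mem_space_t space)
{
    allocator_t host = { host_alloc, host_release };
    allocator_t device = { device_alloc_stub, device_release_stub };
    return (space == MEM_DEVICE) ? device : host;
}

/* Staging buffers for one direction (send or recv) of a single request. */
typedef struct {
    void* local;           /* local_S on the send side, local_R on the recv side */
    void* local_L;         /* local_L send or recv buffer */
    void* global;          /* global send or recv buffer */
    allocator_t allocator; /* remembered so buffers are freed in the same space */
} side_buffers_t;

/* Release whatever was allocated and reset the pointers. */
static void free_side_buffers(side_buffers_t* side)
{
    assert(side != NULL);
    if (side->local)   side->allocator.release(side->local);
    if (side->local_L) side->allocator.release(side->local_L);
    if (side->global)  side->allocator.release(side->global);
    side->local = NULL;
    side->local_L = NULL;
    side->global = NULL;
}

/* Allocate all three buffers in the given space.
   Returns 0 on success, -1 if any allocation fails. */
static int alloc_side_buffers(side_buffers_t* side, mem_space_t space,
                              size_t local_bytes, size_t local_L_bytes,
                              size_t global_bytes)
{
    assert(side != NULL);
    side->allocator = select_allocator(space);
    side->local   = side->allocator.alloc(local_bytes);
    side->local_L = side->allocator.alloc(local_L_bytes);
    side->global  = side->allocator.alloc(global_bytes);
    if (!side->local || !side->local_L || !side->global) {
        free_side_buffers(side);
        return -1;
    }
    return 0;
}
```

Under these assumptions, a request would hold two `side_buffers_t` members and call `alloc_side_buffers` once with the memory space of sendbuf and once with that of recvbuf, so a device sendbuf paired with a host recvbuf is handled without special cases. Storing the allocator next to the pointers keeps the free path from having to repeat the device/host check.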
