Open
Description
Locality-aware neighbor collectives create a separate buffer for each level of the aggregated message. These are currently hardcoded to be allocated on the CPU with malloc here:
    data->buffer = (char*)malloc(data->size_msgs*data->datatype_size*sizeof(char));
A few issues:
- These buffers should probably be in the request object, not within the comm_pkg
- The buffers should be allocated on the GPU when the original sendbuf/recvbuf are on the GPU.
TODO:
- Dynamically check if sendbuf/recvbuf are on the device or host (get_mem_types, locality_aware/src/utils/utils.cpp, line 138 in bcf92c2):
    void get_mem_types(const void* sendbuf, const void* recvbuf,
- If sendbuf is on the device, call gpuMalloc for local_S buffers, local_L send buffer, and global send buffer
- If recvbuf is on the device, call gpuMalloc for local_R buffers, local_L recv_buffer, and global recv buffer
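A rough sketch of what the TODO items could look like, under several assumptions. MemType, LevelBuffer, query_mem_type, alloc_level_buffer, and free_level_buffer are hypothetical names made up for illustration and are not taken from the repository. The USE_CUDA build flag is also an assumption, and the sketch calls cudaMalloc directly where the project would presumably use its own gpuMalloc wrapper. Without USE_CUDA it falls back to host-only behavior so it still compiles on a CPU-only machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef USE_CUDA  // assumption: hypothetical build flag, not from the repository
#include <cuda_runtime.h>
#endif

// Hypothetical enum; the project's get_mem_types may report this differently.
enum class MemType { Host, Device };

// Classify a pointer as host or device memory.
// Managed (unified) memory is treated as host here, which is a simplification.
inline MemType query_mem_type(const void* ptr)
{
#ifdef USE_CUDA
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, ptr) == cudaSuccess &&
        attr.type == cudaMemoryTypeDevice)
        return MemType::Device;
    cudaGetLastError();  // clear any error left behind for plain host pointers
#else
    (void)ptr;  // CPU-only build: every pointer is host memory
#endif
    return MemType::Host;
}

// Hypothetical stand-in for the buffer of one aggregation level.
struct LevelBuffer
{
    char* data = nullptr;
    MemType where = MemType::Host;
};

// Allocate size_msgs * datatype_size bytes in the requested memory space.
// Returns false if the allocation fails or device memory is unavailable.
inline bool alloc_level_buffer(LevelBuffer& buf, std::size_t size_msgs,
                               std::size_t datatype_size, MemType where)
{
    assert(buf.data == nullptr && "buffer is already allocated");
    const std::size_t bytes = size_msgs * datatype_size;
    buf.where = where;
    if (bytes == 0)
        return true;  // nothing to allocate for an empty level

    if (where == MemType::Device)
    {
#ifdef USE_CUDA
        void* p = nullptr;
        if (cudaMalloc(&p, bytes) != cudaSuccess)
            return false;
        buf.data = static_cast<char*>(p);
        return true;
#else
        return false;  // this build has no device support
#endif
    }

    buf.data = static_cast<char*>(std::malloc(bytes));
    return buf.data != nullptr;
}

// Release the buffer with the deallocator matching its memory space.
inline void free_level_buffer(LevelBuffer& buf)
{
    if (buf.where == MemType::Device)
    {
#ifdef USE_CUDA
        cudaFree(buf.data);
#endif
    }
    else
    {
        std::free(buf.data);
    }
    buf.data = nullptr;
}
```

With helpers like these, the send-side buffers (local_S, local_L send, global send) would be allocated with query_mem_type(sendbuf) and the receive-side buffers with query_mem_type(recvbuf). Recording the memory space next to the pointer keeps the free path correct when sendbuf and recvbuf live in different memory spaces.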