feat: support mlu mla pd pull with mooncake transfer.#1167
feat: support mlu mla pd pull with mooncake transfer.#1167phantomlei3 wants to merge 1 commit intojd-opensource:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request enables Disaggregated Prefill/Decode (PD) support for the MLU backend by integrating and refactoring the Mooncake transfer engine. Key changes include the introduction of a descriptor-based memory registration system (using BufferDesc and MemInfo) and a new GetRegisteredMemory RPC to synchronize memory layouts between nodes, which is essential for supporting complex architectures like MLA. Additionally, the PR enhances validation logic in LLMEngine::pull_kv_blocks, automates device_ip resolution, and adds comprehensive unit tests for the transfer engine. Review feedback identifies several opportunities to align with the project's style guide, specifically regarding the use of emplace_back for vector operations and the annotation of constant arguments in function calls.
| buffers.reserve(num); | ||
| for (size_t i = 0; i < num; i++) { | ||
| buffers.push_back(BufferEntry{(void*)addrs[i], lens[i]}); | ||
| buffers.push_back(BufferEntry{addrs[i], lens[i]}); |
There was a problem hiding this comment.
According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies.
| buffers.push_back(BufferEntry{addrs[i], lens[i]}); | |
| buffers.emplace_back(addrs[i], lens[i]); |
References
- Rule 291 states to prefer
emplace_backoverpush_backfor in-place construction and to avoid copies. (link)
| entries.push_back(BufferEntry{ | ||
| reinterpret_cast<void*>(buffer_desc.addr), | ||
| static_cast<size_t>(buffer_desc.len), | ||
| }); |
There was a problem hiding this comment.
Per style guide rule 291, please use emplace_back instead of push_back to construct elements in-place.
entries.emplace_back(reinterpret_cast<void*>(buffer_desc.addr),
static_cast<size_t>(buffer_desc.len));References
- Rule 291 states to prefer
emplace_backoverpush_backfor in-place construction and to avoid copies. (link)
| TransferRequest entry; | ||
| entry.opcode = opcode; | ||
| entry.length = len; | ||
| entry.source = static_cast<void*>(local_base + local_bias); | ||
| entry.target_id = remote_handle; | ||
| entry.target_offset = | ||
| reinterpret_cast<uint64_t>(remote_base + remote_bias); | ||
| entry.advise_retry_cnt = 0; | ||
| entries->push_back(entry); |
There was a problem hiding this comment.
According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies. You can construct the TransferRequest directly in the vector.
| TransferRequest entry; | |
| entry.opcode = opcode; | |
| entry.length = len; | |
| entry.source = static_cast<void*>(local_base + local_bias); | |
| entry.target_id = remote_handle; | |
| entry.target_offset = | |
| reinterpret_cast<uint64_t>(remote_base + remote_bias); | |
| entry.advise_retry_cnt = 0; | |
| entries->push_back(entry); | |
| TransferRequest& entry = entries->emplace_back(); | |
| entry.opcode = opcode; | |
| entry.length = len; | |
| entry.source = static_cast<void*>(local_base + local_bias); | |
| entry.target_id = remote_handle; | |
| entry.target_offset = | |
| reinterpret_cast<uint64_t>(remote_base + remote_bias); | |
| entry.advise_retry_cnt = 0; |
References
- Rule 291 states to prefer
emplace_backoverpush_backfor in-place construction and to avoid copies. (link)
| TransferRequest entry; | ||
| entry.opcode = opcode; | ||
| entry.length = transfer_size; | ||
| entry.source = (void*)(local_base + src_offsets[i]); | ||
| entry.source = (void*)(local_base + local_offset); | ||
| entry.target_id = remote_handle; | ||
| entry.target_offset = (uint64_t)(remote_base + dst_offsets[i]); | ||
| entry.target_offset = (uint64_t)(remote_base + remote_offset); | ||
| entry.advise_retry_cnt = 0; | ||
| entries.push_back(entry); |
There was a problem hiding this comment.
According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies. You can construct the TransferRequest directly in the vector.
| TransferRequest entry; | |
| entry.opcode = opcode; | |
| entry.length = transfer_size; | |
| entry.source = (void*)(local_base + src_offsets[i]); | |
| entry.source = (void*)(local_base + local_offset); | |
| entry.target_id = remote_handle; | |
| entry.target_offset = (uint64_t)(remote_base + dst_offsets[i]); | |
| entry.target_offset = (uint64_t)(remote_base + remote_offset); | |
| entry.advise_retry_cnt = 0; | |
| entries.push_back(entry); | |
| TransferRequest& entry = entries.emplace_back(); | |
| entry.opcode = opcode; | |
| entry.length = transfer_size; | |
| entry.source = (void*)(local_base + local_offset); | |
| entry.target_id = remote_handle; | |
| entry.target_offset = (uint64_t)(remote_base + remote_offset); | |
| entry.advise_retry_cnt = 0; |
References
- Rule 291 states to prefer
emplace_backoverpush_backfor in-place construction and to avoid copies. (link)
| local_info.layers[0].key = | ||
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40); | ||
| local_info.layers[0].index = | ||
| mk_desc(0, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08); | ||
| remote_info.layers[0].key = | ||
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40); | ||
| remote_info.layers[0].index = | ||
| mk_desc(0, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08); |
There was a problem hiding this comment.
According to the style guide (rule 260), constant arguments in function calls should be annotated with comments indicating the parameter name. This improves readability, especially for functions with multiple arguments of similar types. Please add annotations to the calls to mk_desc.
| local_info.layers[0].key = | |
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40); | |
| local_info.layers[0].index = | |
| mk_desc(0, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08); | |
| remote_info.layers[0].key = | |
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40); | |
| remote_info.layers[0].index = | |
| mk_desc(0, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08); | |
| local_info.layers[0].key = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x1000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| local_info.layers[0].index = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x3000, /*len=*/0x80, /*bytes_per_block=*/0x08); | |
| remote_info.layers[0].key = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x5000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| remote_info.layers[0].index = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x7000, /*len=*/0x80, /*bytes_per_block=*/0x08); |
References
- Rule 260 requires annotating constant arguments in function calls with comments indicating the parameter name to improve readability. (link)
| local_info.layers[0].key = | ||
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40); | ||
| local_info.layers[1].key = | ||
| mk_desc(1, proto::BUFFER_KIND_KEY, 0x2000, 0x400, 0x40); | ||
| local_info.layers[1].index = | ||
| mk_desc(1, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08); | ||
| remote_info.layers[0].key = | ||
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40); | ||
| remote_info.layers[1].key = | ||
| mk_desc(1, proto::BUFFER_KIND_KEY, 0x6000, 0x400, 0x40); | ||
| remote_info.layers[1].index = | ||
| mk_desc(1, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08); |
There was a problem hiding this comment.
The calls to mk_desc are missing annotations for constant arguments, which is required by the style guide (rule 260) to improve code readability.
| local_info.layers[0].key = | |
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40); | |
| local_info.layers[1].key = | |
| mk_desc(1, proto::BUFFER_KIND_KEY, 0x2000, 0x400, 0x40); | |
| local_info.layers[1].index = | |
| mk_desc(1, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08); | |
| remote_info.layers[0].key = | |
| mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40); | |
| remote_info.layers[1].key = | |
| mk_desc(1, proto::BUFFER_KIND_KEY, 0x6000, 0x400, 0x40); | |
| remote_info.layers[1].index = | |
| mk_desc(1, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08); | |
| local_info.layers[0].key = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x1000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| local_info.layers[1].key = | |
| mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x2000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| local_info.layers[1].index = | |
| mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x3000, /*len=*/0x80, /*bytes_per_block=*/0x08); | |
| remote_info.layers[0].key = | |
| mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x5000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| remote_info.layers[1].key = | |
| mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x6000, /*len=*/0x400, /*bytes_per_block=*/0x40); | |
| remote_info.layers[1].index = | |
| mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x7000, /*len=*/0x80, /*bytes_per_block=*/0x08); |
References
- Rule 260 requires annotating constant arguments in function calls with comments indicating the parameter name to improve readability. (link)
No description provided.