Skip to content

feat: support mlu mla pd pull with mooncake transfer.#1167

Open
phantomlei3 wants to merge 1 commit intojd-opensource:mainfrom
phantomlei3:feat/pd-support-mlu
Open

feat: support mlu mla pd pull with mooncake transfer.#1167
phantomlei3 wants to merge 1 commit intojd-opensource:mainfrom
phantomlei3:feat/pd-support-mlu

Conversation

@phantomlei3
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables Disaggregated Prefill/Decode (PD) support for the MLU backend by integrating and refactoring the Mooncake transfer engine. Key changes include the introduction of a descriptor-based memory registration system (using BufferDesc and MemInfo) and a new GetRegisteredMemory RPC to synchronize memory layouts between nodes, which is essential for supporting complex architectures like MLA. Additionally, the PR enhances validation logic in LLMEngine::pull_kv_blocks, automates device_ip resolution, and adds comprehensive unit tests for the transfer engine. Review feedback identifies several opportunities to align with the project's style guide, specifically regarding the use of emplace_back for vector operations and the annotation of constant arguments in function calls.

buffers.reserve(num);
for (size_t i = 0; i < num; i++) {
buffers.push_back(BufferEntry{(void*)addrs[i], lens[i]});
buffers.push_back(BufferEntry{addrs[i], lens[i]});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies.

Suggested change
buffers.push_back(BufferEntry{addrs[i], lens[i]});
buffers.emplace_back(addrs[i], lens[i]);
References
  1. Rule 291 states to prefer emplace_back over push_back for in-place construction and to avoid copies. (link)

Comment on lines +533 to +536
entries.push_back(BufferEntry{
reinterpret_cast<void*>(buffer_desc.addr),
static_cast<size_t>(buffer_desc.len),
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Per style guide rule 291, please use emplace_back instead of push_back to construct elements in-place.

    entries.emplace_back(reinterpret_cast<void*>(buffer_desc.addr),
        static_cast<size_t>(buffer_desc.len));
References
  1. Rule 291 states to prefer emplace_back over push_back for in-place construction and to avoid copies. (link)

Comment on lines +1149 to +1157
TransferRequest entry;
entry.opcode = opcode;
entry.length = len;
entry.source = static_cast<void*>(local_base + local_bias);
entry.target_id = remote_handle;
entry.target_offset =
reinterpret_cast<uint64_t>(remote_base + remote_bias);
entry.advise_retry_cnt = 0;
entries->push_back(entry);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies. You can construct the TransferRequest directly in the vector.

Suggested change
TransferRequest entry;
entry.opcode = opcode;
entry.length = len;
entry.source = static_cast<void*>(local_base + local_bias);
entry.target_id = remote_handle;
entry.target_offset =
reinterpret_cast<uint64_t>(remote_base + remote_bias);
entry.advise_retry_cnt = 0;
entries->push_back(entry);
TransferRequest& entry = entries->emplace_back();
entry.opcode = opcode;
entry.length = len;
entry.source = static_cast<void*>(local_base + local_bias);
entry.target_id = remote_handle;
entry.target_offset =
reinterpret_cast<uint64_t>(remote_base + remote_bias);
entry.advise_retry_cnt = 0;
References
  1. Rule 291 states to prefer emplace_back over push_back for in-place construction and to avoid copies. (link)

Comment on lines 1219 to 1226
TransferRequest entry;
entry.opcode = opcode;
entry.length = transfer_size;
entry.source = (void*)(local_base + src_offsets[i]);
entry.source = (void*)(local_base + local_offset);
entry.target_id = remote_handle;
entry.target_offset = (uint64_t)(remote_base + dst_offsets[i]);
entry.target_offset = (uint64_t)(remote_base + remote_offset);
entry.advise_retry_cnt = 0;
entries.push_back(entry);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

According to the style guide (rule 291), emplace_back should be preferred over push_back to construct elements in-place and avoid unnecessary copies. You can construct the TransferRequest directly in the vector.

Suggested change
TransferRequest entry;
entry.opcode = opcode;
entry.length = transfer_size;
entry.source = (void*)(local_base + src_offsets[i]);
entry.source = (void*)(local_base + local_offset);
entry.target_id = remote_handle;
entry.target_offset = (uint64_t)(remote_base + dst_offsets[i]);
entry.target_offset = (uint64_t)(remote_base + remote_offset);
entry.advise_retry_cnt = 0;
entries.push_back(entry);
TransferRequest& entry = entries.emplace_back();
entry.opcode = opcode;
entry.length = transfer_size;
entry.source = (void*)(local_base + local_offset);
entry.target_id = remote_handle;
entry.target_offset = (uint64_t)(remote_base + remote_offset);
entry.advise_retry_cnt = 0;
References
  1. Rule 291 states to prefer emplace_back over push_back for in-place construction and to avoid copies. (link)

Comment on lines +160 to +167
local_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40);
local_info.layers[0].index =
mk_desc(0, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08);
remote_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40);
remote_info.layers[0].index =
mk_desc(0, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

According to the style guide (rule 260), constant arguments in function calls should be annotated with comments indicating the parameter name. This improves readability, especially for functions with multiple arguments of similar types. Please add annotations to the calls to mk_desc.

Suggested change
local_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40);
local_info.layers[0].index =
mk_desc(0, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08);
remote_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40);
remote_info.layers[0].index =
mk_desc(0, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08);
local_info.layers[0].key =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x1000, /*len=*/0x400, /*bytes_per_block=*/0x40);
local_info.layers[0].index =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x3000, /*len=*/0x80, /*bytes_per_block=*/0x08);
remote_info.layers[0].key =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x5000, /*len=*/0x400, /*bytes_per_block=*/0x40);
remote_info.layers[0].index =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x7000, /*len=*/0x80, /*bytes_per_block=*/0x08);
References
  1. Rule 260 requires annotating constant arguments in function calls with comments indicating the parameter name to improve readability. (link)

Comment on lines +332 to +343
local_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40);
local_info.layers[1].key =
mk_desc(1, proto::BUFFER_KIND_KEY, 0x2000, 0x400, 0x40);
local_info.layers[1].index =
mk_desc(1, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08);
remote_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40);
remote_info.layers[1].key =
mk_desc(1, proto::BUFFER_KIND_KEY, 0x6000, 0x400, 0x40);
remote_info.layers[1].index =
mk_desc(1, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The calls to mk_desc are missing annotations for constant arguments, which is required by the style guide (rule 260) to improve code readability.

Suggested change
local_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x1000, 0x400, 0x40);
local_info.layers[1].key =
mk_desc(1, proto::BUFFER_KIND_KEY, 0x2000, 0x400, 0x40);
local_info.layers[1].index =
mk_desc(1, proto::BUFFER_KIND_INDEX, 0x3000, 0x80, 0x08);
remote_info.layers[0].key =
mk_desc(0, proto::BUFFER_KIND_KEY, 0x5000, 0x400, 0x40);
remote_info.layers[1].key =
mk_desc(1, proto::BUFFER_KIND_KEY, 0x6000, 0x400, 0x40);
remote_info.layers[1].index =
mk_desc(1, proto::BUFFER_KIND_INDEX, 0x7000, 0x80, 0x08);
local_info.layers[0].key =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x1000, /*len=*/0x400, /*bytes_per_block=*/0x40);
local_info.layers[1].key =
mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x2000, /*len=*/0x400, /*bytes_per_block=*/0x40);
local_info.layers[1].index =
mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x3000, /*len=*/0x80, /*bytes_per_block=*/0x08);
remote_info.layers[0].key =
mk_desc(/*layer_id=*/0, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x5000, /*len=*/0x400, /*bytes_per_block=*/0x40);
remote_info.layers[1].key =
mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_KEY, /*addr=*/0x6000, /*len=*/0x400, /*bytes_per_block=*/0x40);
remote_info.layers[1].index =
mk_desc(/*layer_id=*/1, /*kind=*/proto::BUFFER_KIND_INDEX, /*addr=*/0x7000, /*len=*/0x80, /*bytes_per_block=*/0x08);
References
  1. Rule 260 requires annotating constant arguments in function calls with comments indicating the parameter name to improve readability. (link)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants