Conversation

SJTUGavinLiu
Collaborator

feat: decode entrance support mtp

@Copilot Copilot AI review requested due to automatic review settings October 16, 2025 08:49

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds MTP (Multi-Token Prediction) support to the RPC decode entrance by implementing speculative execution and propose token handling. The changes enable the decode service to receive and process propose tokens from prefill operations and configure MTP-specific parameters.

  • Implements MTP support in decode entrance with propose token handling
  • Adds MTP cache mapping logic for multi-model scenarios
  • Configures speculative execution based on MTP availability

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • model_rpc_service.proto — Updated the protobuf schema to support propose tokens and simplified field numbering
  • PrefillRpcServerNew.cc — Added MTP propose token extraction and cache mapping logic for multi-model support
  • PrefillGenerateContextNew.cc — Enabled MTP-based speculative execution configuration
  • DecodeRpcServerNew.cc — Implemented propose token handling and context position ID setup for decode operations


Comment on lines +262 to +266
prefill_context.response->mutable_propose_token_ids()->CopyFrom(
    {propose_tokens.begin(), propose_tokens.end()});

Copilot AI Oct 16, 2025


The CopyFrom operation with initializer list creates unnecessary temporary objects. Use RepeatedField::Add() in a loop or assign directly to avoid the temporary vector creation.


Comment on lines 274 to 276
prefill_context.response->mutable_context_position_ids()->CopyFrom(
    {context_position_ids->data<int32_t>(),
     context_position_ids->data<int32_t>() + context_position_ids->size()});

Copilot AI Oct 16, 2025


Similar to the propose_token_ids, this CopyFrom with iterator range creates a temporary vector. Consider using RepeatedField methods for better performance.

Suggested change:

    -prefill_context.response->mutable_context_position_ids()->CopyFrom(
    -    {context_position_ids->data<int32_t>(),
    -     context_position_ids->data<int32_t>() + context_position_ids->size()});
    +prefill_context.response->mutable_context_position_ids()->Add(
    +    context_position_ids->data<int32_t>(),
    +    context_position_ids->data<int32_t>() + context_position_ids->size());


Comment on lines +295 to +300
std::vector<int> propose_tokens;
propose_tokens.assign(response.propose_token_ids().begin(), response.propose_token_ids().end());
generate_stream->setProposeToken(propose_tokens);
RTP_LLM_LOG_DEBUG("request [%s] received %d propose tokens from prefill",
                  decode_context.request_key.c_str(),
                  propose_tokens.size());

Copilot AI Oct 16, 2025


Creating a temporary vector and copying data is inefficient. If setProposeToken accepts a RepeatedField or can be modified to accept iterators, pass the protobuf data directly to avoid the copy.

Suggested change:

    -std::vector<int> propose_tokens;
    -propose_tokens.assign(response.propose_token_ids().begin(), response.propose_token_ids().end());
    -generate_stream->setProposeToken(propose_tokens);
    -RTP_LLM_LOG_DEBUG("request [%s] received %d propose tokens from prefill",
    -                  decode_context.request_key.c_str(),
    -                  propose_tokens.size());
    +generate_stream->setProposeToken(response.propose_token_ids().begin(), response.propose_token_ids().end());
    +RTP_LLM_LOG_DEBUG("request [%s] received %d propose tokens from prefill",
    +                  decode_context.request_key.c_str(),
    +                  response.propose_token_ids_size());

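The iterator-range overload the comment proposes can be sketched as follows. GenerateStream here is a pared-down stand-in, not the project's real class; the point is only the signature shape:

```cpp
#include <vector>

// Pared-down stand-in for the real GenerateStream. A template overload lets
// callers pass any iterator pair (for example a protobuf repeated field's
// begin()/end()), so the tokens are copied exactly once, directly into the
// member, with no intermediate std::vector at the call site.
class GenerateStream {
public:
    template <typename It>
    void setProposeToken(It first, It last) {
        propose_tokens_.assign(first, last);
    }
    const std::vector<int>& proposeTokens() const { return propose_tokens_; }
private:
    std::vector<int> propose_tokens_;
};
```

A caller can then hand the repeated field's iterators over directly, which is exactly what the suggested change above does.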

@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from 0b8acd7 to 5396b35 Compare October 16, 2025 09:12
response_output, &(result.value()), maga_init_params_.gpt_init_parameter.misc_config.aux_string);
// should only generate one token
break;
if (stream->queryPdSep() && stream->waitForRemoteGenerate()) {
Collaborator


Why will the decode-entrance prefill call waitForRemoteGenerate?

Collaborator Author


SpeculativeEngine::prefillMtpStep will call setNeedRemoteGenerate(true) after filling the stream's propose tokens. If the prefill instance does not waitForRemoteGenerate here, the stream's propose tokens might still be empty.
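The handshake described here — the prefill RPC thread blocks until the speculative engine has filled the propose tokens and flipped the flag — follows the standard condition-variable pattern. A self-contained sketch with stand-in names (StreamGate and its members are illustrative, not the project's actual classes):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for the stream-side synchronization: the engine thread fills the
// propose tokens and then signals; the prefill RPC thread blocks in
// waitForRemoteGenerate() until that signal arrives, so it never observes an
// empty token list.
struct StreamGate {
    std::mutex mu;
    std::condition_variable cv;
    bool need_remote_generate = false;
    std::vector<int> propose_tokens;

    void setNeedRemoteGenerate(std::vector<int> tokens) {  // engine side
        {
            std::lock_guard<std::mutex> lk(mu);
            propose_tokens = std::move(tokens);
            need_remote_generate = true;
        }
        cv.notify_all();  // wake the waiting RPC thread
    }

    bool waitForRemoteGenerate() {                         // prefill RPC side
        std::unique_lock<std::mutex> lk(mu);
        cv.wait(lk, [this] { return need_remote_generate; });
        return !propose_tokens.empty();
    }
};
```

The predicate passed to cv.wait guards against spurious wakeups, which is why the real waitForRemoteGenerate can safely assume the tokens are present once it returns true.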

}

if (stream->getContextPositionIds()) {
auto context_position_ids = stream->getContextPositionIds();
Collaborator


context_position_ids does not exist in the proto.

Collaborator Author


Already deleted it.

@Copilot Copilot AI review requested due to automatic review settings October 17, 2025 02:13
@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from 5396b35 to d2e25e2 Compare October 17, 2025 02:13

@Copilot Copilot AI left a comment


Pull Request Overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment on lines 247 to 249
if (stream->queryPdSep() && stream->waitForRemoteGenerate()) {
break;
}

Copilot AI Oct 17, 2025


The closing brace on line 250 appears to be orphaned. This suggests a control flow structure may have been modified incorrectly, potentially breaking the loop logic.


Comment on lines 272 to 277
if (stream->getContextPositionIds()) {
    auto context_position_ids = stream->getContextPositionIds();
    prefill_context.response->mutable_context_position_ids()->CopyFrom(
        {context_position_ids->data<int32_t>(),
         context_position_ids->data<int32_t>() + context_position_ids->size()});
}

Copilot AI Oct 17, 2025


The context_position_ids field is being set in the response but is not defined in the protocol buffer schema. This will cause compilation errors since mutable_context_position_ids() method doesn't exist.


@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from d2e25e2 to db0bca7 Compare October 17, 2025 02:40
@Copilot Copilot AI review requested due to automatic review settings October 19, 2025 06:17
@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from db0bca7 to 4eeeb3f Compare October 19, 2025 06:17

@Copilot Copilot AI left a comment


Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.



RTP_LLM_LOG_DEBUG("stream [%ld] set setNeedRemoteGenerate", stream->streamId());
stream->setNeedRemoteGenerate(true);
}
stream->setPreillMtpReady(true);

Copilot AI Oct 19, 2025


Corrected spelling of 'Preill' to 'Prefill' in method name.

Suggested change:

    -stream->setPreillMtpReady(true);
    +stream->setPrefillMtpReady(true);


Comment on lines 773 to 787
bool GenerateStream::waitForPreillMtpReady() {
    std::unique_lock<std::mutex> lock(*output_mutex_);

    cv_->wait(lock, [this] {
        return preill_mtp_ready_ || generate_status_->status == StreamState::STOPPED
               || generate_status_->status == StreamState::FINISHED;
    });

    if(!preill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {
        RTP_LLM_LOG_WARNING("waitForPreillMtpReady exits due to stream [%ld] stopped, error: %s",
                            streamId(),
                            generate_status_->error_info.ToString().c_str());
    }

    return preill_mtp_ready_;

Copilot AI Oct 19, 2025


Corrected spelling of 'Preill' to 'Prefill' in method name, variable names, and log message.

Suggested change:

    -bool GenerateStream::waitForPreillMtpReady() {
    -    std::unique_lock<std::mutex> lock(*output_mutex_);
    -    cv_->wait(lock, [this] {
    -        return preill_mtp_ready_ || generate_status_->status == StreamState::STOPPED
    -               || generate_status_->status == StreamState::FINISHED;
    -    });
    -    if(!preill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {
    -        RTP_LLM_LOG_WARNING("waitForPreillMtpReady exits due to stream [%ld] stopped, error: %s",
    -                            streamId(),
    -                            generate_status_->error_info.ToString().c_str());
    -    }
    -    return preill_mtp_ready_;
    +bool GenerateStream::waitForPrefillMtpReady() {
    +    std::unique_lock<std::mutex> lock(*output_mutex_);
    +    cv_->wait(lock, [this] {
    +        return prefill_mtp_ready_ || generate_status_->status == StreamState::STOPPED
    +               || generate_status_->status == StreamState::FINISHED;
    +    });
    +    if(!prefill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {
    +        RTP_LLM_LOG_WARNING("waitForPrefillMtpReady exits due to stream [%ld] stopped, error: %s",
    +                            streamId(),
    +                            generate_status_->error_info.ToString().c_str());
    +    }
    +    return prefill_mtp_ready_;



break;

if (engine_->isMTPEagle()) {
stream->waitForPreillMtpReady();

Copilot AI Oct 19, 2025


Corrected spelling of 'Preill' to 'Prefill' in method call.

Suggested change:

    -stream->waitForPreillMtpReady();
    +stream->waitForPrefillMtpReady();


|| generate_status_->status == StreamState::FINISHED;
});

if(!preill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {

Copilot AI Oct 19, 2025


Missing space after 'if' keyword. Should be 'if (' instead of 'if('.

Suggested change:

    -if(!preill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {
    +if (!preill_mtp_ready_ && generate_status_->status == StreamState::STOPPED) {


@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from 4eeeb3f to 1d982c5 Compare October 19, 2025 10:25
@Copilot Copilot AI review requested due to automatic review settings October 20, 2025 02:23

@Copilot Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from 45f8b20 to df4c0b3 Compare October 20, 2025 08:18
@Copilot Copilot AI review requested due to automatic review settings October 20, 2025 08:49
@SJTUGavinLiu SJTUGavinLiu force-pushed the develop/chanyin/decode_entrance_mtp branch from df4c0b3 to 040e6fb Compare October 20, 2025 08:49

@Copilot Copilot AI left a comment


Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.




if (engine_->isMTPEagle()) {
stream->waitForPrefillMtpReady();
break;

Copilot AI Oct 20, 2025


The break statement causes the loop to exit early only for MTP streams, potentially skipping output processing for non-finished MTP streams. The loop should continue processing outputs until the stream is finished, regardless of MTP status.

Suggested change:

    -break;


auto modified_request = const_cast<RemoteGenerateRequestPBNew*>(request);
GenerateInputPB* mutable_input = modified_request->mutable_input();



Copilot AI Oct 20, 2025


[nitpick] Unnecessary blank line added without purpose.

Suggested change: remove the added blank line.

ErrorInfo PrefillRpcServerNew::generateFirstToken(PrefillGenerateContextNew& prefill_context) {
auto stream = prefill_context.getStream();
engine_->enqueue(stream);


Copilot AI Oct 20, 2025


[nitpick] Unnecessary trailing whitespace added.

Suggested change: remove the trailing whitespace.
