feature - adapt deepseek in model py #207
base: main
Conversation
Force-pushed from 1074bfb to bf3806f.
A smoke test is needed.
Force-pushed from bf3806f to f0b8cc7.
The smoke test and some image dependency packages will be committed to the main-internal branch together later.
Force-pushed from f0b8cc7 to 7c8e9da.
Force-pushed from 04eeba7 to 51212f5.
Pushed to the open_merge branch to run CI together.
Force-pushed from c12c735 to f860657.
self.token_per_block = token_per_block

def prepare(self, attention_inputs: PyAttentionInputs):
    return rtp_llm_ops.FlashInferMlaAttnParams().fill_mla_params(
Wouldn't rtp_llm_ops.fill_mla_params or rtp_llm_ops.mla.fill_mla_params be enough here?
I tried that and it doesn't work. This is the binding code I referenced:

pybind11::class_<FlashInferPrefillOp>(m, "FlashInferPrefillOp")
    .def(pybind11::init<GptInitParameter>(), py::arg("gpt_init_parameter"))
    .def("support", &FlashInferPrefillOp::support, py::arg("attn_inputs"))
    .def("prepare", &FlashInferPrefillOp::prepare, py::arg("attn_inputs"))
    .def("forward", &FlashInferPrefillOp::forward, py::arg("q"), py::arg("kv_cache"), py::arg("params"));

On the Python side it likewise constructs a FlashInferPrefillOp(config.gpt_init_params) instance first and then calls fmha_impl.forward.
The only way would be to turn this FlashInferMlaAttnParams class into a function; it was written that way before and was later changed into a class.
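For illustration, a minimal sketch of the two Python-side call shapes being discussed: the class-style binding as in this PR, and the hypothetical free-function binding raised by the reviewer (the function form and the attention_inputs argument are assumptions, not taken from this PR):

def build_mla_params(rtp_llm_ops, attention_inputs):
    # Current shape in this PR: FlashInferMlaAttnParams is bound as a class,
    # so the Python side constructs an instance and then fills the params.
    # attention_inputs is assumed to be a PyAttentionInputs, as in prepare() above.
    return rtp_llm_ops.FlashInferMlaAttnParams().fill_mla_params(attention_inputs)

# Hypothetical alternative from the review (not implemented here): bind a free
# function via m.def("fill_mla_params", ...) so callers skip the intermediate object:
# def build_mla_params(rtp_llm_ops, attention_inputs):
#     return rtp_llm_ops.fill_mla_params(attention_inputs)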
PREFILL_MHA_IMPS.append(FlashInferPrefillImpl)

class MlaFlashInferPrefillImpl(FMHAPrefillImplBase):
In the general case, shouldn't prefill just use FlashAttention directly?
For MLA prefill, flashinfer recommends BatchPrefillWithRaggedKVCacheWrapper. The previous C++ MLA prefill used TRTV2, and another option is to wrap the parameter preprocessing in a thin layer and keep using TRTV2 directly.
Which of the two is more suitable probably needs a performance comparison.
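For reference, a rough sketch of how flashinfer's ragged-KV prefill wrapper is typically driven. This is a sketch only: exact plan/run keyword names differ between flashinfer releases (older ones use begin_forward/forward), and the head dims 192/128 and toy shapes are assumptions, not values from this PR.

import torch
import flashinfer  # assumption: a flashinfer release with the plan()/run() API

# Two sequences of lengths 3 and 5; ragged layout uses CSR-style indptr offsets.
qo_indptr = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")
kv_indptr = qo_indptr.clone()            # prefill: KV length == query length here

num_qo_heads, num_kv_heads = 16, 16      # toy values, not the real model config
head_dim_qk, head_dim_vo = 192, 128      # MLA-style split (nope 128 + rope 64) -- assumption

q = torch.randn(8, num_qo_heads, head_dim_qk, dtype=torch.float16, device="cuda")
k = torch.randn(8, num_kv_heads, head_dim_qk, dtype=torch.float16, device="cuda")
v = torch.randn(8, num_kv_heads, head_dim_vo, dtype=torch.float16, device="cuda")

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchPrefillWithRaggedKVCacheWrapper(workspace, "NHD")

# plan() does the parameter preprocessing once per batch; run() executes attention.
# Keyword names (head_dim_vo, causal) may vary across flashinfer versions.
wrapper.plan(qo_indptr, kv_indptr, num_qo_heads, num_kv_heads,
             head_dim_qk, head_dim_vo=head_dim_vo, causal=True)
out = wrapper.run(q, k, v)               # [8, num_qo_heads, head_dim_vo]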
namespace rtp_llm {

MlaParams
FlashInferMlaAttnParams::fillParams(torch::Tensor t_prefix_lengths,
This shouldn't be MLA-specific either, right? Aren't these the generic flashinfer params?
It was adapted from the generic version. The idea behind splitting out MlaAttnParams is that if the MLA params need special modifications later, they can be made in this separate FlashInferMlaAttnParams without touching the generic path.
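To illustrate the design choice described above (a sketch only; the real classes live on the C++ side, and these Python names are invented for illustration):

class FlashInferAttnParams:
    """Generic flashinfer attention params: shared preprocessing for all backends."""
    def fill_params(self, attention_inputs):
        # common preprocessing shared by MHA and MLA would go here
        return self

class FlashInferMlaAttnParams(FlashInferAttnParams):
    """MLA-specific extension point: override only what MLA needs to change."""
    def fill_params(self, attention_inputs):
        params = super().fill_params(attention_inputs)
        # future MLA-only adjustments would be made here, leaving the
        # generic path untouched
        return params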
Force-pushed from f49b636 to d881269.
Force-pushed from 97ed9be to c0cead4.
feature - add fmha ut & fix build
feature - add torch mla in pymodel
fix - align deepseekv2 output using hack layer!!
fix - align deepseek v2 output using lite-chat
feature - support prefill & decode mla cpp ops
refactor - mv flashinfer mla ops to fmha.py
fix - add deps in BUILD