[Model] Add native OpenPangu Embedded 7B backend #27941
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only fastcheck CI, a small and essential subset of tests, runs automatically to catch errors quickly. You can ask your reviewers to trigger select CI tests on top of fastcheck. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds native support for the OpenPangu Embedded 7B model. The implementation is well-structured and follows the existing patterns in vLLM, particularly drawing from the Llama model implementation. I've identified a critical bug in the model's forward pass that affects speculative decoding and a missing configuration in the test registry. Addressing these issues will ensure the model integrates correctly and performs as expected.
vllm/model_executor/models/pangu.py
Outdated
```python
aux_hidden_states: list[torch.Tensor] = []
for idx, layer in enumerate(self.layers[self.start_layer : self.end_layer]):
    if idx in self.aux_hidden_state_layers:
```
There is a bug in how auxiliary hidden states are collected for speculative decoding. The loop variable idx is a relative index for the layers on the current pipeline stage, starting from 0. However, self.aux_hidden_state_layers contains global layer indices. This means the check if idx in self.aux_hidden_state_layers: will almost always be incorrect, leading to auxiliary hidden states not being collected, which breaks features like EAGLE3 speculative decoding.
To fix this, you should use the global layer index for the check by adding self.start_layer to idx.
```diff
-    if idx in self.aux_hidden_state_layers:
+    if self.start_layer + idx in self.aux_hidden_state_layers:
```
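To make the relative-vs-global indexing issue concrete, here is a standalone toy illustration (the layer numbers are made up; only the indexing logic matters):

```python
# A pipeline stage owning global layers [16, 32) enumerates them from 0,
# so the stage-relative index must be offset by start_layer before it is
# compared against global auxiliary-layer indices.
start_layer, end_layer = 16, 32
aux_hidden_state_layers = {23, 29}  # global layer indices

for idx in range(end_layer - start_layer):
    buggy = idx in aux_hidden_state_layers                  # compares a relative index
    fixed = (start_layer + idx) in aux_hidden_state_layers  # compares the global index
    if buggy != fixed:
        print(f"relative idx {idx}: buggy={buggy}, fixed={fixed}")
```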
| "PanguEmbeddedForCausalLM": _HfExamplesInfo( | ||
| "FreedomIntelligence/openPangu-Embedded-7B" | ||
| ), |
The test plan in the pull request description indicates that trust_remote_code=True is required to run this model. This parameter is missing from the _HfExamplesInfo entry, which could cause tests for this model to fail. Please add trust_remote_code=True to ensure the tests can run correctly.
| "PanguEmbeddedForCausalLM": _HfExamplesInfo( | |
| "FreedomIntelligence/openPangu-Embedded-7B" | |
| ), | |
| "PanguEmbeddedForCausalLM": _HfExamplesInfo( | |
| "FreedomIntelligence/openPangu-Embedded-7B", trust_remote_code=True | |
| ), |
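As a usage note, trust_remote_code is the same flag users pass when loading the checkpoint themselves; a minimal sketch with vLLM's offline API (assumes vLLM is installed and the checkpoint is reachable):

```python
from vllm import LLM, SamplingParams

# trust_remote_code=True allows HF to execute the custom config/tokenizer
# code shipped with the checkpoint.
llm = LLM(
    model="FreedomIntelligence/openPangu-Embedded-7B",
    trust_remote_code=True,
)
outputs = llm.generate(
    ["What is the capital of Morocco?"],
    SamplingParams(temperature=0.6, top_p=0.95, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```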
/gemini review
Code Review
This pull request adds a native vLLM backend for the OpenPangu Embedded 7B model, which is a valuable performance enhancement. The implementation correctly mirrors the Hugging Face reference architecture, notably by using separate projections for QKV and MLP layers, and the code is well-structured, following existing patterns in the vLLM codebase. I have identified one critical latent bug related to speculative decoding that should be addressed to ensure future stability.
vllm/model_executor/models/pangu.py
Outdated
```python
if self.start_layer + idx in self.aux_hidden_state_layers:
    aux_hidden_states.append(hidden_states + residual)
```
There's a potential TypeError when using speculative decoding. The residual tensor can be None before the first decoder layer, which will cause hidden_states + residual to fail. This can happen if 0 is in aux_hidden_state_layers. While speculative decoding might not be the primary use case for this PR, it's best to fix this latent bug to prevent future issues.
```diff
 if self.start_layer + idx in self.aux_hidden_state_layers:
-    aux_hidden_states.append(hidden_states + residual)
+    if residual is None:
+        aux_hidden_states.append(hidden_states)
+    else:
+        aux_hidden_states.append(hidden_states + residual)
```
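A tiny standalone reproduction of the failure mode (toy tensors, not the PR's code): before the first decoder layer the residual stream has not been produced yet, so adding it to the hidden states raises.

```python
import torch

# residual is None until layer 0 returns it, so layer 0 being an
# aux-hidden-state layer would hit this TypeError without the guard.
hidden_states = torch.zeros(2, 4)
residual = None
try:
    _ = hidden_states + residual
except TypeError as exc:
    print(f"TypeError as expected: {exc}")
```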
Signed-off-by: YoussefEssDS <[email protected]>
Force-pushed from 6d694a4 to 232c9c5
Documentation preview: https://vllm--27941.org.readthedocs.build/en/27941/
Signed-off-by: YoussefEssDS <[email protected]>
DarkLight1337 left a comment
LGTM, thanks for your patience
FYI @DarkLight1337 @YoussefEssDS We already have a PR supporting this model, see: #27521
```python
if get_pp_group().is_first_rank or (
    getattr(config, "tie_word_embeddings", True) and get_pp_group().is_last_rank
):
    self.embed_tokens = VocabParallelEmbedding(
```
The missing prefix=f"{prefix}.embed_tokens" argument may disable quantization support. As @jeejeelee pointed out, this PR should be merged into #27521.
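For illustration, the call with the prefix threaded through might look like this (a sketch modeled on other vLLM models; the surrounding argument list is assumed, since the diff only shows the first line of the call):

```python
# Passing prefix gives the module a fully qualified name such as
# "model.embed_tokens", which quantization configs match against when
# deciding how to quantize (or skip) a layer.
self.embed_tokens = VocabParallelEmbedding(
    config.vocab_size,
    config.hidden_size,
    quant_config=quant_config,
    prefix=f"{prefix}.embed_tokens",
)
```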
Sorry I missed that, let's work on the existing PR #27521 then.
Pull request was closed
Purpose
Register the OpenPangu Embedded 7B architecture (PanguEmbeddedForCausalLM) and provide a native implementation that mirrors the HF reference.
Test Plan
Start instance:
vllm serve FreedomIntelligence/openPangu-Embedded-7B --tensor-parallel-size 4 --port 18051 --trust_remote_code
Functional correctness:
curl http://localhost:18051/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "FreedomIntelligence/openPangu-Embedded-7B", "messages": [ {"role": "user", "content": "What is the capital of Morocco?"} ], "temperature": 0.6, "top_p": 0.95, "top_k": 20, "max_tokens": 8192 }'
Run performance benchmark (used the script here: https://gist.github.com/YoussefEssDS/03c456ef5fed6f24e27394d2b2f2cc05).
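The same functional check can be scripted with the openai Python client against the local server (a sketch; assumes the serve command above is running and openai>=1.0 is installed):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; the API key is unused locally.
client = OpenAI(base_url="http://localhost:18051/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="FreedomIntelligence/openPangu-Embedded-7B",
    messages=[{"role": "user", "content": "What is the capital of Morocco?"}],
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},  # vLLM-specific sampling parameter
    max_tokens=256,
)
print(resp.choices[0].message.content)
```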
Test Result
Significant throughput improvement (TP=4 on NVIDIA H100 GPUs): vLLM native backend ~147.5 T/s vs. the transformers fallback ~50.8 T/s.