[Bug]: GLM-5-w8a8 fails to launch in 910C #967

@Pastens

Your environment

  • Hardware: 910C with ARM
  • xLLM version: preview/glm5
  • Startup parameters:
      --max_memory_utilization=0.85
      --max_tokens_per_batch=8192
      --max_seqs_per_batch=16
      --block_size=128
      --enable_prefix_cache=true
      --enable_chunked_prefill=true
      --communication_backend="hccl"
      --enable_schedule_overlap=true
      --enable_graph=true
      --enable_graph_no_padding=true
      --enable_mla=true
      --draft_model=$DRAFT_MODEL_PATH
      --draft_devices="npu:$DEVICE"
      --num_speculative_tokens=1
      --ep_size=8
      --dp_size=1

🐛 Describe the bug

  1. The rank-0 log shows that the word_embedding_layer execute plan fails with error code 28 during ACL Graph warmup:
I20260302 12:21:10.077145 521062 llm_engine.cpp:389] Initializing v cache with shape: [275 128 1 64]
I20260302 12:21:10.077220 521062 llm_engine.cpp:391] Initializing indexer cache with shape: [275 128 1 128]
I20260302 12:21:10.078318 521062 profile_manager.cpp:63] Starting ACL Graph/CUDA Graph warmup.
I20260302 12:21:10.078365 521062 profile_manager.cpp:771] Starting ACL Graph/CUDA Graph warmup with prefill and decode requests...
I20260302 12:21:10.078394 521062 profile_manager.cpp:809] Warming up prefill request: tokens=8192
mki_log mkdir /root/ascend/log/atb
E20260302 12:21:10.300601 525515 npu_base_layer.cpp:124] word_embedding_layer execute plan fail, error code: 28
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
  what():  The Inner error is reported as above. The process exits for this inner error, and the current working operator name is word_embedding_layer0.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2026-03-02-12:21:10 (PID:521062, Device:0, RankID:-1) ERR00100 PTA call acl api failed.

[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
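  2. The ATB log shows that HcclGetRootInfo fails (error 7, rank 0) shortly before the word_embedding_layer failure, which leaves hcclComm null for the AllGather node: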
[2026-03-02 12:21:10.125760] [error] [525604] [hccl_runner.cpp:178] AllGatherHcclRunner:0 HcclGetRootInfo fail, error:7, rank:0
[2026-03-02 12:21:10.127881] [error] [525604] [comm_pool.h:42] CommPool commCreateFunc fail
[2026-03-02 12:21:10.127889] [error] [525604] [hccl_runner.cpp:81] AllGatherHcclRunner:0 get hccl comm fail by rank:0
[2026-03-02 12:21:10.300542] [error] [525515] [all_gather_hccl_runner.cpp:39] hcclComm is null, rank: 0
[2026-03-02 12:21:10.300575] [error] [525515] [runner.cpp:133] AllGatherHcclRunner_0_1:1 Execute Failed. st: 28
[2026-03-02 12:21:10.300583] [error] [525515] [graph_runner.cpp:972] WordEmbeddingRunner_0:0  node[1] execute fail, runner name:AllGatherHcclRunner
[2026-03-02 12:21:10.300588] [error] [525515] [runner.cpp:133] WordEmbeddingRunner_0:1 Execute Failed. st: 28
[2026-03-02 12:21:10.300593] [error] [525515] [operation_base.cpp:1018] WordEmbedding_0 execute WordEmbeddingRunner fail
[2026-03-02 12:21:10.300596] [error] [525515] [operation_base.cpp:1095] WordEmbedding_0 Launch fail, error code: 28
  3. The failure is reproducible: it happens on every launch.
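
Because the rank-0 log and the ATB log interleave, one way to check which error came first is to sort the ATB error lines by timestamp. The following is a minimal Python sketch, assuming the ATB line layout seen in the excerpt above (`[timestamp] [level] [thread] [file:line] message`). `ATB_LINE`, `first_error`, and `SAMPLE` are hypothetical names used for illustration only and are not part of xLLM or ATB.

```python
import re
from datetime import datetime

# Assumed ATB log layout, taken from the excerpt in this issue:
# [2026-03-02 12:21:10.125760] [error] [525604] [hccl_runner.cpp:178] message
ATB_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] "
    r"\[(?P<level>\w+)\] \[(?P<tid>\d+)\] \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def first_error(lines):
    """Return (timestamp, thread id, source, message) of the earliest error line, or None."""
    errors = []
    for line in lines:
        m = ATB_LINE.match(line.strip())
        if m and m.group("level") == "error":
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            errors.append((ts, m.group("tid"), m.group("src"), m.group("msg")))
    # Tuples compare by timestamp first, so min() yields the earliest error.
    return min(errors, default=None)


# Three lines copied from the ATB log above, deliberately out of order.
SAMPLE = [
    "[2026-03-02 12:21:10.300596] [error] [525515] [operation_base.cpp:1095] WordEmbedding_0 Launch fail, error code: 28",
    "[2026-03-02 12:21:10.125760] [error] [525604] [hccl_runner.cpp:178] AllGatherHcclRunner:0 HcclGetRootInfo fail, error:7, rank:0",
    "[2026-03-02 12:21:10.300542] [error] [525515] [all_gather_hccl_runner.cpp:39] hcclComm is null, rank: 0",
]

if __name__ == "__main__":
    ts, tid, src, msg = first_error(SAMPLE)
    print(f"{ts} thread={tid} {src}: {msg}")
```

On the lines quoted in this issue, the earliest error is the HcclGetRootInfo failure in hccl_runner.cpp:178, which suggests the word_embedding_layer error is a downstream symptom rather than the root cause.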
