[Bug]: MTP with enable_schedule_overlap=true causes overflows #1126

@phantomlei3

Your environment

MLU
commit id: 3011eb4c7a0f4722204878298bafe04c66164ea0

🐛 Describe the bug

test model: GLM-5-W8A8
start command:

for ((i = 0; i < NNODES; i++)); do
    DEVICE=$((START_DEVICE + i))
    LOG_FILE="${LOG_DIR}/node_${i}.log"
    # node_rank = server_rank * NNODES + i
    NODE_RANK=$((SERVER_RANK * NNODES + i))
    xllm \
        --model "${MODEL_PATH}" \
        --devices="mlu:${DEVICE}" \
        --draft_model "${MODEL_PATH}" \
        --draft_devices="mlu:${DEVICE}" \
        --num_speculative_tokens 1 \
        --port "${PORT}" \
        --host="0.0.0.0" \
        --master_node_addr="${MASTER_NODE_ADDR}" \
        --nnodes="${WORLD_SIZE}" \
        --max_memory_utilization=0.84 \
        --max_tokens_per_batch="${max_tokens_per_batch}" \
        --max_seqs_per_batch="${max_seqs_per_batch}" \
        --block_size=16 \
        --max_cache_size=0 \
        --enable_prefix_cache=true \
        --enable_chunked_prefill=true \
        --enable_schedule_overlap=true \
        --enable_prefill_sp=false \
        --node_rank="${NODE_RANK}" \
        --enable_shm=false \
        --enable_graph=false \
        --random_seed=42 \
        --reasoning_parser glm5 \
        --tool_call_parser glm5 \
        --expert_parallel_degree=2 \
        --dp_size=4 \
        --ep_size="${WORLD_SIZE}" \
    > "${LOG_FILE}" 2>&1 &
done

error message:

F20260329 13:32:06.908157 70386 spec_input_builder.cpp:82] Check failed: static_cast<size_t>(block_idx) < block_table_slice.size() (17368 vs. 11077) block table index out of range, block_idx=17368, block_table_size=11077, position=277888, block_size=16
*** Check failure stack trace: ***
    @     0x562a85686876  google::LogMessage::SendToLog()
    @     0x562a85682dd4  google::LogMessage::Flush()
    @     0x562a8568700f  google::LogMessageFatal::~LogMessageFatal()
    @     0x562a85c521d2  xllm::specBuilder::calc_slot_id()
    @     0x562a85c52680  xllm::specBuilder::append_decode_row()
    @     0x562a85c38157  _ZZN4xllm13MTPWorkerImpl27prepare_draft_extend_inputsERKNS_12ForwardInputERKNS_12SampleOutputERS1_ENKUliiRKN2at6TensorEE_clEiiSB_
    @     0x562a85c3c1b1  xllm::MTPWorkerImpl::prepare_draft_extend_inputs()
    @     0x562a85c3f184  xllm::MTPWorkerImpl::run_draft_extend()
    @     0x562a85c3f769  xllm::MTPWorkerImpl::run_validate()
    @     0x562a85c40764  xllm::MTPWorkerImpl::step_decode_single()
    @     0x562a85c42245  xllm::MTPWorkerImpl::step_decode()
    @     0x562a85c73216  xllm::SpeculativeWorkerImpl::step()
    @     0x562a85be11b1  _ZZN4xllm10WorkerImpl10step_asyncERKNS_12ForwardInputEENUlvE_clEv
    @     0x562a8750501b  xllm::ThreadPool::internal_loop()
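
The numbers in the check failure can be sanity-checked with a little arithmetic. The sketch below assumes that `calc_slot_id()` derives the block index as `position / block_size`. That is an assumption (the source of `spec_input_builder.cpp` is not shown here), though it matches the logged values exactly. All inputs are copied from the fatal log line.

```shell
#!/usr/bin/env bash
# Values copied from the fatal log line above.
position=277888
block_size=16
block_table_size=11077

# Assumption: block index = token position / block size (integer division).
block_idx=$((position / block_size))

# Highest token position that a block table of this size can address.
max_position=$((block_table_size * block_size - 1))

echo "block_idx=${block_idx} (valid range: 0..$((block_table_size - 1)))"
echo "max addressable position=${max_position}, requested position=${position}"
echo "blocks past the end of the table=$((block_idx - block_table_size + 1))"
```

Under that assumption, the requested position (277888) lies well beyond what the 11077-entry block table slice can address, so either the position passed to the draft-extend input builder is too large or the block table slice is too short for that sequence.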

Labels: bug
