[Bug]: MTP with enable_schedule_overlap=true causes overflows #1126
Status: Open
Labels: bug
Description
Your environment
Device: MLU
Commit ID: 3011eb4c7a0f4722204878298bafe04c66164ea0
🐛 Describe the bug
Test model: GLM-5-W8A8
Start command:
for ((i = 0; i < NNODES; i++)); do
  DEVICE=$((START_DEVICE + i))
  LOG_FILE="${LOG_DIR}/node_${i}.log"
  # node_rank = server_rank * NNODES + i
  NODE_RANK=$((SERVER_RANK * NNODES + i))
  xllm \
    --model "${MODEL_PATH}" \
    --devices="mlu:${DEVICE}" \
    --draft_model "${MODEL_PATH}" --draft_devices="mlu:${DEVICE}" --num_speculative_tokens 1 \
    --port "${PORT}" \
    --host="0.0.0.0" \
    --master_node_addr="${MASTER_NODE_ADDR}" \
    --nnodes="${WORLD_SIZE}" \
    --max_memory_utilization=0.84 \
    --max_tokens_per_batch="${max_tokens_per_batch}" \
    --max_seqs_per_batch="${max_seqs_per_batch}" \
    --block_size=16 \
    --max_cache_size=0 \
    --enable_prefix_cache=true \
    --enable_chunked_prefill=true \
    --enable_schedule_overlap=true \
    --enable_prefill_sp=false \
    --node_rank="${NODE_RANK}" \
    --enable_shm=false \
    --enable_graph=false \
    --random_seed=42 \
    --reasoning_parser glm5 \
    --tool_call_parser glm5 \
    --expert_parallel_degree=2 \
    --dp_size=4 \
    --ep_size=${WORLD_SIZE} \
    > "${LOG_FILE}" 2>&1 &
done
Error message:
F20260329 13:32:06.908157 70386 spec_input_builder.cpp:82] Check failed: static_cast<size_t>(block_idx) < block_table_slice.size() (17368 vs. 11077) block table index out of range, block_idx=17368, block_table_size=11077, position=277888, block_size=16
*** Check failure stack trace: ***
@ 0x562a85686876 google::LogMessage::SendToLog()
@ 0x562a85682dd4 google::LogMessage::Flush()
@ 0x562a8568700f google::LogMessageFatal::~LogMessageFatal()
@ 0x562a85c521d2 xllm::specBuilder::calc_slot_id()
@ 0x562a85c52680 xllm::specBuilder::append_decode_row()
@ 0x562a85c38157 _ZZN4xllm13MTPWorkerImpl27prepare_draft_extend_inputsERKNS_12ForwardInputERKNS_12SampleOutputERS1_ENKUliiRKN2at6TensorEE_clEiiSB_
@ 0x562a85c3c1b1 xllm::MTPWorkerImpl::prepare_draft_extend_inputs()
@ 0x562a85c3f184 xllm::MTPWorkerImpl::run_draft_extend()
@ 0x562a85c3f769 xllm::MTPWorkerImpl::run_validate()
@ 0x562a85c40764 xllm::MTPWorkerImpl::step_decode_single()
@ 0x562a85c42245 xllm::MTPWorkerImpl::step_decode()
@ 0x562a85c73216 xllm::SpeculativeWorkerImpl::step()
@ 0x562a85be11b1 _ZZN4xllm10WorkerImpl10step_asyncERKNS_12ForwardInputEENUlvE_clEv
@ 0x562a8750501b xllm::ThreadPool::internal_loop()
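The numbers in the failed CHECK are consistent with the usual paged KV-cache arithmetic: block_idx = position / block_size = 277888 / 16 = 17368, while this sequence's block table holds only 11077 blocks, so it can address positions 0 through 177231 (11077 × 16 = 177232 slots). The sketch below is a hypothetical reconstruction of that mapping, assuming calc_slot_id computes slot = block_table[position / block_size] * block_size + position % block_size. It is not taken from the xllm source; the function name calc_slot_id_sketch is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch, NOT the xllm implementation: maps a token position to a
// paged KV-cache slot through one sequence's block table, assuming
// slot = block_table[position / block_size] * block_size + position % block_size.
// Returns std::nullopt instead of aborting when the position is out of range.
std::optional<int64_t> calc_slot_id_sketch(const std::vector<int32_t>& block_table,
                                           int64_t position,
                                           int64_t block_size) {
  assert(block_size > 0 && "block_size must be positive");
  if (position < 0) {
    return std::nullopt;
  }
  const int64_t block_idx = position / block_size;  // logical block holding the token
  const int64_t offset = position % block_size;     // token offset inside that block
  // Same condition as the failing CHECK: the logical block index must fall
  // inside the blocks actually allocated for this sequence.
  if (static_cast<std::size_t>(block_idx) >= block_table.size()) {
    return std::nullopt;
  }
  const int64_t physical_block = block_table[static_cast<std::size_t>(block_idx)];
  return physical_block * block_size + offset;
}
```

With a 11077-entry table and block_size 16, position 177231 is the last one that resolves, while 277888 maps to block 17368 and returns std::nullopt, which is the point where the real code aborts. The draft-extend step is therefore asking for a position roughly 100k tokens beyond what the table can address; the sketch only shows the arithmetic and does not explain why that position is produced when enable_schedule_overlap=true.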