
[Bug]: How can I manually add support for GLM-4.6-W8A8? #116

@kev1n77

Description


Your contact information

2506930365@qq.com

Problem description

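How should I deploy GLM-4.6-W8A8 for inference with chitu? Do I need to modify the configuration files, or add code and then recompile?
Running it with only the configuration changed fails with: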

AssertionError: Illegal parallel tensor layers.0.mlp.gate_proj.weight_offset
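The assertion names a tensor ending in `weight_offset`, so it may help to see which tensor kinds the quantized checkpoint actually contains before touching any loader code. The following is a minimal diagnostic sketch, not part of chitu: it assumes the checkpoint ships a Hugging Face-style safetensors index JSON with a `weight_map` key (the index file name varies by checkpoint and is passed in by the caller), and `summarize_tensor_suffixes` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_tensor_suffixes(index_path):
    """Count checkpoint tensors by the last dot-separated part of their name.

    Hypothetical helper. Assumes a Hugging Face-style index file whose
    "weight_map" maps each tensor name to the shard file that stores it.
    """
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    names = index["weight_map"].keys()
    # "layers.0.mlp.gate_proj.weight_offset" -> "weight_offset"
    return Counter(name.rsplit(".", 1)[-1] for name in names)
```

Calling `summarize_tensor_suffixes(...)` on the index file under `models.ckpt_dir` and printing the result lists every suffix (for example `weight`, `weight_offset`) with its count. This only shows what the checkpoint contains; it does not tell which of those tensors the chitu loader expects.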

Machine: Ascend 910B4 (8 × 32 GB)
Image: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:latest
Model link: https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8
Launch script:

nic_name="enp67s0f5"
node0_ip="xxx"
local_ip="xxx"
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name
export HCCL_IF_IP=$local_ip
export WORLD_SIZE=32
export TASK_QUEUE_ENABLE=1
export CPU_AFFINITY_CONF=2
export HCCL_OP_EXPANSION_MODE=AIV
torchrun --nnodes 4 \
  --nproc_per_node 8 \
  --master_addr=$local_ip \
  --master_port=22525 \
  --node_rank=0 \
  --rdzv_conf='timeout=1500' \
  -m chitu \
  serve.host=0.0.0.0 \
  serve.port=21002 \
  infer.cache_type=paged \
  infer.attn_type=npu \
  infer.pp_size=2 \
  infer.dp_size=2 \
  infer.tp_size=8 \
  models=GLM-4.6 \
  models.ckpt_dir=/nfs/data1/models/Modelers_Park/GLM-4.6-w8a8 \
  quant=ascend_w8a8_dynamic \
  infer.mla_absorb=absorb-without-precomp \
  infer.raise_lower_bit_float_to=bfloat16 \
  scheduler.pp_config.prefill_num_tasks_divided_by_pp=False \
  scheduler.pp_config.prefill_num_tasks=8 \
  scheduler.pp_config.enforce_decode_num_tasks_max=True \
  scheduler.pp_config.decode_num_tasks=8 \
  infer.max_reqs=8 \
  infer.max_seq_len=65536 \
  request.max_new_tokens=8192 \
  infer.use_cuda_graph=True
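As a quick sanity check on the launch script above, the parallel sizes can be compared against the number of processes torchrun starts. This is a sketch under one assumption that the issue does not confirm: that chitu, like many frameworks, expects `tp_size × pp_size × dp_size` to equal the total process count. `check_parallel_layout` is a hypothetical helper, not a chitu function.

```python
def check_parallel_layout(nnodes, nproc_per_node, tp_size, pp_size, dp_size,
                          world_size=None):
    """Return a list of human-readable mismatches (empty if consistent).

    Assumption: tp_size * pp_size * dp_size must equal the number of
    processes launched by torchrun (nnodes * nproc_per_node).
    """
    launched = nnodes * nproc_per_node
    requested = tp_size * pp_size * dp_size
    problems = []
    if world_size is not None and world_size != launched:
        problems.append(
            f"WORLD_SIZE={world_size} but torchrun launches {launched} processes")
    if requested != launched:
        problems.append(
            f"tp*pp*dp={requested} but torchrun launches {launched} processes")
    return problems


# Values copied from the launch script in this issue.
print(check_parallel_layout(nnodes=4, nproc_per_node=8,
                            tp_size=8, pp_size=2, dp_size=2,
                            world_size=32))  # prints []
```

With the values from this issue the check passes (4 × 8 = 32 = 8 × 2 × 2), so under that assumption the parallel layout is not the source of the `AssertionError`.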

Please confirm before submitting

  • Confirmed
