convert_checkpoint report error #2356

Open
@imilli

Description

System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13

root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/chatglm# python3 convert_checkpoint.py --chatglm_version glm4 --model_dir "/llm/other/models/glm-4-9b-chat" --output_dir "/llm/other/trt-model" --dtype float16 --use_weight_only --int8_kv_cache --weight_only_precision int8

[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
Inferring chatglm version from path...
Chatglm version: glm4
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 10/10 [04:35<00:00, 27.53s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Calibration: 100%|█████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 10.68it/s]
Traceback (most recent call last):
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 263, in <module>
    main()
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 255, in main
    convert_and_save_hf(args)
  File "/llm/tensorrt-llm-0.13.0/examples/chatglm/convert_checkpoint.py", line 213, in convert_and_save_hf
    ChatGLMForCausalLM.quantize(args.model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/model.py", line 351, in quantize
    convert.quantize(hf_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/convert.py", line 723, in quantize
    weights = load_weights_from_hf_model(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/chatglm/convert.py", line 438, in load_weights_from_hf_model
    np.array([qkv_vals_int8['scale_y_quant_orig']],
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 1084, in __array__
    return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Labels

LLM API/Workflow (high-level LLM Python API & tools, e.g., trtllm-llmapi-launch, for TRTLLM inference/workflows), bug (something isn't working), triaged (issue has been triaged by maintainers)