LLAMA checkpoint ImportError: undefined symbol #1950

Open
@Pareek-Yash

Description

System Info

  • Architecture: x86_64
  • OS: Ubuntu 22.04
  • GPU: NVIDIA GeForce RTX 4090
  • GPU memory: 2 x 24 GB
  • CPU max MHz: 5000.0000
  • Driver Version: 535.183.01
  • CUDA Version: 12.2
  • Container: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
  • TensorRT-LLM version: 0.10.0

Who can help?

@byshiue @nv-guomingz

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

nvidia-docker run -d -it --name trtllm \
    -v /home/remotessh/text-generation-webui/models/Llama-2-13b-chat-hf:/root/.cache/huggingface/llama-2-13b-chat-hf \
    -v /home/remotessh/TensorRT_engines:/engines \
    --shm-size=16G --network=host \
    nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 /bin/bash
11f808569af7484e003da1e5eb26729a7decd74f470b0699d983056df2ca1aef
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend/
git clone https://github.com/NVIDIA/TensorRT-LLM.git
pip install git+https://github.com/NVIDIA/TensorRT-LLM.git

mkdir /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/

cp /opt/tritonserver/backends/tensorrtllm/* /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
export PYTHONPATH=/root/.cache/huggingface/llama-2-13b-chat-hf
This leads to the following error:

ImportError: The bindings module does not exist. Please check the package integrity. If you are attempting to use the pip development mode (editable installation), please execute build_wheels.py first, and then run pip install -e ..
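
To check whether the compiled bindings are actually present before importing the package, something like the sketch below might help. It assumes the bindings sit directly inside the package directory as a `bindings*.so` file, which matches the path in the later traceback but may differ between versions; `find_bindings` and `installed_package_dir` are hypothetical helper names, not part of TensorRT-LLM.

```python
import importlib.util
from pathlib import Path


def find_bindings(package_dir):
    """Return compiled bindings files found directly inside package_dir.

    Assumption: the extension module is named bindings*.so and lives at the
    top level of the package directory.
    """
    return sorted(Path(package_dir).glob("bindings*.so"))


def installed_package_dir(name="tensorrt_llm"):
    """Locate a top-level package on disk without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    pkg = installed_package_dir()
    if pkg is None:
        print("tensorrt_llm is not installed in this environment")
    else:
        found = find_bindings(pkg)
        print(f"package directory: {pkg}")
        print(f"bindings files: {[p.name for p in found] or 'none found'}")
```

If no bindings file is listed, the installed package is probably a source-only install, which would be consistent with the error above.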

Following that message, I ran:

python scripts/build_wheel.py
pip install -e .
cd TensorRT-LLM/examples/llama
python convert_checkpoint.py --model_dir /root/.cache/huggingface/llama-2-13b-chat-hf \
                             --output_dir /workspace/tensorrt_llm/llama-2-13b-chat-hf \
                             --dtype float16 \
                             --tp_size 2

This results in the following error:

/opt/tritonserver/tensorrtllm_backend/TensorRT-LLM/examples/llama# python convert_checkpoint.py --model_dir /root/.cache/huggingface/llama-2-13b-chat-hf/llama-2-13b-chat-hf --output_dir /workspace/tensorrt_llm/llama-2-13b-chat-hf --dtype float16 --tp_size 2
Traceback (most recent call last):
  File "/opt/tritonserver/tensorrtllm_backend/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
    from . import graph_rewriting as gw
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
    from .network import Network
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 27, in <module>
    from tensorrt_llm.module import Module
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
    from ._common import default_net
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 31, in <module>
    from ._utils import str_dtype_to_trt
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 30, in <module>
    from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
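
One thing that may help narrow this down: the missing symbol demangles to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)`, and the trailing `Ss` is the Itanium mangling abbreviation for `std::string` under the pre-C++11 libstdc++ ABI. A possible cause, not confirmed here, is that the bindings were compiled against a PyTorch build using one `std::string` ABI while the PyTorch installed in the container uses the other. The sketch below is a rough heuristic for comparing the two; `string_abi_of_symbol` and `installed_torch_abi` are hypothetical helper names, while `torch.compiled_with_cxx11_abi()` is an existing PyTorch function.

```python
MISSING_SYMBOL = "_ZN3c106detail14torchCheckFailEPKcS2_jRKSs"


def string_abi_of_symbol(mangled):
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only: "__cxx11" appears in names using the C++11 ABI, while
    the "Ss" abbreviation is only emitted for the pre-C++11 std::string.
    A substring match can misfire on unrelated identifiers.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


def installed_torch_abi():
    """Report the ABI of the installed PyTorch, or None if it is missing."""
    try:
        import torch
    except ImportError:
        return None
    return "cxx11" if torch.compiled_with_cxx11_abi() else "pre-cxx11"


if __name__ == "__main__":
    wanted = string_abi_of_symbol(MISSING_SYMBOL)
    have = installed_torch_abi()
    print(f"bindings expect a {wanted} std::string symbol")
    print(f"installed torch ABI: {have}")
    if have is not None and wanted != "unknown" and wanted != have:
        print("ABI mismatch: bindings and torch were likely built differently")
```

If the two values differ, rebuilding the wheel against the PyTorch that is actually installed, or installing a PyTorch build that matches the bindings, would be the first thing to try.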

Expected behavior

The checkpoint conversion completes successfully.

actual behavior

ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs

additional notes

I'm following the official documentation, plus some fixes suggested by other developers.

Labels

  • bug (Something isn't working)
  • functionality issue
  • triaged (Issue has been triaged by maintainers)
