Description
System Info
- Architecture: x86_64
- OS: Ubuntu 22.04
- GPU: NVIDIA GeForce RTX 4090
- GPU memory: 2x 24 GB
- CPU max MHz: 5000.0000
- Driver Version: 535.183.01
- CUDA Version: 12.2
- Container: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
- TensorRT-LLM version: 0.10.0
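(For reference, here's how the versions above can be confirmed from inside the container; a quick check of my own, not from any official doc:)

```bash
# Confirm driver/GPU info and the tensorrt_llm wheel actually installed in the container
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
pip show tensorrt_llm | grep -E '^(Name|Version)'
```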
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
nvidia-docker run -d -it --name trtllm \
  -v /home/remotessh/text-generation-webui/models/Llama-2-13b-chat-hf:/root/.cache/huggingface/llama-2-13b-chat-hf \
  -v /home/remotessh/TensorRT_engines:/engines \
  --shm-size=16G --network=host \
  nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 /bin/bash
11f808569af7484e003da1e5eb26729a7decd74f470b0699d983056df2ca1aef
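(The container is started detached, so the steps below are run from a shell attached to it; assuming the usual docker workflow, that step is:)

```bash
# Attach a shell to the container started above (named via --name trtllm)
docker exec -it trtllm /bin/bash
```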
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend/
git clone https://github.com/NVIDIA/TensorRT-LLM.git
pip install git+https://github.com/NVIDIA/TensorRT-LLM.git
mkdir /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
cp /opt/tritonserver/backends/tensorrtllm/* /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
export PYTHONPATH=/root/.cache/huggingface/llama-2-13b-chat-hf
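Note: `pip install git+…` builds whatever is currently on TensorRT-LLM main, which can be newer than the 0.10.0 build shipped in the 24.06 container and linked against a different torch. To see which combination actually ended up installed (a diagnostic of my own, not part of the official steps):

```bash
# Compare the installed tensorrt_llm wheel against the torch it must link with
pip show tensorrt_llm torch | grep -E '^(Name|Version|Location)'
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
```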
This leads to the following error:

ImportError: The `bindings` module does not exist. Please check the package integrity. If you are attempting to use the pip development mode (editable installation), please execute `build_wheels.py` first, and then run `pip install -e .`
So I ran the suggested commands:

python scripts/build_wheel.py
pip install -e .
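(For clarity, `build_wheel.py` has to be run from the root of the TensorRT-LLM clone, which here is under /opt/tritonserver/tensorrtllm_backend as the prompt below shows. A sketch of the full sequence, assuming default build options:)

```bash
# Rebuild the C++ bindings from the cloned source, then install in editable mode
cd /opt/tritonserver/tensorrtllm_backend/TensorRT-LLM
python3 scripts/build_wheel.py   # default options; extra flags may be needed
pip install -e .
```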
cd TensorRT-LLM/examples/llama
python convert_checkpoint.py --model_dir /root/.cache/huggingface/llama-2-13b-chat-hf \
--output_dir /workspace/tensorrt_llm/llama-2-13b-chat-hf \
--dtype float16 \
--tp_size 2
Running the conversion then fails with this error:
/opt/tritonserver/tensorrtllm_backend/TensorRT-LLM/examples/llama# python convert_checkpoint.py --model_dir /root/.cache/huggingface/llama-2-13b-chat-hf/llama-2-13b-chat-hf --output_dir /workspace/tensorrt_llm/llama-2-13b-chat-hf --dtype float16 --tp_size 2
Traceback (most recent call last):
  File "/opt/tritonserver/tensorrtllm_backend/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
    from . import graph_rewriting as gw
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
    from .network import Network
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 27, in <module>
    from tensorrt_llm.module import Module
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
    from ._common import default_net
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 31, in <module>
    from ._utils import str_dtype_to_trt
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 30, in <module>
    from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
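The missing symbol demangles to `c10::detail::torchCheckFail(...)` from libtorch's c10 library, so the prebuilt `bindings` extension appears to be linked against a different torch/libstdc++ ABI than the torch installed in the container (the `Ss` in the mangled name is the pre-C++11 `std::string`, hinting at a `_GLIBCXX_USE_CXX11_ABI` mismatch). A quick check of my own; the path assumes the default torch install location:

```bash
# Demangle the missing symbol:
# c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)
echo '_ZN3c106detail14torchCheckFailEPKcS2_jRKSs' | c++filt
# Does the installed libc10 export a matching symbol?
nm -D /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so | grep torchCheckFail | c++filt
```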
Expected behavior
Checkpoint conversion completes successfully.
Actual behavior
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
Additional notes
I'm following the official documentation, plus some fixes suggested by other developers.
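A possible fix (untested here; assumes NVIDIA's extra index hosts the matching wheel) would be to drop the git-main install and pin tensorrt_llm to the 0.10.0 build the 24.06 container is meant to pair with:

```bash
# Replace the git-main build with the wheel matching the 24.06 container
pip uninstall -y tensorrt_llm
pip install tensorrt_llm==0.10.0 --extra-index-url https://pypi.nvidia.com
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```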