
Commit 566b4ff

Update TensorRT-LLM backend (triton-inference-server#494)
1 parent 39ba55a commit 566b4ff

File tree

4 files changed (+35, -8 lines)


README.md

Lines changed: 0 additions & 2 deletions

@@ -45,8 +45,6 @@ repo. If you don't find your answer there you can ask questions on the
 
 There are several ways to access the TensorRT-LLM Backend.
 
-**Before Triton 23.10 release, please use [Option 3 to build TensorRT-LLM backend via Docker](#option-3-build-via-docker).**
-
 ### Run the Pre-built Docker Container
 
 Starting with Triton 23.10 release, Triton includes a container with the TensorRT-LLM

docs/llama.md

Lines changed: 33 additions & 4 deletions

@@ -1,13 +1,42 @@
+## End to end workflow to run llama 7b
 
-## End to end workflow to run llama
+0. Make sure that you have initialized the TRT-LLM submodule:
 
-* Build engine
+```bash
+git lfs install
+git submodule update --init --recursive
+```
+
+1. (Optional) Download the LLaMa model from HuggingFace:
+
+```bash
+huggingface-cli login
+
+huggingface-cli download meta-llama/Llama-2-7b-hf
+```
+
+> **NOTE**
+>
+> Make sure that you have access to https://huggingface.co/meta-llama/Llama-2-7b-hf.
+
+2. Start the Triton Server Docker container:
+
+```bash
+# Replace <yy.mm> with the version of Triton you want to use.
+# The command below assumes that the current directory is the
+# TRT-LLM backend root git repository.
+
+docker run --rm -ti -v `pwd`:/mnt -w /mnt -v ~/.cache/huggingface:~/.cache/huggingface --gpus all nvcr.io/nvidia/tritonserver:\<yy.mm\>-trtllm-python-py3 bash
+```
 
+3. Build the engine:
 ```bash
-export HF_LLAMA_MODEL=llama-7b-hf/
+# Replace 'HF_LLAMA_MODEL' with another path if you didn't download the model in step 1
+# or you're not using HuggingFace.
+export HF_LLAMA_MODEL=`python3 -c "from pathlib import Path; from huggingface_hub import hf_hub_download; print(Path(hf_hub_download('meta-llama/Llama-2-7b-hf', filename='config.json')).parent)"`
 export UNIFIED_CKPT_PATH=/tmp/ckpt/llama/7b/
 export ENGINE_PATH=/tmp/engines/llama/7b/
-python convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
+python tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
     --output_dir ${UNIFIED_CKPT_PATH} \
     --dtype float16
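The `export HF_LLAMA_MODEL=...` one-liner added in this hunk is dense: it downloads (or finds in the local cache) the model's `config.json` and takes its parent directory to recover the snapshot path for the whole model. A minimal sketch of the same logic, assuming `huggingface_hub` is installed and you have access to the gated `meta-llama/Llama-2-7b-hf` repository:

```python
# Minimal sketch of the HF_LLAMA_MODEL one-liner from the hunk above.
# Assumes huggingface_hub is installed and the model was fetched in step 1.
from pathlib import Path

from huggingface_hub import hf_hub_download

# hf_hub_download() returns the local cache path of a single file; the parent
# directory of config.json is the snapshot directory holding the full model.
config_path = hf_hub_download("meta-llama/Llama-2-7b-hf", filename="config.json")
model_dir = Path(config_path).parent
print(model_dir)
```

`huggingface_hub.snapshot_download("meta-llama/Llama-2-7b-hf")` resolves the same directory directly (it fetches the whole repository rather than a single file) and would be an equivalent way to set the variable.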

tensorrt_llm

Submodule tensorrt_llm updated 301 files

tools/version.txt

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-225fd4fc55948de398989c334464d4478064b4f7
+1353d8632b255979eac4667d631a90538c07d269
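If, as the paired update above suggests, `tools/version.txt` pins the TensorRT-LLM commit the backend expects, the `tensorrt_llm` submodule should move in lockstep with it. A hypothetical consistency check (the script and its premise are assumptions, not part of this commit), run from the backend repository root:

```python
# check_submodule.py -- hypothetical helper, not part of this commit.
# Verifies that the tensorrt_llm submodule checkout matches the commit
# hash pinned in tools/version.txt. Assumes git is on PATH and the
# script runs from the TRT-LLM backend repository root.
import subprocess
from pathlib import Path

pinned = Path("tools/version.txt").read_text().strip()
checked_out = subprocess.run(
    ["git", "-C", "tensorrt_llm", "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if pinned != checked_out:
    raise SystemExit(f"tensorrt_llm is at {checked_out}, expected {pinned}")
print(f"tensorrt_llm submodule matches pinned commit {pinned}")
```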
