4 files changed, +35 -8 lines changed
@@ -45,8 +45,6 @@
repo. If you don't find your answer there you can ask questions on the

There are several ways to access the TensorRT-LLM Backend.

- **Before the Triton 23.10 release, please use [Option 3 to build the TensorRT-LLM backend via Docker](#option-3-build-via-docker).**
-
### Run the Pre-built Docker Container

Starting with the Triton 23.10 release, Triton includes a container with the TensorRT-LLM
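
Since the hunk above points readers at the pre-built container, a concrete pull command may help. This is a hedged example: the `23.10` tag comes from the context lines above, and the image name matches the one used later in this diff.

```bash
# Pull the pre-built Triton image that bundles the TensorRT-LLM backend.
# 23.10 is the first release that includes it, per the README text above;
# substitute the release tag you actually want.
docker pull nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
```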

+ ## End to end workflow to run llama 7b

- ## End to end workflow to run llama
+ 0. Make sure that you have initialized the TRT-LLM submodule:

- * Build engine
+ ```bash
+ git lfs install
+ git submodule update --init --recursive
+ ```
+
+ 1. (Optional) Download the LLaMa model from HuggingFace:
+
+ ```bash
+ huggingface-cli login
+
+ huggingface-cli download meta-llama/Llama-2-7b-hf
+ ```
+
+ > **NOTE**
+ >
+ > Make sure that you have access to https://huggingface.co/meta-llama/Llama-2-7b-hf.
+
+ 2. Start the Triton Server Docker container:
+
+ ```bash
+ # Replace <yy.mm> with the version of Triton you want to use.
+ # The command below assumes the current directory is the
+ # TRT-LLM backend root git repository.
+
+ docker run --rm -ti -v `pwd`:/mnt -w /mnt -v ~/.cache/huggingface:~/.cache/huggingface --gpus all nvcr.io/nvidia/tritonserver:<yy.mm>-trtllm-python-py3 bash
+ ```
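
As a usage example, substituting a concrete version for `<yy.mm>` (here `23.10`, the first TRT-LLM-enabled release named in the README hunk above) gives:

```bash
# Same command as in the hunk above, with the tag filled in;
# adjust to the release you are targeting.
docker run --rm -ti -v `pwd`:/mnt -w /mnt \
  -v ~/.cache/huggingface:~/.cache/huggingface --gpus all \
  nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 bash
```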

+ 3. Build the engine:
```bash
- export HF_LLAMA_MODEL=llama-7b-hf/
+ # Replace 'HF_LLAMA_MODEL' with another path if you didn't download the model in step 1
+ # or you're not using HuggingFace.
+ export HF_LLAMA_MODEL=`python3 -c "from pathlib import Path; from huggingface_hub import hf_hub_download; print(Path(hf_hub_download('meta-llama/Llama-2-7b-hf', filename='config.json')).parent)"`
export UNIFIED_CKPT_PATH=/tmp/ckpt/llama/7b/
export ENGINE_PATH=/tmp/engines/llama/7b/
- python convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
+ python tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
             --output_dir ${UNIFIED_CKPT_PATH} \
             --dtype float16

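
The hunk ends at the checkpoint-conversion step. For completeness, here is a hedged sketch of the step that typically follows in TensorRT-LLM, compiling the converted checkpoint into an engine with `trtllm-build`; the flag values are common defaults, not taken from this diff.

```bash
# Compile the converted checkpoint into a TensorRT engine.
# --gemm_plugin float16 is a typical choice for an fp16 LLaMa build.
trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
             --output_dir ${ENGINE_PATH} \
             --gemm_plugin float16
```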

- 225fd4fc55948de398989c334464d4478064b4f7
+ 1353d8632b255979eac4667d631a90538c07d269