4 files changed: +35 -8 lines changed

@@ -45,8 +45,6 @@ repo. If you don't find your answer there you can ask questions on the

There are several ways to access the TensorRT-LLM Backend.

-**Before Triton 23.10 release, please use [Option 3 to build TensorRT-LLM backend via Docker](#option-3-build-via-docker).**
-
### Run the Pre-built Docker Container

Starting with Triton 23.10 release, Triton includes a container with the TensorRT-LLM
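
An editor's sketch of pulling that pre-built container; the `23.10` tag is an assumption based on the release named above, and later releases follow the same `<yy.mm>-trtllm-python-py3` pattern:

```bash
# Pull the Triton image that bundles the TensorRT-LLM backend.
# The 23.10 tag is assumed from the release mentioned in the text;
# substitute the release you actually need.
docker pull nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
```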
+## End to end workflow to run llama 7b

-## End to end workflow to run llama
+0. Make sure that you have initialized the TRT-LLM submodule:

-* Build engine
+```bash
+git lfs install
+git submodule update --init --recursive
+```
+
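A quick sanity check (an editor's addition, not part of the diff) that the submodule actually checked out, using standard git:

```bash
# Lists each submodule's pinned commit; a '-' prefix means not initialized,
# a '+' prefix means the checkout differs from the pinned commit.
git submodule status
```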
+1. (Optional) Download the LLaMa model from HuggingFace:
+
+```bash
+huggingface-cli login
+
+huggingface-cli download meta-llama/Llama-2-7b-hf
+```
+
+> **NOTE**
+>
+> Make sure that you have access to https://huggingface.co/meta-llama/Llama-2-7b-hf.
+
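A hedged way to confirm the download landed in the local HuggingFace cache (editor's sketch; `scan-cache` is available in recent `huggingface_hub` CLI versions):

```bash
# Summarizes cached repos and their on-disk size; Llama-2-7b-hf should appear.
huggingface-cli scan-cache
```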
+2. Start the Triton Server Docker container:
+
+```bash
+# Replace <yy.mm> with the version of Triton you want to use.
+# The command below assumes the current directory is the
+# TRT-LLM backend root git repository.
+# ~/.cache/huggingface is mounted so the model downloaded in step 1 is
+# visible inside the container (as /root/.cache/huggingface for the root user).
+
+docker run --rm -ti -v `pwd`:/mnt -w /mnt -v ~/.cache/huggingface:/root/.cache/huggingface --gpus all nvcr.io/nvidia/tritonserver:<yy.mm>-trtllm-python-py3 bash
+```

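Once inside the container, a minimal sanity check (an editor's addition, not in the diff) that the `--gpus all` flag worked:

```bash
# Should list every GPU passed through to the container.
nvidia-smi
```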
+3. Build the engine:
```bash
-export HF_LLAMA_MODEL=llama-7b-hf/
+# Replace 'HF_LLAMA_MODEL' with another path if you didn't download the model in step 1
+# or you're not using HuggingFace.
+export HF_LLAMA_MODEL=`python3 -c "from pathlib import Path; from huggingface_hub import hf_hub_download; print(Path(hf_hub_download('meta-llama/Llama-2-7b-hf', filename='config.json')).parent)"`
export UNIFIED_CKPT_PATH=/tmp/ckpt/llama/7b/
export ENGINE_PATH=/tmp/engines/llama/7b/
-python convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
+python tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir ${HF_LLAMA_MODEL} \
        --output_dir ${UNIFIED_CKPT_PATH} \
        --dtype float16
```
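
The hunk ends at checkpoint conversion; in the standard TensorRT-LLM flow the converted checkpoint is then compiled into an engine with `trtllm-build`. A sketch under that assumption (the dtype mirrors the `float16` used above, but none of this is in the diff):

```bash
# Assumed continuation: compile the unified checkpoint into a TensorRT engine.
trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
             --output_dir ${ENGINE_PATH} \
             --gemm_plugin float16
```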
-225fd4fc55948de398989c334464d4478064b4f7
+1353d8632b255979eac4667d631a90538c07d269