The following table shows the fields that need to be modified before deployment:

| Name | Description |
| :--- | :--- |
| `tokenizer_dir` | The path to the tokenizer for the model. In this example, the path should be set to `/tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2`, since the tensorrtllm_backend directory will be mounted to `/tensorrtllm_backend` within the container. |
| `tokenizer_type` | The type of the tokenizer for the model; `t5`, `auto`, and `llama` are supported. In this example, the type should be set to `auto`. |
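For illustration, these values are typically substituted into the model's `config.pbtxt` before launching. A minimal sketch, assuming the fields appear as `${tokenizer_dir}` and `${tokenizer_type}` placeholders in a preprocessing model config under `triton_model_repo/` (both the path and the placeholder convention are assumptions, not stated in this section):

```bash
# Hypothetical sketch: the config path and ${...} placeholder names are assumptions.
CONFIG=/tensorrtllm_backend/triton_model_repo/preprocessing/config.pbtxt
sed -i 's|\${tokenizer_dir}|/tensorrtllm_backend/tensorrt_llm/examples/gpt/gpt2|g' "$CONFIG"
sed -i 's|\${tokenizer_type}|auto|g' "$CONFIG"
```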
### Launch Triton server

Please follow the option that corresponds to the way you built the TensorRT-LLM backend.

#### Option 1. Launch Triton server *within the Triton NGC container*

```bash
docker run --rm -it --net host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /path/to/tensorrtllm_backend:/tensorrtllm_backend nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 bash
```
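These `docker run` flags are shared by all three options:

- `--net host`: run on the host network, so Triton's default ports (8000 HTTP, 8001 gRPC, 8002 metrics) are reachable without explicit port mapping.
- `--shm-size=2g`, `--ulimit memlock=-1`, and `--ulimit stack=67108864`: raise the shared-memory, locked-memory, and stack limits that multi-process, multi-GPU serving relies on.
- `--gpus all`: make all host GPUs visible inside the container.
- `-v /path/to/tensorrtllm_backend:/tensorrtllm_backend`: bind-mount your checkout of the backend repository into the container; replace `/path/to/tensorrtllm_backend` with the actual path on your host.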
#### Option 2. Launch Triton server *within the Triton container built via the build.py script*

```bash
docker run --rm -it --net host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /path/to/tensorrtllm_backend:/tensorrtllm_backend tritonserver bash
```

#### Option 3. Launch Triton server *within the Triton container built via Docker*
```bash
docker run --rm -it --net host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /path/to/tensorrtllm_backend:/tensorrtllm_backend triton_trt_llm bash
```
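Whichever option you chose, a quick sanity check inside the container (generic commands, not part of the original instructions):

```bash
nvidia-smi              # confirm the GPUs are visible inside the container
ls /tensorrtllm_backend # confirm the backend repository is mounted where expected
```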
Once inside the container, you can launch the Triton server with the following command:
```bash
cd /tensorrtllm_backend
# --world_size is the number of GPUs you want to use for serving
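# Illustration only: the launch script name and its flags below are
# assumptions, not confirmed by this excerpt.
python3 scripts/launch_triton_server.py --world_size=1 --model_repo=/tensorrtllm_backend/triton_model_repo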
```