pkill tritonserver
```
## Triton Metrics
Starting with the 23.11 release of Triton, users can obtain TRT LLM Batch Manager [statistics](https://github.com/NVIDIA/TensorRT-LLM/blob/ffd5af342a817a2689d38e4af2cc59ded877e339/docs/source/batch_manager.md#statistics) by querying the Triton metrics endpoint. To do so, launch a Triton server in any of the ways described above (ensuring the build code / container is 23.11 or later) and query the server with the generate endpoint, for example as sketched below.
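The exact request depends on how you deployed your model; the following is only a minimal sketch, assuming the `ensemble` model name, default HTTP port, and request fields used in the examples above, so adjust it to match your setup:

```bash
# Send one request through the generate endpoint so that the batch manager
# statistics have something to report (the model name, port, and request
# fields below are assumptions based on the examples earlier in this README).
curl -X POST localhost:8000/v2/models/ensemble/generate \
  -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'
```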
Upon receiving a successful response, you can query the metrics endpoint by entering the following:

```bash
curl localhost:8002/metrics
```
Batch manager statistics are reported by the metrics endpoint in fields that are prefixed with `nv_trt_llm_`. Your output for these fields should look similar to the following (assuming your model is an inflight batcher model):
```bash
# HELP nv_trt_llm_request_statistics TRT LLM request metrics
...
```
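Because all of these fields share the `nv_trt_llm_` prefix, you can filter the metrics output down to just the batch manager statistics; a small sketch, assuming the default metrics port used above:

```bash
# Show only the TRT LLM batch manager fields (plus their HELP/TYPE lines).
curl -s localhost:8002/metrics | grep "nv_trt_llm_"
```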
If, instead, you launched a V1 model, your output will look similar to the output above, except that the inflight-batcher-related fields are replaced with something similar to the following:
```bash
# HELP nv_trt_llm_v1_statistics TRT LLM v1-specific metrics
...
```
Please note that, as of the 23.11 Triton release, a link between the base Triton metrics (such as inference request count and latency) and the TRT LLM batch manager statistics is being actively developed, but is not yet supported.
As such, the following fields will report 0:
```bash
# HELP nv_inference_request_success Number of successful inference requests, all batch sizes
...
```