Executor API: How to get throughput

I am looking at the [benchmarks](https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks)/[python](https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks/python)/[kv_cache_offload](https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks/python/kv_cache_offload)/benchmark.py example and trying to figure out how I can get throughput. This example gets TTFT from `executor.get_latest_iteration_stats()`. I looked at the definition of `get_latest_iteration_stats()` but didn't find any information on how to get throughput.
Here is what is available via the `get_latest_iteration_stats()` function:
`'cpu_mem_usage', 'cross_kv_cache_stats', 'gpu_mem_usage', 'inflight_batching_stats', 'iter', 'iter_latency_ms', 'kv_cache_stats', 'max_num_active_requests', 'new_active_requests_queue_latency_ms', 'num_active_requests', 'num_completed_requests', 'num_new_active_requests', 'num_queued_requests', 'pinned_mem_usage', 'static_batching_stats', 'timestamp', 'to_json_str']`
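
For illustration, here is a minimal sketch of one way a request-level throughput could be derived from two of the fields listed above. It rests on assumptions that the source does not confirm: that `iter_latency_ms` is the wall time of a single iteration and that `num_completed_requests` counts requests finished in that iteration rather than a running total. `IterStats` and `requests_per_second` are hypothetical names used only here, not part of the TensorRT-LLM API. The sketch also ignores any idle time between iterations.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IterStats:
    """Hypothetical stand-in carrying two of the field names listed above."""
    iter_latency_ms: float       # assumed: wall time of one iteration, in ms
    num_completed_requests: int  # assumed: requests completed in that iteration


def requests_per_second(stats: Iterable[IterStats]) -> float:
    """Total completed requests divided by total iteration time, in requests/s."""
    stats = list(stats)
    total_ms = sum(s.iter_latency_ms for s in stats)
    completed = sum(s.num_completed_requests for s in stats)
    if total_ms <= 0:
        # No measured time (for example, no iterations collected yet).
        return 0.0
    return completed / (total_ms / 1000.0)


# Made-up sample values, for demonstration only.
sample = [IterStats(250.0, 1), IterStats(250.0, 0), IterStats(500.0, 3)]
rate = requests_per_second(sample)
print(rate)
```

None of the listed fields appears to expose a token count, so a tokens-per-second figure would presumably have to come from somewhere else, such as counting the output tokens of each response and dividing by elapsed wall-clock time.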


Executor API: How to get throughput #3142
