Executor API: How to get throughput #3142

Open
@khayamgondal

Description

I am looking at the benchmarks/python/kv_cache_offload/benchmark.py example and trying to figure out how to get throughput. The example gets TTFT from executor.get_latest_iteration_stats(). I looked at the definition of get_latest_iteration_stats() but did not find any information on how to get throughput.
Here is what is available via the get_latest_iteration_stats() function:
['cpu_mem_usage', 'cross_kv_cache_stats', 'gpu_mem_usage', 'inflight_batching_stats', 'iter', 'iter_latency_ms', 'kv_cache_stats', 'max_num_active_requests', 'new_active_requests_queue_latency_ms', 'num_active_requests', 'num_completed_requests', 'num_new_active_requests', 'num_queued_requests', 'pinned_mem_usage', 'static_batching_stats', 'timestamp', 'to_json_str']
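For illustration, here is a rough sketch of the kind of throughput number that could be derived from the fields listed above. It is not a confirmed part of the Executor API. The `IterStats` dataclass is a hypothetical stand-in that mirrors only the `iter_latency_ms` and `num_completed_requests` attribute names. The sketch assumes `num_completed_requests` counts requests finished in that iteration rather than a running total. None of the listed fields carries a token count, so `token_throughput` assumes output tokens are counted separately from the responses and timed with a wall clock.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IterStats:
    """Hypothetical stand-in for one iteration-stats entry.

    Only mirrors two attribute names from the list above; it is not
    the real TensorRT-LLM class.
    """
    iter_latency_ms: float
    num_completed_requests: int  # assumed to be a per-iteration count


def request_throughput(stats: Iterable[IterStats]) -> float:
    """Completed requests per second, using summed iteration latency.

    Idle time between iterations is not included, so this can
    overestimate throughput compared with a wall-clock measurement.
    """
    total_ms = 0.0
    completed = 0
    for s in stats:
        total_ms += s.iter_latency_ms
        completed += s.num_completed_requests
    if total_ms <= 0:
        return 0.0
    return completed / (total_ms / 1000.0)


def token_throughput(total_output_tokens: int, elapsed_s: float) -> float:
    """Output tokens per second from an externally measured token count."""
    if elapsed_s <= 0:
        return 0.0
    return total_output_tokens / elapsed_s


if __name__ == "__main__":
    sample = [IterStats(500.0, 1), IterStats(500.0, 3)]
    print(request_throughput(sample))   # 4 requests over 1.0 s of iteration time
    print(token_throughput(1000, 2.0))  # 1000 tokens over 2.0 s
```

In a real run, the entries would come from executor.get_latest_iteration_stats(), and the token count from the lengths of the output token lists in the collected responses. Both points are assumptions about usage and are not confirmed in this thread.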

Labels

Investigating, Performance, triaged
