Description
I am looking at the benchmarks/python/kv_cache_offload/benchmark.py example and trying to figure out how to get throughput. This example gets TTFT from executor.get_latest_iteration_stats().
I looked at the definition of get_latest_iteration_stats() but didn't find any information on how to get throughput.
Here is what is available via the get_latest_iteration_stats() function:
['cpu_mem_usage', 'cross_kv_cache_stats', 'gpu_mem_usage', 'inflight_batching_stats', 'iter', 'iter_latency_ms', 'kv_cache_stats', 'max_num_active_requests', 'new_active_requests_queue_latency_ms', 'num_active_requests', 'num_completed_requests', 'num_new_active_requests', 'num_queued_requests', 'pinned_mem_usage', 'static_batching_stats', 'timestamp', 'to_json_str']
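The closest thing I can come up with from these fields is a rough request-level throughput, sketched below. It is only a guess at a workaround: it assumes that num_completed_requests counts the requests finished in that single iteration (not a running total) and that iter_latency_ms is the wall time of the iteration. The helper name requests_per_second and the plain-dict samples are hypothetical and only stand in for the per-iteration stats objects; nothing here is taken from the library itself. It also does not give tokens per second, since none of the fields listed above looks like a token count.

```python
from typing import Iterable, Mapping


def requests_per_second(stats: Iterable[Mapping[str, float]]) -> float:
    """Estimate request throughput from per-iteration stats records.

    Assumption: each record carries 'num_completed_requests' (requests that
    finished in that iteration) and 'iter_latency_ms' (duration of that
    iteration in milliseconds).
    """
    total_completed = 0.0
    total_ms = 0.0
    for record in stats:
        total_completed += record["num_completed_requests"]
        total_ms += record["iter_latency_ms"]
    if total_ms <= 0:
        # No elapsed time recorded, so throughput is undefined; report 0.
        return 0.0
    return total_completed / (total_ms / 1000.0)


# Hypothetical records, standing in for values collected once per iteration.
samples = [
    {"num_completed_requests": 0, "iter_latency_ms": 20.0},
    {"num_completed_requests": 2, "iter_latency_ms": 30.0},
    {"num_completed_requests": 3, "iter_latency_ms": 50.0},
]
print(requests_per_second(samples))
```

If the summed iteration latencies do not cover the whole run (for example, idle time between iterations), this would overestimate throughput, so I would still like to know whether there is a supported way to get it.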