From dcda62302315d85310418997af8bc67b2b6b38ea Mon Sep 17 00:00:00 2001
From: Haiyang Shi
Date: Tue, 18 Feb 2025 16:14:12 -0800
Subject: [PATCH] [Docs] Add feature description of dist kv cache in README
 (#705)

Signed-off-by: Dwyane Shi
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index d22e2ead..1793c733 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ The initial release includes the following key features:
 - **Distributed Inference**: Scalable architecture to handle large workloads across multiple nodes.
 - **LLM App-Tailored Autoscaler**: Dynamically scale inference resources based on real-time demand.
 - **Unified AI Runtime**: A versatile sidecar enabling metric standardization, model downloading, and management.
+- **Distributed KV Cache**: Enables high-capacity, cross-engine KV reuse.
 - **GPU Hardware Failure Detection (TBD)**: Proactive detection of GPU hardware issues.
 - **Benchmark Tool (TBD)**: A tool for measuring inference performance and resource efficiency.