Prefix and load aware routing with radix tree kv cache (#719)
Implement prefix and load aware routing with radix tree-based cache

- Initial implementation of radix tree-based cache with prefix matching
- Added routing strategy in gateway for prefix-cache-and-load
- Updated tree.go implementation (GPU -> Pod)
- Implemented sophisticated prefill time cost computation for V100
- Added attention quadratic cost calculation
- Fixed bugs in SplitNode and evictNode functionality
- Added proper ModelToPods mapping propagation
- Support for dynamic pod changes
- Optimized longest prefix matching logic
- Updated package paths and cleaned up deprecated functions

Signed-off-by: Gangmuk Lim <[email protected]>
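
For readers unfamiliar with the idea, the following is a minimal sketch of prefix and load aware pod selection. It is not the code from this commit: it uses a plain per-token prefix tree rather than a compressed radix tree, treats load as a simple integer instead of the prefill cost model mentioned above, and every name in it (node, insert, matchLengths, route, the pod names) is hypothetical.

```go
package main

import "fmt"

// node is one token edge in a simplified prefix tree.
// Hypothetical illustration; not the commit's tree.go.
type node struct {
	children map[int]*node
	pods     map[string]bool // pods assumed to hold KV cache for the prefix ending here
}

func newNode() *node {
	return &node{children: map[int]*node{}, pods: map[string]bool{}}
}

// insert records that pod has cached every prefix of tokens.
func (n *node) insert(tokens []int, pod string) {
	cur := n
	for _, t := range tokens {
		child, ok := cur.children[t]
		if !ok {
			child = newNode()
			cur.children[t] = child
		}
		child.pods[pod] = true
		cur = child
	}
}

// matchLengths returns, per pod, the length of the longest cached prefix of tokens.
func (n *node) matchLengths(tokens []int) map[string]int {
	out := map[string]int{}
	cur := n
	for i, t := range tokens {
		child, ok := cur.children[t]
		if !ok {
			break
		}
		for p := range child.pods {
			out[p] = i + 1
		}
		cur = child
	}
	return out
}

// route picks the pod with the longest matched prefix. Ties go to the pod
// with lower load, then to the smaller name so the result is deterministic.
// It returns "" when load is empty.
func route(root *node, tokens []int, load map[string]int) string {
	matched := root.matchLengths(tokens)
	best := ""
	for pod, l := range load {
		if best == "" {
			best = pod
			continue
		}
		m, bm, bl := matched[pod], matched[best], load[best]
		if m > bm || (m == bm && (l < bl || (l == bl && pod < best))) {
			best = pod
		}
	}
	return best
}

func main() {
	root := newNode()
	root.insert([]int{1, 2, 3, 4}, "pod-a")
	root.insert([]int{1, 2, 9}, "pod-b")
	load := map[string]int{"pod-a": 5, "pod-b": 1, "pod-c": 0}

	fmt.Println(route(root, []int{1, 2, 3, 7}, load)) // longest prefix wins
	fmt.Println(route(root, []int{1, 2}, load))       // equal prefix, lower load wins
	fmt.Println(route(root, []int{8, 8}, load))       // no prefix hit, least loaded pod
}
```

In this sketch load only breaks ties between equal prefix matches. A real router would more likely combine the cache hit length and the pod's load into a single cost, as the prefill cost items in the list above suggest.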