Commit

Prefix and load aware routing with radix tree kv cache (#719)
Implement prefix and load aware routing with radix tree-based cache

- Initial implementation of radix tree-based cache with prefix matching
- Added routing strategy in gateway for prefix-cache-and-load
- Updated tree.go implementation (GPU -> Pod)
- Implemented sophisticated prefill time cost computation for V100
- Added attention quadratic cost calculation
- Fixed bugs in SplitNode and evictNode functionality
- Added proper ModelToPods mapping propagation
- Support for dynamic pod changes
- Optimized longest prefix matching logic
- Updated package paths and cleaned up deprecated functions
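Illustrative sketch (not the code from this commit): the items above describe a radix tree that records which pods cache which token prefixes, and a router that weighs the cached prefix length against pod load. The Go below shows one way those pieces could fit together under stated assumptions. Every name here (node, insert, match, prefillCost, pickPod) is hypothetical, the cost coefficients are placeholders rather than V100 measurements, and eviction and dynamic pod changes are left out.

```go
package main

// node is one radix-tree node: an edge label (a run of tokens), child edges
// keyed by their first token, and the set of pods that cache this prefix.
type node struct {
	tokens   []int
	children map[int]*node
	pods     map[string]bool
}

func newNode(tokens []int) *node {
	return &node{tokens: tokens, children: map[int]*node{}, pods: map[string]bool{}}
}

// commonLen returns the length of the shared prefix of a and b.
func commonLen(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// insert records that pod caches the token sequence, splitting an existing
// edge when the sequence diverges partway through it.
func (root *node) insert(tokens []int, pod string) {
	cur := root
	for len(tokens) > 0 {
		child, ok := cur.children[tokens[0]]
		if !ok {
			leaf := newNode(tokens)
			leaf.pods[pod] = true
			cur.children[tokens[0]] = leaf
			return
		}
		n := commonLen(child.tokens, tokens)
		if n < len(child.tokens) {
			// Split: upper keeps the first n tokens, child keeps the rest.
			upper := newNode(child.tokens[:n])
			for p := range child.pods {
				upper.pods[p] = true
			}
			child.tokens = child.tokens[n:]
			upper.children[child.tokens[0]] = child
			cur.children[tokens[0]] = upper
			child = upper
		}
		child.pods[pod] = true
		tokens = tokens[n:]
		cur = child
	}
}

// match returns, per pod, how many leading tokens of the request are cached.
func (root *node) match(tokens []int) map[string]int {
	hits := map[string]int{}
	cur, depth := root, 0
	for len(tokens) > 0 {
		child, ok := cur.children[tokens[0]]
		if !ok {
			break
		}
		n := commonLen(child.tokens, tokens)
		depth += n
		for p := range child.pods {
			hits[p] = depth
		}
		if n < len(child.tokens) {
			break // request diverges inside this edge
		}
		tokens = tokens[n:]
		cur = child
	}
	return hits
}

// prefillCost estimates prefill work for a request of total tokens when hit
// of them are already cached. Coefficients are placeholders, not measured.
func prefillCost(total, hit int) float64 {
	m := float64(total - hit) // tokens that still need prefill
	// Each new token attends to the cached prefix plus the new tokens so far,
	// which gives the quadratic term in m.
	attention := 0.001 * (m*float64(hit) + m*(m+1)/2)
	return 1.0*m + attention
}

// pickPod chooses the pod with the lowest prefill cost plus a load penalty.
// Ties are broken by pod name so the result is deterministic.
func pickPod(root *node, tokens []int, load map[string]int) string {
	hits := root.match(tokens)
	best, bestCost := "", 0.0
	for pod, pending := range load {
		cost := prefillCost(len(tokens), hits[pod]) + 10.0*float64(pending)
		if best == "" || cost < bestCost || (cost == bestCost && pod < best) {
			best, bestCost = pod, cost
		}
	}
	return best
}
```

In this sketch, a caller would build the tree with newNode(nil), call insert whenever a pod finishes prefilling a sequence, and call pickPod with the request tokens and a map of pending requests per pod. A pod holding a longer cached prefix wins when loads are equal, and a sufficiently loaded pod loses to an idle one even if its cache hit is longer.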

Signed-off-by: Gangmuk Lim <[email protected]>
gangmuk authored Feb 22, 2025
1 parent 53696b1 commit d350ca5
Showing 4 changed files with 828 additions and 82 deletions.