v0.2.0-rc.2
Pre-release
Automatically generated release for tag v0.2.0-rc.2.
What's Changed
- [Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
- [Misc] Reduced runtime's container image size by @nwangfw in #518
- Clean memory scaler object when pa crd is deleted by @kr11 in #520
- Configure autoscaler http client to skip certificate check by @Jeffwan in #530
- [Doc] Update aibrix documentation by @Jeffwan in #533
- Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
- Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
- [Misc] Polish the benchmark scripts by @Jeffwan in #525
- Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
- Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
- Support for request length internal trace by @happyandslow in #538
- [Feat] Add download status into runtime downloader by @brosoul in #539
- [Feat] Add runtime model management api by @brosoul in #540
- [gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
- [Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
- Add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
- Add request routers - least kv cache, least expected latency by @Aspirin96 in #543
- [Docs] heterogeneous gpu docs added by @nwangfw in #545
- Fix race condition in cache by @varungup90 in #550
- Fix pod internal cache delete handling by @varungup90 in #552
- Handle terminating pod for request routing by @varungup90 in #549
- Support absolute path as lora adapter artifact path by @Jeffwan in #556
- Deadlock fix for cache by @varungup90 in #557
- Mock app log fix for missing metrics warning by @varungup90 in #564
- Add vllm graceful termination configuration by @nwangfw in #568
- Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
- Update pyproject.toml to support python 3.12 by @Jeffwan in #579
- [Docs] Update ai runtime management api and downloader docs by @Jeffwan in #577
- Check the HPA ownerReference in request enqueue by @Jeffwan in #582
- Add request length for traces by @happyandslow in #569
- Support model registration flow using aibrix runtime api by @Jeffwan in #580
- Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
- Support distributed kv cache orchestration by @Jeffwan in #583
- Grant workflow action permission to write packages by @Jeffwan in #586
- Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
- Update upload/download artifact github actions version to v4 by @varungup90 in #591
- Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594
New Contributors
- @scarlet25151 made their first contribution in #548
- @Aspirin96 made their first contribution in #544
Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2