v0.2.0-rc.2

Pre-release
@github-actions released this 23 Jan 22:23
· 89 commits to main since this release
6ee2f11

Automatically generated release for tag v0.2.0-rc.2.

What's Changed

  • [Bug] Accumulated bug fix on controller manager, mock app configuration, and gpu optimizer. by @zhangjyr in #522
  • [Misc] Reduced runtime's container image size by @nwangfw in #518
  • clean memory scaler object when pa crd is deleted by @kr11 in #520
  • Configure autoscaler http client to skip certificate check by @Jeffwan in #530
  • [Doc] Update aibrix documentation by @Jeffwan in #533
  • Refactor the gateway-plugin and metadata service manifests by @Jeffwan in #531
  • Fix the GITHUB_WORKSPACE artifact sharing issue in release workflow by @Jeffwan in #532
  • [Misc] Polish the benchmark scripts by @Jeffwan in #525
  • Fix APA bugs in creation, add test and demo yaml by @kr11 in #536
  • Add VKE IPv4 Testing Cluster Config by @nwangfw in #537
  • Support for request length internal trace by @happyandslow in #538
  • [Feat] Add download status into runtime downloader by @brosoul in #539
  • [Feat] Add runtime model management api by @brosoul in #540
  • [gateway] handle the wrong model name and cache inconsistency case by @Jeffwan in #542
  • [Docs] fix: update the parameters instruction in readme by @scarlet25151 in #548
  • add lora schedulers - bin pack, least latency, least throughput, random by @Aspirin96 in #544
  • add request routers - least kv cache, least expected latency by @Aspirin96 in #543
  • [Docs] heterogeneous gpu docs added by @nwangfw in #545
  • Fix race condition in cache by @varungup90 in #550
  • Fix pod internal cache delete handling by @varungup90 in #552
  • Handle terminating pod for request routing by @varungup90 in #549
  • Support absolute path as lora adapter artifact path by @Jeffwan in #556
  • Deadlock fix for cache by @varungup90 in #557
  • Mock app log fix for missing metrics warning by @varungup90 in #564
  • Add vllm graceful termination configuration by @nwangfw in #568
  • Enhance dynamic lora adapter support for auth enabled scenario by @Jeffwan in #571
  • Update pyproject.toml to support python 3.12 by @Jeffwan in #579
  • [Docs] Update ai runtime management api and downloader docs by @Jeffwan in #577
  • Check the HPA ownerReference in request enqueue by @Jeffwan in #582
  • Add request length for traces by @happyandslow in #569
  • Support model registration flow using aibrix runtime api by @Jeffwan in #580
  • Gateway plugin report total incoming requests and pending requests by @zhangjyr in #554
  • Support distributed kv cache orchestration by @Jeffwan in #583
  • Grant workflow action permission to write packages by @Jeffwan in #586
  • Update routers to use GetPodModelMetric api and misc cleanup in metri… by @varungup90 in #590
  • Update upload/download artifact github actions version to v4 by @varungup90 in #591
  • Update version in aibrix/python to 0.2.0-rc.2 by @varungup90 in #594

Full Changelog: v0.2.0-rc.1...v0.2.0-rc.2