[RFC] Build AI Runtime as different Inference engines adapters #81

Jeffwan · 2024-08-14T03:06:34Z

Summary

This RFC proposes AIRuntime, a sidecar container that runs alongside the inference engine container in a Kubernetes pod. Its main goal is to address several challenges in the current architecture by hiding engine-specific details from the control plane and hosting infra-related features in one place.

Motivation

Currently, the control plane is overly exposed to engine details and has to be adapted whenever an engine changes, which increases complexity and maintenance effort.
For example, when an engine update modifies its API or data structures, the control-plane code must be updated promptly to match. This can cause disruptions and bugs.
Extracting infra-related features such as metrics conversion, model downloading, and model management from the engines modularizes the system and improves its maintainability and scalability.
Take metrics conversion as an example. Different engines may expose metrics in different formats. Moving this functionality into AIRuntime gives a single, standardized conversion step, which makes the metrics easier for the control plane to consume and analyze.
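A minimal sketch of what such a conversion step could look like, assuming engines expose Prometheus-style text lines. The engine names, metric names, and the `convert_metrics` helper are hypothetical and only illustrate the idea of mapping engine-specific names onto one standardized set.

```python
# Hypothetical per-engine mappings from engine-specific metric names to
# standardized names. None of these names come from a real engine.
METRIC_NAME_MAP = {
    "engine_a": {
        "engine_a:num_requests_running": "runtime_requests_running",
        "engine_a:num_requests_waiting": "runtime_requests_waiting",
    },
    "engine_b": {
        "engine_b_active_requests": "runtime_requests_running",
    },
}


def convert_metrics(engine: str, raw_text: str) -> str:
    """Rename known metrics in Prometheus-style text; drop everything else.

    Simplification: each sample line is assumed to be "<name>{labels} <value>"
    with no trailing timestamp.
    """
    mapping = METRIC_NAME_MAP[engine]
    converted = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        name_and_labels, _, value = stripped.rpartition(" ")
        name, brace, labels = name_and_labels.partition("{")
        standard_name = mapping.get(name)
        if standard_name is None:
            continue  # metric not part of the standardized set
        converted.append(f"{standard_name}{brace}{labels} {value}")
    return "\n".join(converted)
```

With this shape, the control plane would only ever see the standardized names, and supporting a new engine would mean adding one mapping entry rather than touching control-plane code.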

Proposed Change

Implement AIRuntime as a sidecar container within the Kubernetes pod. This sidecar communicates with the main application container, hides engine details from the control plane, and handles the infra-related features.
Define a clear interface between the main container and the AIRuntime sidecar for data exchange and control flow.
Develop a configuration mechanism that lets AIRuntime's behavior be customized for different application requirements. For instance, for applications that download models frequently, the configuration could prioritize optimizing the download process.
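The interface and configuration points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed design: the `EngineAdapter` methods, the `AIRUNTIME_*` environment variable names, and the `fake` engine are all hypothetical.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


class EngineAdapter(ABC):
    """Hypothetical interface the sidecar implements once per engine."""

    @abstractmethod
    def metrics(self) -> dict[str, float]:
        """Return engine metrics under standardized names."""

    @abstractmethod
    def load_model(self, model_path: str) -> None:
        """Ask the engine to load a model that is already on disk."""


class FakeEngineAdapter(EngineAdapter):
    """Stand-in adapter that only exists to make the sketch runnable."""

    def __init__(self) -> None:
        self.loaded: list[str] = []

    def metrics(self) -> dict[str, float]:
        return {"runtime_requests_running": 0.0}

    def load_model(self, model_path: str) -> None:
        self.loaded.append(model_path)


# Registry of available adapters, keyed by the configured engine name.
ADAPTERS = {"fake": FakeEngineAdapter}


@dataclass(frozen=True)
class RuntimeConfig:
    engine: str = "fake"
    download_threads: int = 4


def load_config(env=None) -> RuntimeConfig:
    """Read settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    return RuntimeConfig(
        engine=env.get("AIRUNTIME_ENGINE", "fake"),
        download_threads=int(env.get("AIRUNTIME_DOWNLOAD_THREADS", "4")),
    )


def build_adapter(config: RuntimeConfig) -> EngineAdapter:
    """Pick the adapter for the configured engine."""
    try:
        return ADAPTERS[config.engine]()
    except KeyError:
        raise ValueError(f"unsupported engine: {config.engine!r}") from None
```

The control plane would talk only to the `EngineAdapter` surface, so an engine API change stays contained in one adapter class, and a download-heavy application could raise `download_threads` without any code change.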

Alternatives Considered

No response

Jeffwan · 2024-08-14T07:44:48Z

/cc @brosoul

brosoul · 2024-08-16T09:29:52Z

Breakdown

Init runtime framework: [Core] init aibrix runtime framework #88
- add CI to the workflow (lint and format check)
Download models from S3, Hugging Face, ModelScope, ...: [Core] Add Downloader implementation for runtime #96
- support automatic download settings (thread count, buffer size)
- support directory download
Sort out the interaction interface and create a proto file
- describe the interactions between the main container and the AI runtime sidecar
- describe the interactions between the controller and the AI runtime sidecar
Injector
- inject the AI runtime sidecar into the user's deployment
- inject model_exist_path into the user's model start command
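As a rough illustration of the injector item, the sketch below patches a Deployment manifest held as a plain dict. It assumes the first container is the engine, and the sidecar name, the `--model` flag, and the function name are hypothetical. A real injector would also need a shared volume for the model files, which is not shown.

```python
import copy

SIDECAR_NAME = "aibrix-runtime"  # hypothetical container name


def inject_runtime_sidecar(deployment: dict, sidecar_image: str,
                           model_exist_path: str) -> dict:
    """Return a copy of the Deployment with the sidecar and model path added."""
    patched = copy.deepcopy(deployment)
    containers = patched["spec"]["template"]["spec"]["containers"]

    # Idempotent: do nothing if the sidecar has already been injected.
    if any(c.get("name") == SIDECAR_NAME for c in containers):
        return patched

    # Assumption: the first container is the engine, and it accepts the
    # model location through a "--model" argument.
    engine = containers[0]
    engine["args"] = list(engine.get("args", [])) + ["--model", model_exist_path]

    containers.append({"name": SIDECAR_NAME, "image": sidecar_image})
    return patched
```

Working on a deep copy keeps the user's original manifest untouched, and the idempotency check makes the function safe to call more than once on the same object.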

brosoul · 2024-08-21T11:28:17Z

Runtime framework needs more checks
- license check in Makefile
- ruff format --check . ([Core] Add Downloader implementation for runtime #96)

brosoul · 2024-08-29T02:26:50Z

Compare runtime download speed with CLI tools (s5cmd, tosutil)
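One simple way to make such a comparison repeatable is to time each download through the same helper. The sketch below is an assumption about how that could be done, not the benchmark that was actually run. `measure_throughput` is a hypothetical name, and the callable it receives is expected to perform the download and return the number of bytes transferred.

```python
import time
from typing import Callable


def measure_throughput(download: Callable[[], int],
                       clock: Callable[[], float] = time.perf_counter) -> float:
    """Run download() once and return its throughput in MiB/s."""
    start = clock()
    bytes_transferred = download()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_transferred / (1024 * 1024) / elapsed
```

The runtime downloader and a CLI tool (wrapped, for example, in a callable that invokes it with `subprocess.run` and then reports the size of the downloaded files) can both be passed to the same function, so the numbers are measured the same way.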

Jeffwan added the priority/important-longterm and area/runtime labels Aug 14, 2024

Jeffwan added this to the v0.1.0 milestone Aug 14, 2024

Jeffwan assigned brosoul Aug 14, 2024

brosoul mentioned this issue Aug 20, 2024

[Core] init aibrix runtime framework #88

Merged

brosoul closed this as completed in #88 Aug 21, 2024

Jeffwan reopened this Aug 21, 2024

brosoul mentioned this issue Aug 26, 2024

[Core] Add Downloader implementation for runtime #96

Merged

brosoul mentioned this issue Aug 29, 2024

Benchmark for AI Runtime models downloading from different sources #105

Closed

Jeffwan closed this as completed in #96 Aug 29, 2024

brosoul mentioned this issue Aug 29, 2024

AI Runtime merge self metrics with enigne metrics #106

Closed
