[RFC] Build AI Runtime as an adapter layer for different inference engines #81
Labels
area/runtime
priority/important-longterm
Summary
This RFC proposes introducing AIRuntime, a component that runs as a sidecar container next to the inference engine container in a Kubernetes pod. Its main goals are to shield the control plane from engine-specific details and to move infra-related features out of the engines, which should make the system more flexible and easier to maintain.
Motivation
Currently, the control plane is overly exposed to engine details and has to be adapted whenever an engine changes. This increases complexity and maintenance effort.
For example, when an engine update modifies its API or data structures, the control-plane code must be updated promptly to match. This can cause disruptions and introduce bugs.
Extracting infra-related features such as metrics conversion, model downloading, and model management from engines helps to modularize the system and improve its maintainability and scalability.
Take metrics conversion as an example. Different engines might have different formats for metrics. By extracting this functionality into AIRuntime, a standardized and consistent metrics conversion process can be implemented, making it easier for the control-plane to consume and analyze the metrics.
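The metrics conversion idea could be sketched as a per-engine name mapping. This is only an illustration: the engine identifiers (`engine_a`, `engine_b`), the raw metric names, and the standardized `runtime_*` names are all hypothetical and not taken from any real engine or from this RFC.

```python
from typing import Dict

# Hypothetical per-engine metric names mapped to hypothetical standardized names.
METRIC_NAME_MAP: Dict[str, Dict[str, str]] = {
    "engine_a": {
        "a_requests_running": "runtime_requests_running",
        "a_cache_usage_ratio": "runtime_cache_usage_ratio",
    },
    "engine_b": {
        "b_active_requests": "runtime_requests_running",
        "b_kv_cache_util": "runtime_cache_usage_ratio",
    },
}


def convert_metrics(engine: str, raw: Dict[str, float]) -> Dict[str, float]:
    """Rename engine-specific metrics to standardized names.

    Metrics without a mapping are dropped so the control plane only
    ever sees the standardized set.
    """
    try:
        name_map = METRIC_NAME_MAP[engine]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine}") from None
    return {name_map[k]: v for k, v in raw.items() if k in name_map}
```

With a mapping like this, the control plane would read `runtime_requests_running` regardless of which engine produced the value, and supporting a new engine would mean adding one mapping entry inside AIRuntime rather than changing control-plane code.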
Proposed Change
Implement AIRuntime as a sidecar container within the Kubernetes pod. The sidecar communicates with the main application container, hides engine-specific details from the control plane, and handles infra-related features.
Define a clear interface between the main container and the AIRuntime sidecar for data exchange and control flow.
Develop a configuration mechanism to enable flexible customization of AIRuntime's behavior based on different application requirements. For instance, for applications with high model download frequencies, the configuration could prioritize optimizing the download process.
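One possible shape for the interface and the configuration mechanism is sketched below. Everything here is an assumption made for illustration: `RuntimeConfig`, its fields, `EngineAdapter`, its method names, and the toy `InMemoryAdapter` are hypothetical and do not describe an agreed design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical AIRuntime settings; field names are illustrative only."""
    engine: str = "engine_a"
    download_concurrency: int = 1  # could be raised for download-heavy applications
    metrics_enabled: bool = True


class EngineAdapter(ABC):
    """Hypothetical contract that hides engine details from the control plane."""

    def __init__(self, config: RuntimeConfig) -> None:
        self.config = config

    @abstractmethod
    def metrics(self) -> Dict[str, float]:
        """Return metrics under standardized names."""

    @abstractmethod
    def download_model(self, model_uri: str) -> str:
        """Fetch a model and return its local name."""

    @abstractmethod
    def list_models(self) -> List[str]:
        """Return the names of models currently available."""


class InMemoryAdapter(EngineAdapter):
    """Toy adapter that only exercises the interface; it talks to no real engine."""

    def __init__(self, config: RuntimeConfig) -> None:
        super().__init__(config)
        self._models: List[str] = []

    def metrics(self) -> Dict[str, float]:
        if not self.config.metrics_enabled:
            return {}
        return {"runtime_models_loaded": float(len(self._models))}

    def download_model(self, model_uri: str) -> str:
        # Use the last path segment of the URI as the model name.
        name = model_uri.rstrip("/").rsplit("/", 1)[-1]
        if name not in self._models:
            self._models.append(name)
        return name

    def list_models(self) -> List[str]:
        return list(self._models)
```

In this sketch the control plane would depend only on `EngineAdapter`, each engine would get its own subclass inside the sidecar, and settings such as `download_concurrency` would let a deployment tune behavior without code changes.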
Alternatives Considered
No response