[RFC] Build AI Runtime as different Inference engines adapters #81

Closed
Jeffwan opened this issue Aug 14, 2024 · 4 comments · Fixed by #88 or #96

Comments

@Jeffwan
Collaborator

Jeffwan commented Aug 14, 2024

Summary

This RFC proposes introducing AIRuntime, a component that runs as a sidecar container alongside the inference engine container in a Kubernetes pod. Its main goal is to address several challenges in the current architecture and to provide a more efficient and flexible solution.

Motivation

  1. Currently, the control plane is overly exposed to engine details and has to be adapted frequently as engines change. This increases complexity and maintenance effort.
    For example, when an engine update modifies its API or data structures, the control-plane code must be updated promptly to match. This can cause disruptions and introduce bugs.

  2. Extracting infra-related features such as metrics conversion, model downloading, and model management from engines helps to modularize the system and improve its maintainability and scalability.
    Take metrics conversion as an example. Different engines might have different formats for metrics. By extracting this functionality into AIRuntime, a standardized and consistent metrics conversion process can be implemented, making it easier for the control-plane to consume and analyze the metrics.
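
To make the metrics conversion idea concrete, here is a minimal sketch, assuming each engine exposes plain `name value` text lines. The engine names (`engine_a`, `engine_b`), the metric names, and the `convert_metrics` helper are all hypothetical placeholders, not part of any real engine or of this RFC.

```python
# Hypothetical mapping from engine-specific metric names to one shared name.
# None of these names come from a real engine; they only illustrate the idea.
METRIC_NAME_MAP = {
    "engine_a": {"num_requests_running": "runtime_requests_running"},
    "engine_b": {"active_requests": "runtime_requests_running"},
}


def convert_metrics(engine: str, raw_text: str) -> dict[str, float]:
    """Parse simple 'name value' lines and rename the metrics we know about."""
    mapping = METRIC_NAME_MAP.get(engine, {})
    converted: dict[str, float] = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as '# HELP ...'.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        # Metrics without a mapping are dropped in this sketch.
        if name in mapping:
            converted[mapping[name]] = float(value)
    return converted
```

With this shape, the control plane would only ever read `runtime_requests_running`, regardless of which engine produced the raw value.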

Proposed Change

  1. Implement AIRuntime as a sidecar container within the Kubernetes pod. This sidecar will communicate with the main application container, hide engine details from the control plane, and handle the infra-related features.

  2. Define a clear interface between the main container and the AIRuntime sidecar for data exchange and control flow.

  3. Develop a configuration mechanism to enable flexible customization of AIRuntime's behavior based on different application requirements. For instance, for applications with high model download frequencies, the configuration could prioritize optimizing the download process.
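
One way the adapter and configuration ideas above could fit together is sketched below. This is only an illustration under stated assumptions: `RuntimeConfig`, `EngineAdapter`, `EngineAAdapter`, `build_adapter`, and every field and method name are hypothetical, since the RFC does not define a concrete interface or config schema.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RuntimeConfig:
    # Hypothetical settings; the RFC does not specify a config schema.
    engine: str = "engine_a"
    download_threads: int = 4


class EngineAdapter(ABC):
    """Engine-agnostic interface (hypothetical) the control plane would call."""

    @abstractmethod
    def load_model(self, model_path: str) -> str:
        """Ask the engine to load a model and return a status message."""

    @abstractmethod
    def metrics(self) -> dict[str, float]:
        """Return metrics in a shared, engine-independent format."""


class EngineAAdapter(EngineAdapter):
    """Stand-in adapter for an imaginary engine; it only records calls."""

    def __init__(self) -> None:
        self._loaded: list[str] = []

    def load_model(self, model_path: str) -> str:
        self._loaded.append(model_path)
        return f"engine_a loaded {model_path}"

    def metrics(self) -> dict[str, float]:
        return {"models_loaded": float(len(self._loaded))}


# Registry of available adapters, keyed by the engine name in the config.
ADAPTERS: dict[str, type[EngineAdapter]] = {"engine_a": EngineAAdapter}


def build_adapter(config: RuntimeConfig) -> EngineAdapter:
    """Pick the adapter named in the config, failing loudly if unknown."""
    if config.engine not in ADAPTERS:
        raise ValueError(f"unsupported engine: {config.engine}")
    return ADAPTERS[config.engine]()
```

Supporting another engine would then mean adding one adapter class and one registry entry, leaving control-plane code untouched.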

Alternatives Considered

No response

@Jeffwan Jeffwan added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/runtime labels Aug 14, 2024
@Jeffwan Jeffwan added this to the v0.1.0 milestone Aug 14, 2024
@Jeffwan
Collaborator Author

Jeffwan commented Aug 14, 2024

/cc @brosoul

@brosoul
Collaborator

brosoul commented Aug 16, 2024

Breakdown

  • Initialize the runtime framework: [Core] init aibrix runtime framework #88
    • add CI to the workflow (lint and format checks)
  • Download models from S3, Hugging Face/ModelScope ...: [Core] Add Downloader implementation for runtime #96
    • support automatic download settings (thread count, buffer size)
    • support directory download
  • Sort out the interaction interface and create a proto file.
    • describe the interactions between the main container and the AI runtime sidecar
    • describe the interactions between the controller and the AI runtime sidecar
  • Injector
    • inject the AI runtime sidecar into the user's deployment
    • inject model_exist_path into the user's model start command
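
The injector items could look roughly like the sketch below, which patches a Deployment manifest held as a plain Python dict. Everything specific here is an assumption for illustration: the `inject_runtime_sidecar` function, the `ai-runtime` container name, the `model-store` volume, and the `--model` flag appended to the main container's args. Only the `spec.template.spec.containers` layout and `emptyDir` volumes are standard Kubernetes.

```python
import copy


def inject_runtime_sidecar(deployment: dict, sidecar_image: str, model_path: str) -> dict:
    """Return a patched copy of a Deployment manifest with a runtime sidecar."""
    patched = copy.deepcopy(deployment)  # never mutate the caller's manifest
    pod_spec = patched["spec"]["template"]["spec"]
    containers = pod_spec["containers"]

    # Shared volume so the sidecar can download a model the engine then reads.
    pod_spec.setdefault("volumes", []).append({"name": "model-store", "emptyDir": {}})
    mount = {"name": "model-store", "mountPath": model_path}
    for container in containers:
        container.setdefault("volumeMounts", []).append(dict(mount))

    # Assumption: the first container is the engine and accepts a '--model' flag.
    containers[0].setdefault("args", []).extend(["--model", model_path])

    # Append the sidecar itself (name and image are placeholders).
    containers.append(
        {"name": "ai-runtime", "image": sidecar_image, "volumeMounts": [dict(mount)]}
    )
    return patched
```

In a real cluster this logic would more likely live in a mutating admission webhook or a controller, but the manifest transformation would be similar.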

@brosoul
Collaborator

brosoul commented Aug 21, 2024

@brosoul
Collaborator

brosoul commented Aug 29, 2024

  • Compare the runtime's download speed with CLI tools (s5cmd, tosutil)
