diff --git a/docs/source/assets/images/cloud/lambda-cloud-installation.png b/docs/source/assets/images/cloud/lambda-cloud-installation.png
new file mode 100644
index 00000000..cddcc0f3
Binary files /dev/null and b/docs/source/assets/images/cloud/lambda-cloud-installation.png differ
diff --git a/docs/source/assets/images/cloud/lambda-cloud-instance.png b/docs/source/assets/images/cloud/lambda-cloud-instance.png
new file mode 100644
index 00000000..01b03b3e
Binary files /dev/null and b/docs/source/assets/images/cloud/lambda-cloud-instance.png differ
diff --git a/docs/source/assets/images/cloud/lambda-cloud-ssh.png b/docs/source/assets/images/cloud/lambda-cloud-ssh.png
new file mode 100644
index 00000000..a69452d3
Binary files /dev/null and b/docs/source/assets/images/cloud/lambda-cloud-ssh.png differ
diff --git a/docs/source/assets/images/cloud/lambda-cloud-verify-installation.png b/docs/source/assets/images/cloud/lambda-cloud-verify-installation.png
new file mode 100644
index 00000000..b4ec72db
Binary files /dev/null and b/docs/source/assets/images/cloud/lambda-cloud-verify-installation.png differ
diff --git a/docs/source/community/community.rst b/docs/source/community/community.rst
index 9f229e2c..3dae4a9a 100644
--- a/docs/source/community/community.rst
+++ b/docs/source/community/community.rst
@@ -14,4 +14,4 @@ Join the channels to start conversations and get support from the community.
 AiBrix Mailing List
 -------------------
 
-The official AIBrix mailing list is hosted on Google Groups at `support@aibrix.ai `_.
+The official AIBrix mailing list is hosted on Google Groups at `maintainers@aibrix.ai `_.
diff --git a/docs/source/community/research.rst b/docs/source/community/research.rst
new file mode 100644
index 00000000..21741779
--- /dev/null
+++ b/docs/source/community/research.rst
@@ -0,0 +1,41 @@
+.. _research:
+
+======================
+Research Collaboration
+======================
+
+At AIBrix, we strongly support system-level research and are committed to fostering collaboration with researchers and academics in the AI infrastructure domain.
+Our platform provides a unique opportunity to bridge the gap between theoretical research and real-world production challenges.
+If you're a PhD student or researcher looking to explore innovative ideas in AI systems, we'd love to support your work.
+
+Opportunities for Research
+--------------------------
+
+- **Unresolved Production Challenges**: The AI production environment presents numerous unresolved problems, from efficient resource allocation to scalable inference architectures. We can provide case studies and real-world scenarios for researchers interested in exploring and solving these challenges.
+
+- **Research Paper Implementation**: Some research ideas have already been integrated into AIBrix, making it a practical landing ground for cutting-edge innovations. We regularly publish `Help Wanted `_ issues on our GitHub highlighting open research opportunities; feel free to take a look and contribute!
+
+- **AIBrix as a Research Testbed**: Our system is designed to serve as a research testbed for system-level problems in AI infrastructure. Whether it's testing new scheduling algorithms, optimizing inference latency, or improving resource efficiency, we provide hands-on support to help set up experiments and validate hypotheses.
+
+Acknowledgments
+---------------
+
+Many of our innovative ideas have been inspired by academic research, including works such as Preble, Melange, QLM, and Mooncake.
+Integrating cutting-edge research into a production-grade system has been an enriching journey, enabling us to transform theoretical concepts into real-world applications. These contributions have significantly influenced our work, and we sincerely appreciate the researchers behind them. Thank you!
+
+We also extend our gratitude to the vLLM community for their support in making AIBrix the control plane for vLLM, further strengthening our mission to build scalable and efficient AI infrastructure.
+
+Get Involved
+------------
+
+At AIBrix, we actively welcome research collaborations across a wide range of topics, from cloud infrastructure cost optimization to engine and system co-design innovations. AIBrix offers a robust experimentation platform featuring:
+
+- Request Routing Strategies
+- LLM-Specific Autoscaling
+- Disaggregated KV Cache Pool
+- Serverless & Engine Resource Elasticity
+- Large-Scale Inference System Tracing and Simulation
+
+Whether you're a researcher, an academic, or an engineer working on GenAI system optimization strategies, we invite you to collaborate with us and contribute to the future of scalable AI infrastructure.
+
+For inquiries or collaboration opportunities, feel free to reach out by opening GitHub issues or through our Slack channel.
diff --git a/docs/source/designs/architecture.rst b/docs/source/designs/architecture.rst
index 0044d2d9..f5d522f3 100644
--- a/docs/source/designs/architecture.rst
+++ b/docs/source/designs/architecture.rst
@@ -8,13 +8,35 @@ An overview of AIBrix’s architecture
 
 This guide introduces the AIBrix ecosystem and explains how its components integrate into the LLM inference lifecycle.
 
-AIBrix Ecosystem
-----------------
+AIBrix Architecture
+-------------------
 
 The following diagram gives an overview of the AIBrix Ecosystem and how it relates to the wider Kubernetes and LLM landscapes.
 
 .. figure:: ../assets/images/aibrix-architecture-v1.png
    :alt: aibrix-architecture-v1
-   :width: 70%
+   :width: 100%
    :align: center
 
+AIBrix contains both :strong:`control plane` components and :strong:`data plane` components. The control plane components manage model metadata registration, autoscaling, model adapter registration, and the enforcement of various policies. The data plane provides configurable components for dispatching, scheduling, and serving inference requests, enabling flexible and high-performance model execution.
+
+AIBrix Control Plane
+--------------------
+
+AIBrix currently provides several control plane components:
+
+- :strong:`Model Adapter (LoRA) Controller`: enables multi-LoRA-per-pod deployments, significantly improving scalability and resource efficiency.
+- :strong:`RayClusterFleet`: orchestrates multi-node inference, ensuring optimal performance across distributed environments.
+- :strong:`LLM-Specific Autoscaler`: enables real-time, second-level scaling, leveraging KV cache utilization and inference-aware metrics to dynamically optimize resource allocation.
+- :strong:`GPU Optimizer`: a profiler-based optimizer for heterogeneous serving that dynamically adjusts allocations to maximize cost-efficiency while maintaining service guarantees.
+- :strong:`AI Engine Runtime`: a lightweight sidecar that offloads management tasks, enforces policies, and abstracts engine interactions.
+- :strong:`Accelerator Diagnostic Tools`: provide automated failure detection and mock-up testing to improve fault resilience.
+
+AIBrix Data Plane
+-----------------
+
+AIBrix currently provides several data plane components:
+
+- :strong:`Request Router`: serves as the central request dispatcher, enforcing fairness policies, rate control (TPM/RPM), and workload isolation.
+- :strong:`Distributed KV Cache Runtime`: provides scalable, low-latency cache access across nodes. By enabling KV cache reuse, it reduces redundant computation and improves token generation efficiency.
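+
+Both planes run as ordinary Kubernetes workloads, so once AIBrix is installed you can inspect them with standard tooling. The snippet below is a minimal sketch; it assumes the default `aibrix-system` namespace used by the release manifests.
+
+.. code-block:: bash
+
+   # List the AIBrix component deployments and their pods
+   kubectl get deployments -n aibrix-system
+   kubectl get pods -n aibrix-system
+
+   # Discover the custom resources the controllers reconcile
+   # (the exact list varies by release)
+   kubectl api-resources | grep -i aibrix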
diff --git a/docs/source/getting_started/lambda.rst b/docs/source/getting_started/lambda.rst
new file mode 100644
index 00000000..2b6dd8fe
--- /dev/null
+++ b/docs/source/getting_started/lambda.rst
@@ -0,0 +1,141 @@
+.. _lambda_cloud_installation:
+
+=================================================
+AIBrix Single-Node Deployment on Lambda Instances
+=================================================
+
+This guide provides a step-by-step tutorial for deploying AIBrix on a single-node Lambda instance for testing purposes. The setup includes installing dependencies, verifying the installation, setting up the cluster, and deploying the AIBrix components.
+
+Prerequisites
+-------------
+
+Before you begin, ensure you have the following:
+
+* A `Lambda Cloud `_ instance with a single NVIDIA GPU
+* A clone of the AIBrix code base
+
+You can follow the `Lambda Cloud docs `_ to launch an instance.
+
+.. figure:: ../assets/images/cloud/lambda-cloud-instance.png
+   :alt: lambda-cloud-instance
+   :width: 70%
+   :align: center
+
+After launching the instance, you can get its IP address and SSH into it.
+
+.. figure:: ../assets/images/cloud/lambda-cloud-ssh.png
+   :alt: lambda-cloud-ssh
+   :width: 70%
+   :align: center
+
+
+Installation Steps
+------------------
+
+1. Install Dependencies
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Run the following script to install the necessary dependencies, including `nvkind`, `kubectl`, `Helm`, `Go`, and the `NVIDIA Container Toolkit`:
+
+.. code-block:: bash
+
+   bash hack/lambda-cloud/install.sh
+
+**install.sh Summary:**
+
+- Installs required system packages (`jq`, `Go`, `kubectl`, `kind`, `Helm`)
+- Installs `nvkind` (custom Kubernetes-in-Docker with GPU support)
+- Configures the NVIDIA Container Toolkit
+- Updates Docker settings for GPU compatibility
+
+.. figure:: ../assets/images/cloud/lambda-cloud-installation.png
+   :alt: lambda-cloud-installation
+   :width: 70%
+   :align: center
+
+Once completed, restart your terminal or run:
+
+.. code-block:: bash
+
+   source ~/.bashrc
+
+2. Verify Installation
+~~~~~~~~~~~~~~~~~~~~~~
+
+Run the following script to ensure that the NVIDIA drivers and Docker integration are correctly configured:
+
+.. code-block:: bash
+
+   bash hack/lambda-cloud/verify.sh
+
+**verify.sh Summary:**
+
+- Runs `nvidia-smi` to check GPU availability
+- Runs a Docker container with the NVIDIA runtime to verify GPU detection
+- Ensures that GPU devices are accessible within containers
+
+If all checks pass successfully, as shown below, proceed to the next step.
+
+.. figure:: ../assets/images/cloud/lambda-cloud-verify-installation.png
+   :alt: lambda-cloud-verify-installation
+   :width: 70%
+   :align: center
+
+
+3. Create an `nvkind` Cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Create a Kubernetes cluster using `nvkind`:
+
+.. code-block:: bash
+
+   nvkind cluster create --config-template=hack/lambda-cloud/nvkind-cluster.yaml
+
+This will set up a single-node cluster with GPU support. Make sure the node reports a `Ready` status:
+
+.. code-block:: bash
+
+   kubectl get nodes
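+
+If the node is still initializing, you can block until it reports `Ready` rather than polling manually. This optional convenience sketch uses standard `kubectl` functionality:
+
+.. code-block:: bash
+
+   # Wait up to five minutes for all nodes in the cluster to become Ready
+   kubectl wait --for=condition=Ready nodes --all --timeout=300s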
+
+4. Set Up the NVIDIA GPU Operator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run the following script to install the NVIDIA GPU Operator and configure the cloud provider:
+
+.. code-block:: bash
+
+   bash hack/lambda-cloud/setup.sh
+
+**setup.sh Summary:**
+
+- Installs the NVIDIA GPU Operator using Helm
+- Installs Cloud Provider Kind (`cloud-provider-kind`)
+- Runs `cloud-provider-kind` in the background so that `LoadBalancer` services work inside the kind cluster
+
+5. Install AIBrix
+~~~~~~~~~~~~~~~~~
+
+Once the cluster is up and running, install the AIBrix components:
+
+.. code-block:: bash
+
+   # install dependencies
+   kubectl create -k "github.com/aibrix/aibrix/config/dependency?ref=v0.2.0-rc.2"
+
+   # install core components
+   kubectl create -k "github.com/aibrix/aibrix/config/overlays/release?ref=v0.2.0-rc.2"
+
+Verify that the AIBrix components are installed successfully:
+
+.. code-block:: bash
+
+   kubectl get pods -n aibrix-system
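+
+As an optional smoke test, you can confirm that the AIBrix custom resource definitions were registered and that the gateway service is reachable. This is a sketch rather than part of the official scripts; the namespace name below is an assumption based on the default Envoy Gateway manifests, so adjust it if your installation differs:
+
+.. code-block:: bash
+
+   # Custom resource definitions registered by the AIBrix controllers
+   kubectl get crds | grep aibrix
+
+   # The Envoy gateway service should eventually receive an external IP
+   # from cloud-provider-kind (verify names with `kubectl get svc -A`)
+   kubectl get svc -n envoy-gateway-system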
+
+
+Conclusion
+----------
+
+You have successfully deployed AIBrix on a single-node Lambda instance. This setup allows for efficient testing and debugging of AIBrix components in a local environment.
+
+If you encounter issues, ensure that:
+
+- The NVIDIA GPU Operator is correctly installed
+- The cluster has GPU resources available (`kubectl describe nodes`)
+- The Docker and Kubernetes configurations meet the GPU compatibility requirements
+
+Happy Testing!
diff --git a/docs/source/index.rst b/docs/source/index.rst
index f1486421..e8fab8e6 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -17,7 +17,7 @@ Key features:
 
 - **LLM App-Tailored Autoscaler**: Dynamically scale inference resources based on real-time demand.
 - **Unified AI Runtime**: A versatile sidecar enabling metric standardization, model downloading, and management.
 - **Heterogeneous-GPU Inference**: Cost-effective SLO-driven LLM inference using heterogeneous GPUs.
-- **GPU Hardware Failure Detection (TBD)**: Proactive detection of GPU hardware issues.
+- **GPU Hardware Failure Detection**: Proactive detection of GPU hardware issues.
 - **Benchmark Tool (TBD)**: A tool for measuring inference performance and resource efficiency.
 
 Documentation
@@ -57,3 +57,4 @@ Documentation
 
    community/community.rst
    community/contribution.rst
+   community/research.rst
diff --git a/hack/lambda-cloud/setup.sh b/hack/lambda-cloud/setup.sh
index 1b6c8e50..f04785f9 100755
--- a/hack/lambda-cloud/setup.sh
+++ b/hack/lambda-cloud/setup.sh
@@ -36,7 +36,7 @@ LOG_FILE="/tmp/cloud-provider-kind.log"
 nohup cloud-provider-kind > ${LOG_FILE} 2>&1 &
 
 # Save the process ID
-echo $! > /var/run/cloud-provider-kind.pid
+echo $! > /tmp/cloud-provider-kind.pid
 
 echo "Cloud Provider Kind is running in the background. Logs are being written to ${LOG_FILE}."
 echo "Setup complete. All components have been installed successfully."