Add a research section, update architecture and lambda guidance (#663)
* Add a research collaboration section in the documentation

* Add control plane and data plane details in architecture page

* Add guidance to deploy AIBrix on lambda cloud

* Fix RST grammar issue

---------

Signed-off-by: Jiaxin Shan <[email protected]>
Jeffwan authored Feb 13, 2025
1 parent 36cd9b9 commit 60de0c1
Showing 10 changed files with 211 additions and 6 deletions.
(4 binary image files added by this commit are not rendered in the diff view.)
2 changes: 1 addition & 1 deletion docs/source/community/community.rst
@@ -14,4 +14,4 @@ Join the channels to start conversations and get support from the community.
AiBrix Mailing List
-------------------

The official AIBrix mailing list is hosted on Google Groups at `support@aibrix.ai <support@aibrix.ai>`_.
The official AIBrix mailing list is hosted on Google Groups at `maintainers@aibrix.ai <maintainers@aibrix.ai>`_.
41 changes: 41 additions & 0 deletions docs/source/community/research.rst
@@ -0,0 +1,41 @@
.. _research:

======================
Research Collaboration
======================

At AIBrix, we strongly support system-level research and are committed to fostering collaboration with researchers and academics in the AI infrastructure domain.
Our platform provides a unique opportunity to bridge the gap between theoretical research and real-world production challenges.
If you're a PhD student or researcher looking to explore innovative ideas in AI systems, we'd love to support your work.

Opportunities for Research
--------------------------

- **Unresolved Production Challenges**: The AI production environment presents numerous unresolved problems, from efficient resource allocation to scalable inference architectures. We can provide case studies and real-world scenarios for researchers interested in exploring and solving these challenges.

- **Research Paper Implementation**: Some research ideas have been integrated into AIBrix, making it a practical landing ground for cutting-edge innovations. We regularly publish `Help Wanted <https://github.com/aibrix/aibrix/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22help%20wanted%22>`_ issues on our GitHub, highlighting open research opportunities—feel free to take a look and contribute!

- **AIBrix as a Research Testbed**: Our system is designed to serve as a research testbed for system-level problems in AI infrastructure. Whether it's testing new scheduling algorithms, optimizing inference latency, or improving resource efficiency, we provide hands-on support to help set up experiments and validate hypotheses.

Acknowledgments
---------------

Many of our innovative ideas have been inspired by academic research, including works such as Preble, Melange, QLM, and MoonCake. Integrating cutting-edge research into a production-grade system has been an enriching journey, enabling us to transform theoretical concepts into real-world applications. These contributions have significantly influenced our work, and we sincerely appreciate the researchers behind them—thank you!

We also extend our gratitude to the vLLM community for their support in making AIBrix the control plane for vLLM, further strengthening our mission to build scalable and efficient AI infrastructure.

Get Involved
------------

At AIBrix, we actively welcome research collaborations across a wide range of topics, from cloud infrastructure cost optimizations to engine and system co-design innovations. AIBrix offers a robust experimentation platform featuring:

- Request Routing Strategies
- LLM-Specific Autoscaling
- Disaggregated KV Cache Pool
- Serverless & Engine Resource Elasticity
- Large-Scale Inference System Tracing and Simulation

Whether you're a researcher, an academic, or an engineer working on GenAI system optimization strategies, we invite you to collaborate with us and contribute to the future of scalable AI infrastructure.

For inquiries or collaboration opportunities, feel free to reach out to us by filing GitHub issues or through our Slack channel.

28 changes: 25 additions & 3 deletions docs/source/designs/architecture.rst
@@ -8,13 +8,35 @@ An overview of AIBrix’s architecture

This guide introduces the AIBrix ecosystem and explains how its components integrate into the LLM inference lifecycle.

AIBrix Ecosystem
----------------
AIBrix Architecture
-------------------

The following diagram gives an overview of the AIBrix Ecosystem and how it relates to the wider Kubernetes and LLM landscapes.

.. figure:: ../assets/images/aibrix-architecture-v1.png
:alt: aibrix-architecture-v1
:width: 70%
:width: 100%
:align: center

AIBrix contains both :strong:`control plane` components and :strong:`data plane` components. The control plane components manage model metadata registration, autoscaling, model adapter registration, and the enforcement of various policies. The data plane provides configurable components for dispatching, scheduling, and serving inference requests, enabling flexible and high-performance model execution.
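
Once AIBrix is installed, you can inspect components from both planes from the command line. This is an illustrative sketch: the ``aibrix-system`` namespace matches the installation guide later in this commit, while the ``grep`` pattern is an assumption about the API group naming.

.. code-block:: bash

   # Control plane controllers and data plane gateway/runtime pods
   # run as regular Kubernetes workloads.
   kubectl get pods -n aibrix-system

   # List the custom resource types the control plane manages
   # (filtering on "aibrix" is an assumption, not authoritative).
   kubectl api-resources | grep -i aibrix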

AIBrix Control Plane
--------------------

AIBrix currently provides several control plane components:

- :strong:`Model Adapter (LoRA) controller`: enables multi-LoRA-per-pod deployments, significantly improving scalability and resource efficiency.
- :strong:`RayClusterFleet`: orchestrates multi-node inference, ensuring optimal performance across distributed environments.
- :strong:`LLM-Specific Autoscaler`: enables real-time, second-level scaling, leveraging KV cache utilization and inference-aware metrics to dynamically optimize resource allocation.
- :strong:`GPU Optimizer`: a profiler-based optimizer for heterogeneous serving that dynamically adjusts allocations to maximize cost-efficiency while maintaining service guarantees.
- :strong:`AI Engine Runtime`: a lightweight sidecar that offloads management tasks, enforces policies, and abstracts engine interactions.
- :strong:`Accelerator Diagnostic Tools`: provide automated failure detection and mock-up testing to improve fault resilience.


AIBrix Data Plane
-----------------

AIBrix currently provides several data plane components:

- :strong:`Request Router`: serves as the central request dispatcher, enforcing fairness policies, rate control (TPM/RPM), and workload isolation (a request sketch follows this list).
- :strong:`Distributed KV Cache Runtime`: provides scalable, low-latency cache access across nodes. By enabling KV cache reuse, it reduces redundant computation and improves token generation efficiency.
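
As a hedged illustration of how a request traverses the data plane (the gateway address and model name are placeholders, not values defined in this guide):

.. code-block:: bash

   # Illustrative only: send an OpenAI-style completion request
   # through the AIBrix request router. Replace <gateway-address>
   # and <your-model> with values from your own deployment.
   curl http://<gateway-address>/v1/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "<your-model>", "prompt": "Hello", "max_tokens": 16}'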
141 changes: 141 additions & 0 deletions docs/source/getting_started/lambda.rst
@@ -0,0 +1,141 @@
.. _lambda_cloud_installation:

=================================================
AIBrix Single-Node Deployment on Lambda Instances
=================================================

This guide provides a step-by-step tutorial to deploy AIBrix on a single-node Lambda instance for testing purposes. The setup includes installing dependencies, verifying the installation, setting up the cluster, and deploying AIBrix components.

Prerequisites
-------------

Before you begin, ensure you have the following:

* A `Lambda Cloud <https://lambdalabs.com/>`_ instance with a single NVIDIA GPU
* A clone of the AIBrix code base

You can follow the `Lambda Cloud docs <https://docs.lambdalabs.com/>`_ to launch an instance.

.. figure:: ../assets/images/cloud/lambda-cloud-instance.png
:alt: lambda-cloud-instance
:width: 70%
:align: center

After launching the instance, note its IP address and SSH into it (a connection sketch follows the screenshot below).

.. figure:: ../assets/images/cloud/lambda-cloud-ssh.png
:alt: lambda-cloud-ssh
:width: 70%
:align: center
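
A minimal connection sketch (the key path is an assumption, and the `ubuntu` user is typical for Lambda instances; adjust both to your setup):

.. code-block:: bash

   # Connect to the instance. Replace the key path and IP
   # with your own values.
   ssh -i ~/.ssh/<your-lambda-key> ubuntu@<instance-ip>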


Installation Steps
------------------

1. Install Dependencies
~~~~~~~~~~~~~~~~~~~~~~~

Run the following script to install the necessary dependencies, including `nvkind`, `kubectl`, `Helm`, `Go`, and the `NVIDIA Container Toolkit`.

.. code-block:: bash

   bash hack/lambda-cloud/install.sh

**install.sh Summary:**

- Installs required system packages (`jq`, `Go`, `kubectl`, `kind`, `Helm`)
- Installs `nvkind` (custom Kubernetes-in-Docker with GPU support)
- Configures the NVIDIA Container Toolkit
- Updates Docker settings for GPU compatibility

.. figure:: ../assets/images/cloud/lambda-cloud-installation.png
:alt: lambda-cloud-installation
:width: 70%
:align: center

Once completed, restart your terminal or run:

.. code-block:: bash

   source ~/.bashrc

2. Verify Installation
~~~~~~~~~~~~~~~~~~~~~~

Run the following script to ensure that the NVIDIA drivers and Docker integration are correctly configured:

.. code-block:: bash

   bash verify.sh

**verify.sh Summary:**

- Runs `nvidia-smi` to check GPU availability
- Runs a Docker container with the NVIDIA runtime to verify GPU detection (a manual equivalent is sketched below)
- Ensures that GPU devices are accessible within containers
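
If you want to reproduce the container check by hand, a roughly equivalent command (the CUDA image tag is an assumption) is:

.. code-block:: bash

   # Run nvidia-smi inside a CUDA base container to confirm that
   # Docker's NVIDIA runtime exposes the GPU.
   docker run --rm --runtime=nvidia --gpus all \
     nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi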

If all checks pass, as shown below, proceed to the next step.

.. figure:: ../assets/images/cloud/lambda-cloud-verify-installation.png
:alt: lambda-cloud-verify-installation
:width: 70%
:align: center


3. Create an `nvkind` Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a Kubernetes cluster using nvkind:

.. code-block:: bash

   nvkind cluster create --config-template=nvkind-cluster.yaml

This will set up a single-node cluster with GPU support. Make sure you see `Ready` status for the node:

.. code-block:: bash

   kubectl get nodes
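
Optionally, you can block until the node is Ready instead of polling (a small convenience sketch):

.. code-block:: bash

   # Wait up to two minutes for all nodes to report Ready.
   kubectl wait --for=condition=Ready nodes --all --timeout=120s
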
4. Set Up NVIDIA GPU Operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the following script to install the NVIDIA GPU Operator and configure the cloud provider:

.. code-block:: bash

   bash setup.sh

**setup.sh Summary:**

- Installs the NVIDIA GPU Operator using Helm (roughly as sketched below)
- Installs the Cloud Provider Kind (`cloud-provider-kind`)
- Runs `cloud-provider-kind` in the background for cloud integration
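
For reference, the GPU Operator portion of `setup.sh` corresponds roughly to NVIDIA's standard Helm instructions (the chart repo URL and release name below are assumptions, not taken from the script):

.. code-block:: bash

   # Sketch of a standard GPU Operator install via Helm.
   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
   helm repo update
   helm install gpu-operator nvidia/gpu-operator \
     --namespace gpu-operator --create-namespace --wait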

5. Install AIBrix
~~~~~~~~~~~~~~~~~

Once the cluster is up and running, install AIBrix components:

**Install dependencies:**

.. code-block:: bash

   # install dependencies
   kubectl create -k "github.com/aibrix/aibrix/config/dependency?ref=v0.2.0-rc.2"

   # install core components
   kubectl create -k "github.com/aibrix/aibrix/config/overlays/release?ref=v0.2.0-rc.2"

Verify that the AIBrix components are installed successfully:

.. code-block:: bash

   kubectl get pods -n aibrix-system
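
Optionally, wait for all components to become Ready before testing (a sketch):

.. code-block:: bash

   # Block until every pod in aibrix-system reports Ready.
   kubectl wait --for=condition=Ready pods --all \
     -n aibrix-system --timeout=300s
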
Conclusion
----------

You have successfully deployed AIBrix on a single-node Lambda instance. This setup allows for efficient testing and debugging of AIBrix components in a local environment.

If you encounter issues, ensure that:

- The NVIDIA GPU Operator is correctly installed
- The cluster has GPU resources available (`kubectl describe nodes`; see the sketch below)
- Docker and Kubernetes configurations match GPU compatibility requirements
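
For the second check, one quick way to inspect the GPU capacity a node advertises:

.. code-block:: bash

   # Show the GPU resources each node advertises to the scheduler.
   kubectl describe nodes | grep -A 3 "nvidia.com/gpu"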

Happy Testing!
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -17,7 +17,7 @@ Key features:
- **LLM App-Tailored Autoscaler**: Dynamically scale inference resources based on real-time demand.
- **Unified AI Runtime**: A versatile sidecar enabling metric standardization, model downloading, and management.
- **Heterogeneous-GPU Inference**: Cost-effective SLO-driven LLM inference using heterogeneous GPUs.
- **GPU Hardware Failure Detection (TBD)**: Proactive detection of GPU hardware issues.
- **GPU Hardware Failure Detection**: Proactive detection of GPU hardware issues.
- **Benchmark Tool (TBD)**: A tool for measuring inference performance and resource efficiency.

Documentation
@@ -57,3 +57,4 @@ Documentation

community/community.rst
community/contribution.rst
community/research.rst
2 changes: 1 addition & 1 deletion hack/lambda-cloud/setup.sh
@@ -36,7 +36,7 @@ LOG_FILE="/tmp/cloud-provider-kind.log"
nohup cloud-provider-kind > ${LOG_FILE} 2>&1 &

# Save the process ID
echo $! > /var/run/cloud-provider-kind.pid
echo $! > /tmp/cloud-provider-kind.pid
echo "Cloud Provider Kind is running in the background. Logs are being written to ${LOG_FILE}."

echo "Setup complete. All components have been installed successfully."
