[feat]: GPU Optimizer and Simulator development app (#430)

* Integrate Vidur as a LLM simulator.
* Test deployment
* Fix debug log output Support next request claim.
* Integrate gpu optimizer server that exposes customized pod metrics to guide autoscaling. Update reconciler to support metricSources.
* bug fix: autoscaler with metricsSources now works.
* Decoupled workload monitoring and visualizer Integrated visualizer
* Integrate ILP solver and profile to GPU optimizer. Aggregated traces are supported now.
* Add support for minimum replicas.
* Debugged GPU profile benchmark, generation, and loading from file.
* Add redis support for profile exchange.
* bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMetrics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.
* bug fix
* Tuning the granularity of aggregated profile
* Adjust request trace to finer granularity. Introduce meta info and version for compatibility
* Make request trace self-explanatory on time interval
* Apply new request trace schema.
* Add Readme.md for demo walkthrough.
* Remove model cache
* Remove TargetPort changes, which is not used.
* Fix deployment
* Organize Imports
* Python 3.9 format check
* Python 3.8 format check
* Add a40 deployment
* Bug fix
* Improve benchmark stability.
* Fix python file names. Fix python package reference to start with aibrix.gpu_optimizer Reuse aibrix/runtime image.
* python CI test bump to 3.10

---------

Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
vllm-project · Nov 27, 2024 · 572c84c
1 parent 0a56301
Showing 49 changed files with 9,686 additions and 39 deletions.
diff --git a/.github/workflows/python-tests.yml b/.github/workflows/python-tests.yml
@@ -14,7 +14,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]
     name: Lint
     steps:
       - name: Check out source repository

diff --git a/.gitignore b/.gitignore
@@ -36,8 +36,11 @@ __pycache__
 docs/build/
 !**/*.template.rst
 
-
 # benchmark logs, result and figs
 benchmarks/autoscaling/logs
 benchmarks/autoscaling/output_stats
-benchmarks/autoscaling/workload_plot
+benchmarks/autoscaling/workload_plot
+
+# simulator cache and output
+docs/development/simulator/simulator_output
+docs/development/simulator/cache
diff --git a/docs/development/simulator/Dockerfile b/docs/development/simulator/Dockerfile
@@ -0,0 +1,33 @@
+# Use the official Python base image
+FROM python:3.10-slim
+
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV WANDB_MODE=disabled
+
+# Set the working directory
+WORKDIR /simulator
+
+# Copy the requirements file into the container
+COPY requirements.txt /simulator/
+
+# Install dependencies
+RUN apt update && apt install -y curl jq git
+
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy the rest of the application code into the container
+COPY ./*.py /simulator/
+# COPY ./model_cache /simulator/model_cache
+
+ENV MODEL_NAME=llama2-7b
+ARG GPU_TYPE=a100
+ # Trigger profiling
+RUN python app.py --time_limit 1000 --replica_config_device ${GPU_TYPE}
+
+# Expose the port the app runs on
+EXPOSE 8000
+
+# Run the application
+CMD ["python", "app.py"]
diff --git a/docs/development/simulator/Makefile b/docs/development/simulator/Makefile
diff --git a/docs/development/simulator/README.md b/docs/development/simulator/README.md
@@ -0,0 +1,75 @@
+# vLLM application simulator
+
+## Run locally
+
+1. Ensure that Python 3.10 is installed on your system. Refer to https://www.bitecode.dev/p/installing-python-the-bare-minimum
+2. Create a virtual environment with the venv module: `python3.10 -m venv .venv`
+3. Activate the virtual environment: `source .venv/bin/activate`
+4. Install the dependencies: `python -m pip install -r requirements.txt`
+5. Run `python app.py` to start the server.
+6. Run `deactivate` to leave the virtual environment.
+
+## Run in Kubernetes
+
+1. Build simulated base model image
+```shell
+docker build -t aibrix/vllm-simulator:nightly -f Dockerfile .
+
+# If you are using Docker-Desktop on Mac, Kubernetes shares the local image repository with Docker.
+# Therefore, the following command is not necessary.
+kind load docker-image aibrix/vllm-simulator:nightly
+```
+
+2. Deploy simulated model image
+```shell
+kubectl apply -f docs/development/simulator/deployment.yaml
+kubectl -n aibrix-system port-forward svc/llama2-7b 8000:8000 1>/dev/null 2>&1 &
+```
+
+## Test python app separately
+
+```shell
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer any_key" \
+  -d '{
+     "model": "llama2-7b",
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "temperature": 0.7
+   }'
+```
+
+```shell
+kubectl delete -f docs/development/simulator/deployment.yaml
+```
+
+## Test with envoy gateway
+
+Add User:
+
+
+Port forward to the User and Envoy service:
+```shell
+kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090 1>/dev/null 2>&1 &
+kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
+```
+
+Add User
+```shell
+curl http://localhost:8090/CreateUser \
+  -H "Content-Type: application/json" \
+  -d '{"name": "your-user-name","rpm": 100,"tpm": 1000}'
+```
+
+Test request (ensure the `model` header matches the deployment's model name for routing)
+```shell
+curl -v http://localhost:8888/v1/chat/completions \
+  -H "user: your-user-name" \
+  -H "model: llama2-7b" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer any_key" \
+  -d '{
+     "model": "llama2-7b",
+     "messages": [{"role": "user", "content": "Say this is a test!"}],
+     "temperature": 0.7
+   }' &
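
The two curl requests in the README above can also be issued from Python. The following is a minimal sketch and not part of this commit: the function names `build_chat_request` and `send` are hypothetical, and it assumes the port-forwards from the README are active (8000 for the simulator directly, 8888 for the Envoy gateway). It uses only the standard library and returns the raw response body without assuming anything about its structure.

```python
import json
import urllib.request


def build_chat_request(base_url, model="llama2-7b", user=None):
    """Build the POST request that the README sends with curl.

    When `user` is given, the `user` and `model` routing headers from the
    gateway example are added as well. (Hypothetical helper, for illustration.)
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Say this is a test!"}],
        "temperature": 0.7,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer any_key",
    }
    if user is not None:
        # The gateway example routes on these two headers.
        headers["user"] = user
        headers["model"] = model
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (requires the port-forwards from the README to be running):
#   print(send(build_chat_request("http://localhost:8000")))
#   print(send(build_chat_request("http://localhost:8888", user="your-user-name")))
```

Separating request construction from sending lets the payload and headers be inspected without a running cluster.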