[feat]: GPU Optimizer and Simulator development app (#430)
* Integrate Vidur as an LLM simulator.

* Test deployment

* Fix debug log output
Support next request claim.

* Integrate gpu optimizer server that exposes customized pod metrics to guide autoscaling.
Update reconciler to support metricSources.

* bug fix: autoscaler with metricsSources now works.

* Decoupled workload monitoring and visualizer
Integrated visualizer

* Integrate ILP solver and profile to GPU optimizer. Aggregated traces are supported now.

* Add support for minimum replicas.

* Debugged GPU profile benchmark, generation, and loading from file.

* Add redis support for profile exchange.

* bug fix: Duplicate 'http://' when calling RestMetricsFetcher.FetchPodMetrics (#408). Add abstractMetricsFetcher so that CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply with MetricsFetcher.

* bug fix

* Tuning the granularity of aggregated profile

* Adjust request trace to finer granularity.
Introduce meta info and version for compatibility

* Make request trace self-explanatory on time interval

* Apply new request trace schema.

* Add Readme.md for demo walkthrough.

* Remove model cache

* Remove TargetPort changes, which are not used.

* Fix deployment

* Organize Imports

* Python 3.9 format check

* Python 3.8 format check

* Add a40 deployment

* Bug fix

* Improve benchmark stability.

* Fix python file names.
Fix python package reference to start with aibrix.gpu_optimizer
Reuse aibrix/runtime image.

* python CI test bump to 3.10

---------

Co-authored-by: Jingyuan Zhang
Co-authored-by: Ning Wang
3 people authored Nov 27, 2024
1 parent 0a56301 commit 572c84c
Showing 49 changed files with 9,686 additions and 39 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-tests.yml
@@ -14,7 +14,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
- python-version: ["3.8", "3.9", "3.10", "3.11"]
+ python-version: ["3.10", "3.11", "3.12"]
name: Lint
steps:
- name: Check out source repository
7 changes: 5 additions & 2 deletions .gitignore
@@ -36,8 +36,11 @@ __pycache__
docs/build/
!**/*.template.rst


# benchmark logs, result and figs
benchmarks/autoscaling/logs
benchmarks/autoscaling/output_stats
- benchmarks/autoscaling/workload_plot
+ benchmarks/autoscaling/workload_plot
+
+ # simulator cache and output
+ docs/development/simulator/simulator_output
+ docs/development/simulator/cache
33 changes: 33 additions & 0 deletions docs/development/simulator/Dockerfile
@@ -0,0 +1,33 @@
# Use the official Python base image
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV WANDB_MODE=disabled

# Set the working directory
WORKDIR /simulator

# Copy the requirements file into the container
COPY requirements.txt /simulator/

# Install dependencies
RUN apt update && apt install -y curl jq git

RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY ./*.py /simulator/
# COPY ./model_cache /simulator/model_cache

ENV MODEL_NAME=llama2-7b
ARG GPU_TYPE=a100
# Trigger profiling
RUN python app.py --time_limit 1000 --replica_config_device ${GPU_TYPE}

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["python", "app.py"]
80 changes: 80 additions & 0 deletions docs/development/simulator/Makefile

Large diffs are not rendered by default.

75 changes: 75 additions & 0 deletions docs/development/simulator/README.md
@@ -0,0 +1,75 @@
# vLLM application simulator

## Run locally

1. Ensure that Python 3.10 is installed on your system. Refer to https://www.bitecode.dev/p/installing-python-the-bare-minimum
2. Create a virtual environment with the venv module: `python3.10 -m venv .venv`
3. Activate the virtual environment: `source .venv/bin/activate`
4. Install the dependencies: `python -m pip install -r requirements.txt`
5. Start the server: `python app.py`
6. When finished, run `deactivate` to leave the virtual environment.

## Run in kubernetes

1. Build simulated base model image
```shell
docker build -t aibrix/vllm-simulator:nightly -f Dockerfile .

# If you are using Docker-Desktop on Mac, Kubernetes shares the local image repository with Docker.
# Therefore, the following command is not necessary.
kind load docker-image aibrix/vllm-simulator:nightly
```

2. Deploy simulated model image
```shell
kubectl apply -f docs/development/simulator/deployment.yaml
kubectl -n aibrix-system port-forward svc/llama2-7b 8000:8000 1>/dev/null 2>&1 &
```
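
Because the port-forward above runs in the background, the first request can race the tunnel. A minimal sketch of a readiness check follows, assuming the forward listens on `localhost:8000` as in the command above. The helper name `wait_for_port` and its parameters are hypothetical and not part of this commit. It only confirms that the local port accepts TCP connections, not that the simulator has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example (hypothetical usage): wait_for_port("localhost", 8000) before sending requests.
```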

## Test python app separately

```shell
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
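
The same request can be issued from Python with only the standard library. This is a sketch under assumptions: the function names `build_chat_request` and `send` are hypothetical, and the response is assumed to be a JSON body, which the curl example suggests but the commit does not confirm.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key="any_key", temperature=0.7):
    """Build the POST request that the curl example above sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout_s=30.0):
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs the server from the steps above):
# send(build_chat_request("http://localhost:8000", "llama2-7b", "Say this is a test!"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.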

```shell
kubectl delete -f docs/development/simulator/deployment.yaml
```

## Test with envoy gateway

Port-forward the user and Envoy services:
```shell
kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090 1>/dev/null 2>&1 &
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
```

Add a user:
```shell
curl http://localhost:8090/CreateUser \
-H "Content-Type: application/json" \
-d '{"name": "your-user-name","rpm": 100,"tpm": 1000}'
```
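
A Python sketch of the same user-creation call, using only the standard library. The function name `build_create_user_request` is hypothetical. The field names `name`, `rpm`, and `tpm` are taken from the curl example above; reading `rpm` and `tpm` as per-minute request and token limits is an assumption, and the response format is not shown in this commit, so the sketch does not parse it.

```python
import json
import urllib.request


def build_create_user_request(users_url, name, rpm, tpm):
    """Build the POST request that the CreateUser curl example above sends."""
    # Field names mirror the curl payload; their exact semantics are assumed.
    payload = {"name": name, "rpm": rpm, "tpm": tpm}
    return urllib.request.Request(
        url=f"{users_url}/CreateUser",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (needs the port-forward to the user service on 8090):
# with urllib.request.urlopen(
#     build_create_user_request("http://localhost:8090", "your-user-name", 100, 1000)
# ) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```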

Send a test request. The `model` header must match the deployment's model name so that the gateway can route the request:
```shell
curl -v http://localhost:8888/v1/chat/completions \
-H "user: your-user-name" \
-H "model: llama2-7b" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}' &
```