
[feat]: GPU Optimizer and Simulator development app #430

Merged: 45 commits, merged Nov 27, 2024. Changes shown below are from 42 commits.

Commits
c3e1f94  Integrate Vidur as a LLM simulator. (Sep 26, 2024)
52d2ccb  Merge commit 'main' into jingyuan/autoscaler (Sep 27, 2024)
92f141d  Test deployment (Oct 3, 2024)
b2a914f  Merge commit 'f9702c3d8da7ab4d0e6fd8c5877c1e57ce528fb9' into jingyuan… (Oct 3, 2024)
d75cbb8  Fix debug log output (Oct 8, 2024)
45d2801  Merge commit '5d8d8439077b08b12fd3de62ad7c4e6e2fb6e4ed' into jingyuan… (Oct 24, 2024)
702346f  Integrate gpu optimizer server that exposes customized pod metrics to… (Oct 28, 2024)
c1b317c  Merge commit 'ea5dc7784fa767fcf40b61417627ab47b6dba426' into jingyuan… (Oct 28, 2024)
2ce7c10  bug fix: autoscaler with metricsSources now works. (Oct 28, 2024)
4d811c4  Decoupled workload monitoring and visualizer (Oct 31, 2024)
e64052b  Integrate ILP solver and profile to GPU optimizer. Aggregated traces … (Nov 8, 2024)
6066cc0  Add support for minimum replicas. (Nov 11, 2024)
9951e11  Merge branch 'main' into jingyuan/autoscaler (Nov 11, 2024)
9abf0d4  Debugged GPU profile benchmark, generation, and loading from file. (Nov 15, 2024)
6ce6e87  Add redis support for profile exchange. (Nov 18, 2024)
22e94c5  bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMe… (Nov 20, 2024)
9604ac0  bug fix (Nov 20, 2024)
5803c6d  Merge branch 'issues/408_Duplicated_http_in_RestMetricsFetcher' into … (Nov 20, 2024)
3d7d28a  Tuning the granularity of aggregated profile (Nov 21, 2024)
c4b5a64  Adjust request trace to finer granularity. (Nov 21, 2024)
74491b4  Make request trace self-explanatory on time interval (Nov 21, 2024)
6fe9c5f  Merge branch 'jingyuan/finer_profile_granuality' into jingyuan/autosc… (Nov 21, 2024)
e7f5e2f  Apply new request trace schema. (Nov 22, 2024)
dba5205  Add Readme.md for demo walkthrough. (Nov 22, 2024)
e8514c9  Merge branch 'main' into jingyuan/finer_profile_granuality (Nov 22, 2024)
d39fdcb  Merge branch 'jingyuan/finer_profile_granuality' into jingyuan/autosc… (Nov 22, 2024)
b828e51  Remove model cache (Nov 22, 2024)
984feb1  Remove TargetPort changes, which is not used. (Nov 22, 2024)
60a9364  Fix deployment (Nov 22, 2024)
548ab63  Organize Imports (Nov 22, 2024)
5afab5c  Lint fix (Nov 23, 2024)
d43162a  Lint fix (Nov 23, 2024)
715ab4b  ruff reformat (Nov 23, 2024)
599f820  Passed mypy (Nov 23, 2024)
6c357a6  Lint fix (Nov 23, 2024)
d05a087  pass mypy for redis (Nov 23, 2024)
8c835c8  ruff again (Nov 23, 2024)
d77cd5a  Pass lint (Nov 23, 2024)
009b341  Python 3.9 format check (Nov 23, 2024)
019afd9  Python 3.8 format check (Nov 23, 2024)
b609c33  Add a40 deployment (Nov 25, 2024)
6e84e6f  Bug fix (Nov 26, 2024)
a573d44  Improve benchmark stability. (Nov 26, 2024)
69c4b47  Fix python file names. (Nov 27, 2024)
469955b  python CI test bump to 3.10 (nwangfw, Nov 27, 2024)
10 changes: 8 additions & 2 deletions .gitignore
@@ -26,15 +26,21 @@ go.work.sum
.DS_Store
__pycache__

# Python virtual environment directory
.venv

# Jupyter notebook
.ipynb_checkpoints/

# Sphinx documentation
docs/build/
!**/*.template.rst


# benchmark logs, result and figs
benchmarks/autoscaling/logs
benchmarks/autoscaling/output_stats
benchmarks/autoscaling/workload_plot

# simulator cache and output
docs/development/simulator/simulator_output
docs/development/simulator/cache
32 changes: 32 additions & 0 deletions docs/development/simulator/Dockerfile
@@ -0,0 +1,32 @@
# Use the official Python base image
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV WANDB_MODE=disabled

# Set the working directory
WORKDIR /simulator

# Copy the requirements file into the container
COPY requirements.txt /simulator/

# Install dependencies
RUN apt update && apt install -y curl jq git

RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY ./*.py /simulator/
# COPY ./model_cache /simulator/model_cache

ENV MODEL_NAME=llama2-7b
# Trigger profiling
RUN python app.py --time_limit 1000 --replica_config_device a100

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["python", "app.py"]
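
For a quick smoke test of this image outside Kubernetes, a minimal sketch; the `aibrix/vllm-simulator:nightly` tag matches the build command in the README below, so adjust it if you use a different one:

```shell
# Build the A100-profiled simulator image and run it locally,
# publishing the container's port 8000 on the host.
docker build -t aibrix/vllm-simulator:nightly -f Dockerfile .
docker run --rm -p 8000:8000 aibrix/vllm-simulator:nightly
```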
32 changes: 32 additions & 0 deletions docs/development/simulator/Dockerfile-a40
@@ -0,0 +1,32 @@
# Use the official Python base image
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV WANDB_MODE=disabled

# Set the working directory
WORKDIR /simulator

# Copy the requirements file into the container
COPY requirements.txt /simulator/

# Install dependencies
RUN apt update && apt install -y curl jq git

RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY ./*.py /simulator/
# COPY ./model_cache /simulator/model_cache

ENV MODEL_NAME=llama2-7b
# Trigger profiling
RUN python app.py --time_limit 1000 --replica_config_device a40

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["python", "app.py"]
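
The A40 variant can be built the same way; a sketch, using a hypothetical `aibrix/vllm-simulator-a40:nightly` tag to keep it distinct from the A100 image:

```shell
# Build the A40-profiled simulator image from its dedicated Dockerfile.
docker build -t aibrix/vllm-simulator-a40:nightly -f Dockerfile-a40 .
# For a kind cluster, load the image as with the A100 variant.
kind load docker-image aibrix/vllm-simulator-a40:nightly
```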
80 changes: 80 additions & 0 deletions docs/development/simulator/Makefile

Large diffs are not rendered by default.

106 changes: 106 additions & 0 deletions docs/development/simulator/README.md
@@ -0,0 +1,106 @@
# vLLM application simulator

## Run locally

Ensure that Python 3.10 is installed on your system (see https://www.bitecode.dev/p/installing-python-the-bare-minimum for installation guidance), then:

1. Create a virtual environment with the `venv` module: `python3.10 -m venv .venv`
2. Activate the virtual environment: `source .venv/bin/activate`
3. Install the dependencies: `python -m pip install -r requirements.txt`
4. Run `python app.py` to start the server.
5. Run `deactivate` to leave the virtual environment when you are done.

The same steps are consolidated in the sketch below.
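
A sketch of the full sequence, assuming the commands are run from `docs/development/simulator/`, where `requirements.txt` and `app.py` live:

```shell
# Create and activate a Python 3.10 virtual environment,
# install the simulator's dependencies, and start the server.
python3.10 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python app.py        # serves on port 8000, matching the Dockerfile's EXPOSE 8000
# When finished:
deactivate
```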

## Run in kubernetes

1. Build the simulated base model image
```shell
docker build -t aibrix/vllm-simulator:nightly -f Dockerfile .

# If you are using Docker-Desktop on Mac, Kubernetes shares the local image repository with Docker.
# Therefore, the following command is not necessary.
kind load docker-image aibrix/vllm-simulator:nightly
```

2. Deploy the simulated model image
```shell
kubectl apply -f docs/development/simulator/deployment.yaml
kubectl -n aibrix-system port-forward svc/llama2-7b 8000:8000 1>/dev/null 2>&1 &
```
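
Before sending test traffic, it can help to confirm the simulator pod and service are up; a quick sketch, where the `llama2-7b` service name comes from the port-forward above:

```shell
# Check that the simulator pod is Running and the service exists.
kubectl -n aibrix-system get pods
kubectl -n aibrix-system get svc llama2-7b
```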

## Test the Python app separately

```shell
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
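
If `jq` is available, the reply text can be pulled out of the response directly; a sketch that assumes the simulator returns an OpenAI-style chat-completion body:

```shell
# Same request as above, but print only the assistant's reply text.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any_key" \
  -d '{
    "model": "llama2-7b",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }' | jq -r '.choices[0].message.content'
```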

Clean up the deployment when you are done:
```shell
kubectl delete -f docs/development/simulator/deployment.yaml
```

## Test with envoy gateway

Port forward to the User and Envoy service:
```shell
kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090 1>/dev/null 2>&1 &
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 &
```
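
A quick sanity check that both port-forwards are listening before continuing (a sketch; assumes `nc` from netcat is installed):

```shell
nc -z localhost 8090 && echo "aibrix-gateway-users port-forward is up"
nc -z localhost 8888 && echo "envoy gateway port-forward is up"
```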

Add User:

> Reviewer note (Collaborator): nit: user creation is no longer required here, so this step can be skipped. This is minor; some cleanup can follow later.

```shell
curl http://localhost:8090/CreateUser \
-H "Content-Type: application/json" \
-d '{"name": "your-user-name","rpm": 100,"tpm": 1000}'
```

Test request (ensure the model name in the `model` header matches the deployment's model name so the request is routed correctly)
```shell
curl -v http://localhost:8888/v1/chat/completions \
-H "user: your-user-name" \
-H "model: llama2-7b" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}' &

# least-request based
for i in {1..10}; do
curl -v http://localhost:8888/v1/chat/completions \
-H "user: your-user-name" \
-H "routing-strategy: least-request" \
-H "model: llama2-7b" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}' &
done

# throughput based
for i in {1..10}; do
curl -v http://localhost:8888/v1/chat/completions \
-H "user: your-user-name" \
-H "routing-strategy: throughput" \
-H "model: llama2-7b" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer any_key" \
-d '{
"model": "llama2-7b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}' &
done
```
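
Once the routing tests are done, a cleanup sketch along these lines stops the port-forwards started above and removes the simulator deployment (it assumes no other `kubectl port-forward` processes you want to keep are running):

```shell
# Stop the kubectl port-forwards started earlier and delete the simulator deployment.
pkill -f "kubectl .*port-forward" || true
kubectl delete -f docs/development/simulator/deployment.yaml
```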