[feat]: GPU Optimizer and Simulator development app (#430)
* Integrate Vidur as a LLM simulator.
* Test deployment
* Fix debug log output Support next request claim.
* Integrate gpu optimizer server that exposes customized pod metrics to guide autoscaling. Update reconciler to support metricSources.
* bug fix: autoscaler with metricsSources now works.
* Decoupled workload monitoring and visualizer Integrated visualizer
* Integrate ILP solver and profile to GPU optimizer. Aggregated traces are supported now.
* Add support for minimum replicas.
* Debugged GPU profile benchmark, generation, and loading from file.
* Add redis support for profile exchange.
* bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMetrics (#408), Add abstractMetricsFetcher to make CustomMetricsFetcher, ResourceMetricsFetcher, and KubernetesMetricsFetcher comply MetricsFetcher.
* bug fix
* Tuning the granularity of aggregated profile
* Adjust request trace to finer granularity. Introduce meta info and version for compatibility
* Make request trace self-explanatory on time interval
* Apply new request trace schema.
* Add Readme.md for demo walkthrough.
* Remove model cache
* Remove TargetPort changes, which is not used.
* Fix deployment
* Organize Imports
* Python 3.9 format check
* Python 3.8 format check
* Add a40 deployment
* Bug fix
* Improve benchmark stability.
* Fix python file names. Fix python package reference to start with aibrix.gpu_optimizer Reuse aibrix/runtime image.
* python CI test bump to 3.10

---------

Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
1 parent 0a56301, commit 572c84c
Showing 49 changed files with 9,686 additions and 39 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Use the official Python base image | ||
FROM python:3.10-slim | ||
|
||
# Set environment variables | ||
ENV PYTHONDONTWRITEBYTECODE=1 | ||
ENV PYTHONUNBUFFERED=1 | ||
ENV WANDB_MODE=disabled | ||
|
||
# Set the working directory | ||
WORKDIR /simulator | ||
|
||
# Copy the requirements file into the container | ||
COPY requirements.txt /simulator/ | ||
|
||
# Install dependencies | ||
RUN apt-get update && apt-get install -y curl jq git && rm -rf /var/lib/apt/lists/* | ||
|
||
RUN pip install --no-cache-dir -r requirements.txt | ||
|
||
# Copy the rest of the application code into the container | ||
COPY ./*.py /simulator/ | ||
# COPY ./model_cache /simulator/model_cache | ||
|
||
ENV MODEL_NAME=llama2-7b | ||
ARG GPU_TYPE=a100 | ||
# Trigger profiling | ||
RUN python app.py --time_limit 1000 --replica_config_device ${GPU_TYPE} | ||
|
||
# Expose the port the app runs on | ||
EXPOSE 8000 | ||
|
||
# Run the application | ||
CMD ["python", "app.py"] |
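Because `GPU_TYPE` is declared with `ARG`, the profiling step can be pointed at a different device at build time through `docker build --build-arg`. The sketch below only assembles the command line; the helper name `docker_build_command` and the A40 image tag are hypothetical, and whether `a40` is an accepted value for `--replica_config_device` is an assumption based on the "Add a40 deployment" entry in the commit message.

```python
import shlex


def docker_build_command(gpu_type="a100", tag="aibrix/vllm-simulator:nightly"):
    """Return the argv for building the simulator image for one GPU type."""
    return [
        "docker", "build",
        "-t", tag,
        # Overrides the `ARG GPU_TYPE=a100` default in the Dockerfile.
        "--build-arg", f"GPU_TYPE={gpu_type}",
        "-f", "Dockerfile",
        ".",
    ]


# Hypothetical tag for an A40 build; only the default tag appears in this commit.
command = docker_build_command("a40", tag="aibrix/vllm-simulator-a40:nightly")
print(shlex.join(command))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run(command, check=True)` without shell quoting.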
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# vLLM application simulator | ||
|
||
## Run locally | ||
|
||
1. Ensure that you have Python 3.10 installed on your system. Refer to https://www.bitecode.dev/p/installing-python-the-bare-minimum | ||
2. Create a virtual environment with the `venv` module: `python3.10 -m venv .venv` | ||
3. Activate the virtual environment: `source .venv/bin/activate` | ||
4. Install the dependencies: `python -m pip install -r requirements.txt` | ||
5. Run `python app.py` to start the server. | ||
6. Run `deactivate` to deactivate the virtual environment. | ||
|
||
## Run in Kubernetes | ||
|
||
1. Build the simulated base model image | ||
```shell | ||
docker build -t aibrix/vllm-simulator:nightly -f Dockerfile . | ||
|
||
# If you are using Docker Desktop on Mac, Kubernetes shares the local image repository with Docker. | ||
# Therefore, the following command is not necessary. | ||
kind load docker-image aibrix/vllm-simulator:nightly | ||
``` | ||
|
||
2. Deploy the simulated model image | ||
```shell | ||
kubectl apply -f docs/development/simulator/deployment.yaml | ||
kubectl -n aibrix-system port-forward svc/llama2-7b 8000:8000 1>/dev/null 2>&1 & | ||
``` | ||
|
||
## Test the Python app separately | ||
|
||
```shell | ||
curl http://localhost:8000/v1/chat/completions \ | ||
-H "Content-Type: application/json" \ | ||
-H "Authorization: Bearer any_key" \ | ||
-d '{ | ||
"model": "llama2-7b", | ||
"messages": [{"role": "user", "content": "Say this is a test!"}], | ||
"temperature": 0.7 | ||
}' | ||
``` | ||
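The same request can also be issued from Python. This is a minimal sketch using only the standard library; it assumes the simulator is reachable at `http://localhost:8000` through the port-forward above, and the helper names `build_chat_request` and `send` are illustrative, not part of the simulator.

```python
import json
import urllib.request


def build_chat_request(prompt, model="llama2-7b", base_url="http://localhost:8000"):
    """Build the same POST request as the curl example (nothing is sent yet)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, reused from the curl example.
            "Authorization": "Bearer any_key",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("Say this is a test!")
# With the simulator running: print(send(request))
```

Building the request separately from sending it makes the payload easy to inspect before any traffic reaches the simulator.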
|
||
```shell | ||
kubectl delete -f docs/development/simulator/deployment.yaml | ||
``` | ||
|
||
## Test with Envoy Gateway | ||
|
||
Port-forward the user and Envoy services: | ||
```shell | ||
kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090 1>/dev/null 2>&1 & | ||
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 & | ||
``` | ||
|
||
Add a user: | ||
```shell | ||
curl http://localhost:8090/CreateUser \ | ||
-H "Content-Type: application/json" \ | ||
-d '{"name": "your-user-name","rpm": 100,"tpm": 1000}' | ||
``` | ||
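The user-creation call can be sketched in Python as well. The endpoint and the `name`, `rpm`, and `tpm` fields are taken from the curl command above; the helper name `build_create_user_request` is illustrative, and reading `rpm` and `tpm` as per-minute request and token limits is an assumption that this README does not confirm.

```python
import json
import urllib.request

# Assumption: the users service is port-forwarded to localhost:8090 as shown above.
USERS_URL = "http://localhost:8090"


def build_create_user_request(name, rpm, tpm, base_url=USERS_URL):
    """Build the POST request for /CreateUser without sending it."""
    payload = {"name": name, "rpm": rpm, "tpm": tpm}
    return urllib.request.Request(
        f"{base_url}/CreateUser",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


create_user = build_create_user_request("your-user-name", rpm=100, tpm=1000)
# With the port-forward active:
# with urllib.request.urlopen(create_user, timeout=30) as response:
#     print(response.status, response.read().decode("utf-8"))
```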
|
||
Send a test request (ensure that the `model` header matches the deployment's model name so that the request is routed correctly): | ||
```shell | ||
curl -v http://localhost:8888/v1/chat/completions \ | ||
-H "user: your-user-name" \ | ||
-H "model: llama2-7b" \ | ||
-H "Content-Type: application/json" \ | ||
-H "Authorization: Bearer any_key" \ | ||
-d '{ | ||
"model": "llama2-7b", | ||
"messages": [{"role": "user", "content": "Say this is a test!"}], | ||
"temperature": 0.7 | ||
}' & |