[feat]: GPU Optimizer and Simulator development app #430
Merged
45 commits
c3e1f94
Integrate Vidur as a LLM simulator.
52d2ccb
Merge commit 'main' into jingyuan/autoscaler
92f141d
Test deployment
b2a914f
Merge commit 'f9702c3d8da7ab4d0e6fd8c5877c1e57ce528fb9' into jingyuan…
d75cbb8
Fix debug log output
45d2801
Merge commit '5d8d8439077b08b12fd3de62ad7c4e6e2fb6e4ed' into jingyuan…
702346f
Integrate gpu optimizer server that exposes customized pod metrics to…
c1b317c
Merge commit 'ea5dc7784fa767fcf40b61417627ab47b6dba426' into jingyuan…
2ce7c10
bug fix: autoscaler with metricsSources now works.
4d811c4
Decoupled workload monitoring and visualizer
e64052b
Integrate ILP solver and profile to GPU optimizer. Aggregated traces …
6066cc0
Add support for minimum replicas.
9951e11
Merge branch 'main' into jingyuan/autoscaler
9abf0d4
Debugged GPU profile benchmark, generation, and loading from file.
6ce6e87
Add redis support for profile exchange.
22e94c5
bug fix: Duplicate 'http://' on calling RestMetricsFetcher.FetchPodMe…
9604ac0
bug fix
5803c6d
Merge branch 'issues/408_Duplicated_http_in_RestMetricsFetcher' into …
3d7d28a
Tuning the granularity of aggregated profile
c4b5a64
Adjust request trace to finer granularity.
74491b4
Make request trace self-explanatory on time interval
6fe9c5f
Merge branch 'jingyuan/finer_profile_granuality' into jingyuan/autosc…
e7f5e2f
Apply new request trace schema.
dba5205
Add Readme.md for demo walkthrough.
e8514c9
Merge branch 'main' into jingyuan/finer_profile_granuality
d39fdcb
Merge branch 'jingyuan/finer_profile_granuality' into jingyuan/autosc…
b828e51
Remove model cache
984feb1
Remove TargetPort changes, which is not used.
60a9364
Fix deployment
548ab63
Organize Imports
5afab5c
Lint fix
d43162a
Lint fix
715ab4b
ruff reformat
599f820
Passed mypy
6c357a6
Lint fix
d05a087
pass mypy for redis
8c835c8
ruff again
d77cd5a
Pass lint
009b341
Python 3.9 format check
019afd9
Python 3.8 format check
b609c33
Add a40 deployment
6e84e6f
Bug fix
a573d44
Improve benchmark stability.
69c4b47
Fix python file names.
469955b
python CI test bump to 3.10
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Use the official Python base image | ||
FROM python:3.10-slim | ||
|
||
# Set environment variables | ||
ENV PYTHONDONTWRITEBYTECODE=1 | ||
ENV PYTHONUNBUFFERED=1 | ||
ENV WANDB_MODE=disabled | ||
|
||
# Set the working directory | ||
WORKDIR /simulator | ||
|
||
# Copy the requirements file into the container | ||
COPY requirements.txt /simulator/ | ||
|
||
# Install dependencies | ||
RUN apt update && apt install -y curl jq git | ||
|
||
RUN pip install --no-cache-dir -r requirements.txt | ||
|
||
# Copy the rest of the application code into the container | ||
COPY ./*.py /simulator/ | ||
# COPY ./model_cache /simulator/model_cache | ||
|
||
ENV MODEL_NAME=llama2-7b | ||
ARG GPU_TYPE=a100 | ||
# Trigger profiling | ||
RUN python app.py --time_limit 1000 --replica_config_device ${GPU_TYPE} | ||
|
||
# Expose the port the app runs on | ||
EXPOSE 8000 | ||
|
||
# Run the application | ||
CMD ["python", "app.py"] |
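Note on the Dockerfile above: `GPU_TYPE` is declared with `ARG`, so the profiling step (`RUN python app.py ... --replica_config_device ${GPU_TYPE}`) runs at build time and can be pointed at another device with Docker's `--build-arg` flag. A minimal sketch of composing that build command follows. The helper name `docker_build_command` is hypothetical, the default tag is the one used in the README, and `a40` is assumed to be a device value the profiler accepts (the commit list mentions an a40 deployment, but the accepted values are not shown in this diff).

```python
import shlex
from typing import List


def docker_build_command(
    gpu_type: str = "a100",
    tag: str = "aibrix/vllm-simulator:nightly",
    context: str = ".",
) -> List[str]:
    """Compose a `docker build` argv that overrides the GPU_TYPE build arg.

    Hypothetical helper: it only builds the argument list and does not
    invoke Docker.
    """
    return [
        "docker", "build",
        "-t", tag,
        "--build-arg", f"GPU_TYPE={gpu_type}",  # overrides ARG GPU_TYPE=a100
        "-f", "Dockerfile",
        context,
    ]


# Print the command for an a40 profile (assumed device name).
print(shlex.join(docker_build_command("a40")))
```

The list form can be passed to `subprocess.run` unchanged if the build should be launched from Python.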
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# vLLM application simulator | ||
|
||
## Run locally | ||
|
||
1. Ensure that you have Python 3.10 installed on your system. Refer to https://www.bitecode.dev/p/installing-python-the-bare-minimum | ||
2. Create a virtual environment with the `venv` module: `python3.10 -m venv .venv` | ||
3. Activate the virtual environment: `source .venv/bin/activate` | ||
4. Install the dependencies: `python -m pip install -r requirements.txt` | ||
5. Run `python app.py` to start the server. | ||
6. Run `deactivate` to deactivate the virtual environment. | ||
|
||
## Run in kubernetes | ||
|
||
1. Build simulated base model image | ||
```shell | ||
docker build -t aibrix/vllm-simulator:nightly -f Dockerfile . | ||
|
||
# If you are using Docker-Desktop on Mac, Kubernetes shares the local image repository with Docker. | ||
# Therefore, the following command is not necessary. | ||
kind load docker-image aibrix/vllm-simulator:nightly | ||
``` | ||
|
||
2. Deploy simulated model image | ||
```shell | ||
kubectl apply -f docs/development/simulator/deployment.yaml | ||
kubectl -n aibrix-system port-forward svc/llama2-7b 8000:8000 1>/dev/null 2>&1 & | ||
``` | ||
|
||
## Test python app separately | ||
|
||
```shell | ||
curl http://localhost:8000/v1/chat/completions \ | ||
-H "Content-Type: application/json" \ | ||
-H "Authorization: Bearer any_key" \ | ||
-d '{ | ||
"model": "llama2-7b", | ||
"messages": [{"role": "user", "content": "Say this is a test!"}], | ||
"temperature": 0.7 | ||
}' | ||
``` | ||
|
||
```shell | ||
kubectl delete -f docs/development/simulator/deployment.yaml | ||
``` | ||
|
||
## Test with envoy gateway | ||
|
||
Add User: | ||
|
||
|
||
Port forward to the User and Envoy service: | ||
```shell | ||
kubectl -n aibrix-system port-forward svc/aibrix-gateway-users 8090:8090 1>/dev/null 2>&1 & | ||
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 1>/dev/null 2>&1 & | ||
``` | ||
|
||
Add User | ||
```shell | ||
curl http://localhost:8090/CreateUser \ | ||
-H "Content-Type: application/json" \ | ||
-d '{"name": "your-user-name","rpm": 100,"tpm": 1000}' | ||
``` | ||
|
||
Test request (ensure the `model` header matches the deployment's model name for routing) | ||
```shell | ||
curl -v http://localhost:8888/v1/chat/completions \ | ||
-H "user: your-user-name" \ | ||
-H "model: llama2-7b" \ | ||
-H "Content-Type: application/json" \ | ||
-H "Authorization: Bearer any_key" \ | ||
-d '{ | ||
"model": "llama2-7b", | ||
"messages": [{"role": "user", "content": "Say this is a test!"}], | ||
"temperature": 0.7 | ||
}' & |
nit: we removed the necessity of the user here. This step can be skipped, but this is minor. We can do some cleanups later.
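
For reference, a minimal sketch of the README's gateway test request in Python, with the `user` header made optional in line with the nit above. The function name `build_chat_request` and its parameters are hypothetical. The URL path, headers, and JSON body mirror the `curl` example in the README, and whether the gateway really accepts a request without `user` is an assumption taken from the review comment, not something this diff shows.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    model: str,
    user: Optional[str] = None,
    api_key: str = "any_key",
) -> urllib.request.Request:
    """Build (but do not send) the chat completion request from the README.

    Hypothetical helper: the `model` header is always set because the README
    says routing depends on it; `user` is only added when provided.
    """
    headers = {
        "model": model,
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if user is not None:
        headers["user"] = user

    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Say this is a test!"}],
        "temperature": 0.7,
    }
    # Supplying data makes urllib issue a POST.
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
    )
```

With the port-forward from the README running, `urllib.request.urlopen(build_chat_request("http://localhost:8888", "llama2-7b"))` would send the request; pass `user="your-user-name"` to reproduce the original `curl` call exactly.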