Add GPU Optimizer deployment and update configurations
nwangfw committed Dec 4, 2024 · 1 parent 7eafc59 · commit 90cd690
Showing 9 changed files with 85 additions and 7 deletions.
1 change: 1 addition & 0 deletions config/default/kustomization.yaml
@@ -23,6 +23,7 @@ resources:
- ../rbac
- ../manager
- ../gateway
+- ../gpu-optimizer
- ../dependency/kuberay-operator
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
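With `../gpu-optimizer` listed, the rendered default stack now pulls in the optimizer's Deployment, Service, and RBAC manifests. A quick way to confirm this before applying (a sketch, assuming kubectl's built-in kustomize and the repository root as the working directory):

```shell
# Render the default kustomization and confirm the gpu-optimizer objects appear
kubectl kustomize config/default | grep -n "gpu-optimizer"

# Apply the full stack once the rendered output looks right
kubectl apply -k config/default
```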
30 changes: 30 additions & 0 deletions config/gpu-optimizer/deployment.yaml
@@ -0,0 +1,30 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: gpu-optimizer
namespace: aibrix-system
spec:
replicas: 1
selector:
matchLabels:
app: gpu-optimizer
template:
metadata:
labels:
app: gpu-optimizer
spec:
serviceAccountName: gpu-optimizer
automountServiceAccountToken: true
containers:
- name: gpu-optimizer
image: aibrix/runtime:nightly
command: ["python", "-m", "aibrix.gpu_optimizer.app"]
ports:
- containerPort: 8080
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: REDIS_HOST
value: aibrix-redis-master.aibrix-system.svc.cluster.local
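Once applied, a minimal sanity check is to confirm the Deployment rolls out and that the container picked up its namespace and Redis settings. The names below come from the manifest above and may carry a kustomize name prefix depending on how the stack is deployed:

```shell
# Wait for the optimizer to become available
kubectl -n aibrix-system rollout status deployment/gpu-optimizer

# Inspect the injected environment (NAMESPACE is filled in via the downward API)
kubectl -n aibrix-system get deployment gpu-optimizer \
  -o jsonpath='{.spec.template.spec.containers[0].env}'

# Follow the optimizer logs
kubectl -n aibrix-system logs deployment/gpu-optimizer -f
```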
4 changes: 4 additions & 0 deletions config/gpu-optimizer/kustomization.yaml
@@ -0,0 +1,4 @@
resources:
- deployment.yaml
- service.yaml
- rbac.yaml
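For iterating on this component alone, the directory can also be rendered or applied by itself (a sketch, run from the repository root):

```shell
# Preview the three manifests this kustomization composes
kubectl kustomize config/gpu-optimizer

# Apply only the GPU optimizer component
kubectl apply -k config/gpu-optimizer
```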
29 changes: 29 additions & 0 deletions config/gpu-optimizer/rbac.yaml
@@ -0,0 +1,29 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: gpu-optimizer
namespace: aibrix-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: aibrix-system
name: gpu-optimizer
rules:
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: gpu-optimizer
namespace: aibrix-system
subjects:
- kind: ServiceAccount
name: gpu-optimizer
namespace: aibrix-system
roleRef:
kind: Role
name: gpu-optimizer
apiGroup: rbac.authorization.k8s.io
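The Role grants read-only access to Deployments in aibrix-system. `kubectl auth can-i` can verify the binding behaves as intended; the second check should be denied unless some other binding grants write access:

```shell
# Expected: yes
kubectl auth can-i list deployments -n aibrix-system \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer

# Expected: no (the Role only allows get/list/watch)
kubectl auth can-i update deployments -n aibrix-system \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer
```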
13 changes: 13 additions & 0 deletions config/gpu-optimizer/service.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
name: gpu-optimizer
namespace: aibrix-system
spec:
selector:
app: gpu-optimizer
ports:
- protocol: TCP
port: 8080
targetPort: 8080
type: ClusterIP
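With the Service in place, the optimizer becomes reachable from a workstation via a port-forward. A sketch using the Service name as defined above (a name prefix may apply depending on how the kustomization is deployed):

```shell
# Expose the optimizer locally; the metrics paths used by the autoscaler
# changes below, and the /dash/<model> pages mentioned in the README,
# become reachable on localhost:8080
kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
```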
1 change: 1 addition & 0 deletions config/overlays/vke/default/kustomization.yaml
@@ -7,6 +7,7 @@ resources:
- ../../../rbac
- manager
- gateway
+- ../../../gpu-optimizer
- ../../../dependency/kuberay-operator


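The same preview works for the VKE overlay (a sketch, run from the repository root):

```shell
kubectl kustomize config/overlays/vke/default | grep -n "gpu-optimizer"
```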
4 changes: 2 additions & 2 deletions development/simulator/deployment-a100.yaml
@@ -150,11 +150,11 @@ spec:
apiVersion: apps/v1
kind: Deployment
name: simulator-llama2-7b-a100
-minReplicas: 0
+minReplicas: 1
maxReplicas: 10
targetMetric: "avg_prompt_throughput_toks_per_s" # Ignore if metricsSources is configured
metricsSources:
-- endpoint: gpu-optimizer.aibrix-system.svc.cluster.local:8080
+- endpoint: aibrix-gpu-optimizer.aibrix-system.svc.cluster.local:8080
path: /metrics/aibrix-system/simulator-llama2-7b-a100
metric: "vllm:deployment_replicas"
targetValue: "1"
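The autoscaler now follows `vllm:deployment_replicas` from the optimizer endpoint rather than raw prompt throughput. With the Service port-forwarded as sketched earlier, the value compared against `targetValue` can be inspected directly; the path and metric name come from the spec above, and this assumes the simulator deployment is running:

```shell
curl -s http://localhost:8080/metrics/aibrix-system/simulator-llama2-7b-a100 \
  | grep "vllm:deployment_replicas"
```

The a40 deployment below is wired the same way, with the path ending in `simulator-llama2-7b-a40`.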
4 changes: 2 additions & 2 deletions development/simulator/deployment-a40.yaml
@@ -149,11 +149,11 @@ spec:
apiVersion: apps/v1
kind: Deployment
name: simulator-llama2-7b-a40
-minReplicas: 0
+minReplicas: 1
maxReplicas: 10
targetMetric: "avg_prompt_throughput_toks_per_s" # Ignore if metricsSources is configured
metricsSources:
-- endpoint: gpu-optimizer.aibrix-system.svc.cluster.local:8080
+- endpoint: aibrix-gpu-optimizer.aibrix-system.svc.cluster.local:8080
path: /metrics/aibrix-system/simulator-llama2-7b-a40
metric: "vllm:deployment_replicas"
targetValue: "1"
6 changes: 3 additions & 3 deletions python/aibrix/aibrix/gpu_optimizer/README.md
@@ -28,15 +28,15 @@ kubectl -n aibrix-system port-forward svc/aibrix-redis-master 6379:6379 1>/dev/n
# Or use make
make debug-init

-python optimizer/profiling/gen-profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
+python optimizer/profiling/gen_profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
# Or use make
make DP=simulator-llama2-7b-a100 gen-profile
```

5. Deploy GPU Optimizer
```shell
kubectl apply -f deployment.yaml
-kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
+kubectl -n aibrix-system port-forward svc/aibrix-gpu-optimizer 8080:8080 1>/dev/null 2>&1 &

# Or use make
make deploy
@@ -47,7 +47,7 @@ make deploy
5. Start a workload and watch how the model scales. The benchmark toolkit can be used to generate the workload as follows:
```shell
# Make sure gateway's local access, see docs/development/simulator/README.md for details.
-python optimizer/profiling/gpu-benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
+python optimizer/profiling/gpu_benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
```

6. Observability: visit http://localhost:8080/dash/llama2-7b for workload pattern visualization. An independent visualization demo can be accessed by: