Merge branch 'gpu-optimizer-orchestration' into issues/484_Controller_failed_to_fetch_metrics_from_MetricSource

# Conflicts:
#	development/simulator/deployment-a100.yaml
#	development/simulator/deployment-a40.yaml
Jingyuan Zhang committed Dec 6, 2024
2 parents d2be10a + 90cd690 commit e544c12
Showing 7 changed files with 75 additions and 3 deletions.
1 change: 1 addition & 0 deletions config/default/kustomization.yaml
@@ -23,6 +23,7 @@ resources:
- ../rbac
- ../manager
- ../gateway
- ../gpu-optimizer
- ../dependency/kuberay-operator
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
26 changes: 26 additions & 0 deletions config/gpu-optimizer/deployment.yaml
@@ -0,0 +1,26 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: gpu-optimizer
namespace: aibrix-system
spec:
replicas: 1
selector:
matchLabels:
app: gpu-optimizer
template:
metadata:
labels:
app: gpu-optimizer
spec:
serviceAccountName: gpu-optimizer-sa
automountServiceAccountToken: true
containers:
- name: gpu-optimizer
image: aibrix/runtime:nightly
command: ["python", "-m", "aibrix.gpu_optimizer.app"]
ports:
- containerPort: 8080
env:
- name: REDIS_HOST
value: aibrix-redis-master.aibrix-system.svc.cluster.local
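Once this Deployment is applied, a quick sanity check is to wait for the rollout and inspect the pod it creates. This is a sketch that assumes `kubectl` access to the target cluster with the manifests above already applied:

```shell
# Wait for the gpu-optimizer rollout to complete (up to 2 minutes).
kubectl -n aibrix-system rollout status deployment/gpu-optimizer --timeout=120s

# List the optimizer pod via its app label and confirm it is Running.
kubectl -n aibrix-system get pods -l app=gpu-optimizer
```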
4 changes: 4 additions & 0 deletions config/gpu-optimizer/kustomization.yaml
@@ -0,0 +1,4 @@
resources:
- deployment.yaml
- service.yaml
- rbac.yaml
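This kustomization can be rendered locally without a cluster connection, which is a cheap way to catch YAML mistakes before applying. A sketch, assuming the repository root as the working directory:

```shell
# Render the combined gpu-optimizer manifests to stdout.
kubectl kustomize config/gpu-optimizer

# Equivalent, using the standalone kustomize binary.
kustomize build config/gpu-optimizer
```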
27 changes: 27 additions & 0 deletions config/gpu-optimizer/rbac.yaml
@@ -0,0 +1,27 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: gpu-optimizer-sa
namespace: aibrix-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: gpu-optimizer-clusterrole
rules:
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: gpu-optimizer-clusterrole-binding
subjects:
- kind: ServiceAccount
name: gpu-optimizer-sa
namespace: aibrix-system
roleRef:
kind: ClusterRole
name: gpu-optimizer-clusterrole
apiGroup: rbac.authorization.k8s.io
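The binding above grants the service account read-only access to Deployments cluster-wide. One way to verify the grant behaves as intended, once the manifests are applied (a sketch; assumes `kubectl` access):

```shell
# Should print "yes": the ClusterRole allows get/list/watch on Deployments.
kubectl auth can-i list deployments.apps \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer-sa

# Should print "no": the role grants no write verbs.
kubectl auth can-i delete deployments.apps \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer-sa
```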
13 changes: 13 additions & 0 deletions config/gpu-optimizer/service.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
name: gpu-optimizer
namespace: aibrix-system
spec:
selector:
app: gpu-optimizer
ports:
- protocol: TCP
port: 8080
targetPort: 8080
type: ClusterIP
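With the ClusterIP Service in place, the optimizer's HTTP port can be reached from a dev machine via port-forward. A sketch, using the Service name from this manifest; note the deployed name may carry a release prefix (the README in this commit uses `svc/aibrix-gpu-optimizer`), and the dashboard path mirrors the README's `llama2-7b` example:

```shell
# Forward local port 8080 to the Service in the background.
kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &

# Hit the workload-pattern dashboard for the example model.
curl http://localhost:8080/dash/llama2-7b
```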
1 change: 1 addition & 0 deletions config/overlays/vke/default/kustomization.yaml
@@ -7,6 +7,7 @@ resources:
- ../../../rbac
- manager
- gateway
- ../../../gpu-optimizer
- ../../../dependency/kuberay-operator


6 changes: 3 additions & 3 deletions python/aibrix/aibrix/gpu_optimizer/README.md
@@ -28,15 +28,15 @@ kubectl -n aibrix-system port-forward svc/aibrix-redis-master 6379:6379 1>/dev/n
# Or use make
make debug-init

-python optimizer/profiling/gen-profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
+python optimizer/profiling/gen_profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
# Or use make
make DP=simulator-llama2-7b-a100 gen-profile
```

5. Deploy GPU Optimizer
```shell
kubectl apply -f deployment.yaml
-kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
+kubectl -n aibrix-system port-forward svc/aibrix-gpu-optimizer 8080:8080 1>/dev/null 2>&1 &

# Or use make
make deploy
@@ -47,7 +47,7 @@ make deploy
5. Start the workload and watch how the model scales. The benchmark toolkit can be used to generate a workload:
```shell
# Ensure the gateway is accessible locally; see docs/development/simulator/README.md for details.
-python optimizer/profiling/gpu-benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
+python optimizer/profiling/gpu_benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
```

6. Observability: visit http://localhost:8080/dash/llama2-7b for workload pattern visualization. An independent visualization demo can be accessed by: