[feat] Integrate deployment configurations and fix autoscaler/gpu optimizer connectivity #492

Merged
1 change: 1 addition & 0 deletions config/default/kustomization.yaml
@@ -23,6 +23,7 @@ resources:
 - ../rbac
 - ../manager
 - ../gateway
+- ../gpu-optimizer
 - ../dependency/kuberay-operator
 # [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
 # crd/kustomization.yaml
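
This wires the new component into the default stack. A quick way to confirm the render picks it up — a sketch, assuming commands run from the repository root (rendered object names may gain a kustomize prefix such as `aibrix-`):

```shell
# Render the default kustomization and confirm the gpu-optimizer
# objects (Deployment, Service, RBAC) show up in the output.
kubectl kustomize config/default | grep "gpu-optimizer"

# Or apply the whole stack in one step.
kubectl apply -k config/default
```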
26 changes: 26 additions & 0 deletions config/gpu-optimizer/deployment.yaml
@@ -0,0 +1,26 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-optimizer
  namespace: aibrix-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-optimizer
  template:
    metadata:
      labels:
        app: gpu-optimizer
    spec:
      serviceAccountName: gpu-optimizer-sa
      automountServiceAccountToken: true
      containers:
        - name: gpu-optimizer
          image: aibrix/runtime:nightly
          command: ["python", "-m", "aibrix.gpu_optimizer.app"]
          ports:
            - containerPort: 8080
          env:
            - name: REDIS_HOST
              value: aibrix-redis-master.aibrix-system.svc.cluster.local
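
Once applied, the Deployment can be sanity-checked with standard kubectl commands. A minimal sketch, assuming the manifest lands with the literal name `gpu-optimizer` (an overlay may prefix it):

```shell
# Confirm the optimizer pod is running and inspect its startup logs.
kubectl -n aibrix-system get deployment gpu-optimizer
kubectl -n aibrix-system logs deployment/gpu-optimizer --tail=50

# REDIS_HOST must resolve from inside the cluster; resolving it from
# the optimizer container itself is a quick connectivity check.
kubectl -n aibrix-system exec deploy/gpu-optimizer -- \
  python -c "import socket; print(socket.gethostbyname('aibrix-redis-master.aibrix-system.svc.cluster.local'))"
```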
4 changes: 4 additions & 0 deletions config/gpu-optimizer/kustomization.yaml
@@ -0,0 +1,4 @@
resources:
- deployment.yaml
- service.yaml
- rbac.yaml
27 changes: 27 additions & 0 deletions config/gpu-optimizer/rbac.yaml
@@ -0,0 +1,27 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-optimizer-sa
  namespace: aibrix-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gpu-optimizer-clusterrole
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-optimizer-clusterrole-binding
subjects:
  - kind: ServiceAccount
    name: gpu-optimizer-sa
    namespace: aibrix-system
roleRef:
  kind: ClusterRole
  name: gpu-optimizer-clusterrole
  apiGroup: rbac.authorization.k8s.io
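
The ClusterRole is read-only (`get`/`list`/`watch` on Deployments), which is all the optimizer's list-and-watch loop in app.py needs. Whether the binding took effect can be probed via impersonation — a sketch, assuming these manifests were applied unmodified:

```shell
# Expected "yes": the optimizer may list Deployments cluster-wide.
kubectl auth can-i list deployments.apps --all-namespaces \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer-sa

# Expected "no" (unless another binding grants it): no write access.
kubectl auth can-i update deployments.apps --all-namespaces \
  --as=system:serviceaccount:aibrix-system:gpu-optimizer-sa
```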
13 changes: 13 additions & 0 deletions config/gpu-optimizer/service.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
  name: gpu-optimizer
  namespace: aibrix-system
spec:
  selector:
    app: gpu-optimizer
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  type: ClusterIP
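
The ClusterIP Service exposes the optimizer's HTTP port inside the cluster; for local inspection, a port-forward works. A sketch reusing the dashboard path the README below documents (the service may be named `aibrix-gpu-optimizer` after kustomize prefixing):

```shell
kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
# The README's workload dashboard; replace llama2-7b with your model.
curl -s http://localhost:8080/dash/llama2-7b | head
```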
1 change: 1 addition & 0 deletions config/overlays/vke/default/kustomization.yaml
@@ -7,6 +7,7 @@ resources:
 - ../../../rbac
 - manager
 - gateway
+- ../../../gpu-optimizer
 - ../../../dependency/kuberay-operator
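
The VKE overlay references the same component by relative path, so rendering the overlay is a cheap check that the path resolves (assuming the repository root as working directory):

```shell
kubectl kustomize config/overlays/vke/default | grep "gpu-optimizer"
```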
6 changes: 3 additions & 3 deletions python/aibrix/aibrix/gpu_optimizer/README.md
@@ -28,15 +28,15 @@ kubectl -n aibrix-system port-forward svc/aibrix-redis-master 6379:6379 1>/dev/null 2>&1 &
 # Or use make
 make debug-init
 
-python optimizer/profiling/gen-profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
+python optimizer/profiling/gen_profile.py simulator-llama2-7b-a100 -o "redis://localhost:6379/?model=llama2-7b"
 # Or use make
 make DP=simulator-llama2-7b-a100 gen-profile
 ```
 
 5. Deploy GPU Optimizer
 ```shell
 kubectl apply -f deployment.yaml
-kubectl -n aibrix-system port-forward svc/gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
+kubectl -n aibrix-system port-forward svc/aibrix-gpu-optimizer 8080:8080 1>/dev/null 2>&1 &
 
 # Or use make
 make deploy
@@ -47,7 +47,7 @@ make deploy
 6. Start a workload and watch how the model scales. The benchmark toolkit can be used to generate load:
 ```shell
 # Make sure the gateway is locally accessible; see docs/development/simulator/README.md for details.
-python optimizer/profiling/gpu-benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
+python optimizer/profiling/gpu_benchmark.py --backend=vllm --port 8888 --request-rate=10 --num-prompts=100 --input_len 2000 --output_len 128 --model=llama2-7b
 ```
 
 7. Observability: visit http://localhost:8080/dash/llama2-7b for workload pattern visualization. An independent visualization demo can be accessed with:
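
The doc fixes above track the profiling scripts' underscore names (dashes are not valid in importable Python module names). On a checkout they should be visible under the README's relative path — a hypothetical listing, assuming the package layout matches the README's location:

```shell
ls python/aibrix/aibrix/gpu_optimizer/optimizer/profiling/
# Expect gen_profile.py and gpu_benchmark.py among the entries.
```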
4 changes: 3 additions & 1 deletion python/aibrix/aibrix/gpu_optimizer/app.py
@@ -265,7 +265,9 @@ def main(signal, timeout):
 
     # List existing deployments
     logger.info(f"Looking for deployments with {MODEL_LABEL}")
-    deployments = apps_v1.list_deployment_for_all_namespaces(label_selector=MODEL_LABEL)
+    deployments = apps_v1.list_deployment_for_all_namespaces(
+        label_selector=MODEL_LABEL
+    )
     watch_version = deployments.metadata.resource_version
     logger.debug(f"last watch version: {watch_version}")
     for deployment in deployments.items: