Azure matrix 1gpu #6298
Open
vofish wants to merge 52 commits into GoogleCloudPlatform:master from kiryl-filatau:azure-matrix-1gpu
+50
−0
Commits
0402b02 Add Azure 1 GPU kubernetes_scale (vofish)
a5ca221 Add newline (vofish)
0914882 Address comments (vofish)
9fc7703 Remove unnecessary line (vofish)
113874f Remove unnecessary line (vofish)
6173672 Put NVIDIA plugin yaml to data/container/azure folder (vofish)
dbe0c2b Add precommit pyink formatter (vofish)
7532b2c Add newline (vofish)
d84a68d Update rev., add args (vofish)
6baa2f7 Update pre-commit hook args (vofish)
9ae39a3 Update the Postgres sysbench configuration logic.
b957e2b Remove the node selector for (new) vpa test
b2eb291 Fix Vertex AI DNS endpoint & remove beta (hubatish)
be9f67a Fix to rampup test to ensure correct metric collector runs
6909b40 Add Azure support to ai-inference (vofish)
843eb01 Address comments (vofish)
21868f1 Fix tests (vofish)
c1ea5bb Fix tests (vofish)
ecf6162 Add _ProvisionGPUNodePool method (vofish)
bc2f146 Revert blob-public-access (vofish)
f151363 Fix linting issues (vofish)
85b9239 Fix linting issues (vofish)
5868fe4 Support triggering migration multiple times. (raymond13513)
01373cf Implement backup restore for DSQL (ScottLinnn)
16df725 Update base Windows mixin to offer `RemoteCommandWithReturnCode` and … (jacklacey11)
e719647 Run llama4 16e-instruct rather than 16e (without). (hubatish)
2d49929 Add '.j2' to Azure blobfuse2 config file. (andyz422)
589836c Move container_service into it's own directory/module
6489719 Refactor kubernetes items out of container_service/__init__.py
85464b9 Refactor base items out of container_service/__init__.py
b4f887a Default capture_live_migration_timestamps to true. (raymond13513)
37d046e Refactor BUILD file to create container_service library
305f713 Move yaml code from container_service -> vm_util (hubatish)
5239ccb Update Linux VM metadata to: (pmkc)
a90ede8 Allow configuring worker count in Trino (hubatish)
b22e16d Support gs:// URLs in --ycsb_tar_url
eeb5acf Add support for specifying storage type, IOPS, and throughput for Azu… (bvliu)
c5bbbd9 Refactor relational DB metrics collection. (bvliu)
84285f1 Add Azure Flexible Server metrics implementation. (bvliu)
86f7c04 Fix sysbench sleep duration. (bvliu)
0f93c5c Added GCE SQL Server 2025 images to PKB
a0748fc Remove unused disk_iops_to_capacity module. (jellyfishcake)
536e4ca Add the option to enable `--redis_aof_verify` at the end of `Run` pha…
f5a8ccc Add support for MSSQL 2025 on Linux and update SQL Server configurati…
429e3b9 Pass region to aws command (ScottLinnn)
002e015 Support exporting metrics for multiple triggers for disruption. (raymond13513)
75a3d7c Reclassify timeouts on startup script retrieval to indicate the poten… (jacklacey11)
a970e51 Allow configurable compaction strategy for cassandra (Arushi-07)
ee7dfd7 Fix Create test (vofish)
5ba0a44 Merge branch 'master' into azure-matrix-1gpu (vofish)
38ec118 Remove pre-commit hook (vofish)
6ced164 Apply linting (vofish)
41 changes: 41 additions & 0 deletions
perfkitbenchmarker/data/container/azure/nvidia-device-plugin.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # According to the official Microsoft documentation, the NVIDIA device plugin | ||
| # must be deployed as a DaemonSet to enable GPU support in the Kubernetes cluster. | ||
| # Reference: https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#nvidia-device-plugin-installation | ||
| apiVersion: apps/v1 | ||
| kind: DaemonSet | ||
| metadata: | ||
| name: nvidia-device-plugin-daemonset | ||
| namespace: kube-system | ||
| spec: | ||
| selector: | ||
| matchLabels: | ||
| name: nvidia-device-plugin-ds | ||
| updateStrategy: | ||
| type: RollingUpdate | ||
| template: | ||
| metadata: | ||
| labels: | ||
| name: nvidia-device-plugin-ds | ||
| spec: | ||
| tolerations: | ||
| - key: 'nvidia.com/gpu' | ||
| operator: Exists | ||
| effect: NoSchedule | ||
| priorityClassName: 'system-node-critical' | ||
| containers: | ||
| - image: nvcr.io/nvidia/k8s-device-plugin:v0.18.0 | ||
| name: nvidia-device-plugin-ctr | ||
| env: | ||
| - name: FAIL_ON_INIT_ERROR | ||
| value: 'false' | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| capabilities: | ||
| drop: ['ALL'] | ||
| volumeMounts: | ||
| - name: device-plugin | ||
| mountPath: /var/lib/kubelet/device-plugins | ||
| volumes: | ||
| - name: device-plugin | ||
| hostPath: | ||
| path: /var/lib/kubelet/device-plugins |
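
The manifest's header comment says the device plugin is what enables GPU support in the cluster. One way to check that it did is a throwaway Pod that requests one `nvidia.com/gpu` and runs `nvidia-smi`. The sketch below is not part of this PR. The Pod name and the container image tag are assumptions for illustration. The toleration copies the one in the DaemonSet and only matters if the GPU node pool carries a taint with that key, which this diff does not confirm.

```yaml
# Hypothetical smoke-test Pod; not part of this PR.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test  # hypothetical name
spec:
  restartPolicy: Never
  tolerations:
    # Copied from the DaemonSet in this diff. Only needed if the GPU node
    # pool is tainted with this key (an assumption).
    - key: 'nvidia.com/gpu'
      operator: Exists
      effect: NoSchedule
  containers:
    - name: nvidia-smi
      # Assumed image tag; substitute any CUDA base image available to you.
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ['nvidia-smi']
      resources:
        limits:
          # Extended resource advertised by the NVIDIA device plugin.
          nvidia.com/gpu: 1
```

If the plugin is running on the GPU node, `kubectl apply -f` on this file followed by `kubectl logs gpu-smoke-test` should print the `nvidia-smi` device table. A Pod stuck in `Pending` usually means no node is advertising `nvidia.com/gpu`, which `kubectl describe node` shows under the node's allocatable resources.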