Azure matrix 1gpu #6298
base: master
rsgowman left a comment:
lgtm, though let's get Zach to review too.
hubatish left a comment:
Looks like the tests are failing. In the Cloud Build logs (which you probably don't have access to?), I see:
INFO 2025-12-24T16:11:52.906172833Z Step #1: ERROR: testSetStorageSizeGCP (tests.disk_iops_to_capacity_test.DiskIOPSToCapacityTest.testSetStorageSizeGCP)
....
INFO 2025-12-24T16:11:52.906179565Z Step #1: self._cpu_count = int(
INFO 2025-12-24T16:11:52.906180155Z Step #1: ^^^^
INFO 2025-12-24T16:11:52.906180893Z Step #1: TypeError: only 0-dimensional arrays can be converted to Python scalars
This doesn't look like your code, though. Possibly just syncing forward will fix it, since this PR is 2 weeks old.
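The `TypeError` in the log matches the message NumPy raises when `int()` is applied to an array that has one or more dimensions instead of a scalar; as far as we can tell, recent NumPy releases turned what used to be a deprecation warning for size-1 arrays into this hard error. Below is a minimal sketch of a version-independent conversion, assuming the value reaching `self._cpu_count = int(` is a NumPy array. The helper name `to_python_int` is hypothetical and not part of the codebase.

```python
import numpy as np


def to_python_int(value) -> int:
    """Convert a scalar or a size-1 array of any shape to a Python int.

    Hypothetical helper for illustration only. Calling int() directly on
    an array with ndim > 0 (e.g. np.array([8])) warns or raises depending
    on the NumPy version; .item() extracts the single element instead.
    """
    arr = np.asarray(value)
    # .item() raises ValueError if the array holds more than one element.
    return int(arr.item())


# Scalar, 0-d array, 1-d size-1 array, and 2-d size-1 array all work.
print(to_python_int(8))
print(to_python_int(np.array(8)))
print(to_python_int(np.array([8])))
print(to_python_int(np.array([[8]])))
```

`.item()` is used rather than indexing (`arr[0]`) so that the same call handles 0-d, 1-d, and higher-dimensional size-1 arrays without special cases.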
PiperOrigin-RevId: 844816397
This is no longer used and causes the test to fail as of GoogleCloudPlatform#6291 PiperOrigin-RevId: 844830523
model-garden CLI service is properly live now. PiperOrigin-RevId: 844840901
See GoogleCloudPlatform#6272. While refactoring the Run() method, the KubernetesMetricCollector was omitted, causing HPA's KMC to be run during VPA's test. This caused the primary metrics of the test to not be collected. PiperOrigin-RevId: 844844745
PiperOrigin-RevId: 845340374
PiperOrigin-RevId: 845429012
…the existing `RemoteCommand`. PiperOrigin-RevId: 845429895
PiperOrigin-RevId: 845508131
PiperOrigin-RevId: 845833015
Preparation for refactoring this file. PiperOrigin-RevId: 845900151
PiperOrigin-RevId: 845929610
PiperOrigin-RevId: 845943103
PiperOrigin-RevId: 845944984
Also fix resulting pytype errors that this exposed. PiperOrigin-RevId: 846252648
PiperOrigin-RevId: 846416972
1. Always show the kernel command line.
2. Show whether the RT kernel was enabled.
PiperOrigin-RevId: 846435753
PiperOrigin-RevId: 846453023
PiperOrigin-RevId: 846466524
…re SQL databases. PiperOrigin-RevId: 846591475
PiperOrigin-RevId: 846606398
PiperOrigin-RevId: 846734952
PiperOrigin-RevId: 846749098
PiperOrigin-RevId: 847010068
PiperOrigin-RevId: 851449538
…se for Redis Memtier Benchmark PiperOrigin-RevId: 851458383
…on to include trace flag 9944. PiperOrigin-RevId: 852305770
PiperOrigin-RevId: 852383875
PiperOrigin-RevId: 852420089
…tial source issue from the startup script service PiperOrigin-RevId: 852450925
PiperOrigin-RevId: 852495812
Add support for kubernetes_scale to 1 GPU on AKS Standard:
- `container/kubernetes_scale/kubernetes_scale.yaml.j2` to install the NVIDIA device plugin
- cloudyaml_docs manifest
Command to run: