feat: add MPS GPU sharing settings model #107

yeazelm · 2025-12-31T22:48:42Z

Issue #: bottlerocket-os/bottlerocket#4673

Description of changes:
Add NvidiaMpsSettings struct and Mps variant to NvidiaDeviceSharingStrategy for NVIDIA Multi-Process Service support. Includes validation to prevent MPS and MIG from being enabled simultaneously.

Testing:
The testing was documented in the related PR: bottlerocket-os/bottlerocket-core-kit#789

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add NvidiaMpsSettings struct and Mps variant to NvidiaDeviceSharingStrategy for NVIDIA Multi-Process Service support. Includes validation to prevent MPS and MIG from being enabled simultaneously.

Signed-off-by: Matthew Yeazel <[email protected]>

piyush-jena

LGTM apart from the nitpicky request for an additional unit test. Nice work coding the conflict between MIG and MPS.

piyush-jena · 2026-01-02T17:03:00Z

bottlerocket-settings-models/settings-extensions/kubelet-device-plugins/src/lib.rs

+        let test_json = r#"{"nvidia":{"pass-device-specs":true,"device-id-strategy":"index","device-list-strategy":"volume-mounts","device-sharing-strategy":"mps","mps":{"replicas":4},"device-partitioning-strategy":"none"}}"#;
+
+        let device_plugins: KubeletDevicePluginsV1 = serde_json::from_str(test_json).unwrap();
+        assert_eq!(


Can you add an assert for replicas, and a unit test for the default number of replicas (2)?

bcressey · 2026-01-09T18:46:38Z

bottlerocket-settings-models/modeled-types/src/kubernetes.rs

+// Define the bounds for the `mps.replicas` field
+const MPS_REPLICAS_MIN: i32 = 2;
+const MPS_REPLICAS_MAX: i32 = i32::MAX;


These are actual bounds? It's not possible to set only one replica, and the software is fine with 2 billion replicas?
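
For reference, with the constants in this hunk the accepted range is 2 through `i32::MAX`, so 1 is rejected and roughly 2.1 billion is accepted. A minimal sketch of what such a range check does, assuming a hypothetical `MpsReplicas` newtype and a plain `String` error (neither is taken from the PR, whose actual bounded type is not shown here):

```rust
use std::convert::TryFrom;

// Bounds copied from the diff hunk above.
const MPS_REPLICAS_MIN: i32 = 2;
const MPS_REPLICAS_MAX: i32 = i32::MAX;

/// Hypothetical newtype for illustration only; not the type used in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MpsReplicas(i32);

impl TryFrom<i32> for MpsReplicas {
    type Error = String;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        // Accept only values inside the inclusive [MIN, MAX] range.
        if (MPS_REPLICAS_MIN..=MPS_REPLICAS_MAX).contains(&value) {
            Ok(MpsReplicas(value))
        } else {
            Err(format!(
                "replicas must be between {} and {}, got {}",
                MPS_REPLICAS_MIN, MPS_REPLICAS_MAX, value
            ))
        }
    }
}
```

Under these bounds the upper limit never rejects a positive `i32`, so only the lower limit does any work, which is the concern the question raises.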

bcressey · 2026-01-09T18:52:35Z

bottlerocket-settings-models/settings-extensions/kubelet-device-plugins/src/lib.rs

+    fn validate(value: Self, _validated_settings: Option<serde_json::Value>) -> Result<()> {
+        // Validate MPS and MIG are not both enabled
+        if let Some(ref nvidia) = value.nvidia {
+            let is_mps = matches!(
+                nvidia.device_sharing_strategy,
+                Some(NvidiaDeviceSharingStrategy::Mps)
+            );
+            let is_mig = matches!(
+                nvidia.device_partitioning_strategy,
+                Some(NvidiaDevicePartitioningStrategy::MIG)
+            );
+            if is_mps && is_mig {
+                return Err(KubeletDevicePluginsError::MpsMigConflict);
+            }
+        }


Does this validation logic actually work? AFAIK it's the first time we're attempting this sort of cross-setting validation.
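
One way to check the combination rule in isolation is to exercise it against each pairing of the two settings. The sketch below is an assumption-laden stand-in: the enums, the `NvidiaSettings` struct, and the free function `validate_nvidia` are simplified, hypothetical versions of the types in the hunk, and it says nothing about whether the settings framework actually invokes `validate` with both fields populated.

```rust
// Simplified, hypothetical stand-ins for the PR's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SharingStrategy {
    None,
    TimeSlicing,
    Mps,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitioningStrategy {
    None,
    Mig,
}

#[derive(Debug, Default)]
pub struct NvidiaSettings {
    pub device_sharing_strategy: Option<SharingStrategy>,
    pub device_partitioning_strategy: Option<PartitioningStrategy>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    MpsMigConflict,
}

/// Mirrors the rule in the hunk: reject only when MPS and MIG are both set.
pub fn validate_nvidia(nvidia: Option<&NvidiaSettings>) -> Result<(), ValidationError> {
    if let Some(nvidia) = nvidia {
        let is_mps = matches!(nvidia.device_sharing_strategy, Some(SharingStrategy::Mps));
        let is_mig = matches!(
            nvidia.device_partitioning_strategy,
            Some(PartitioningStrategy::Mig)
        );
        if is_mps && is_mig {
            return Err(ValidationError::MpsMigConflict);
        }
    }
    Ok(())
}
```

A test along these lines covers the rule itself; confirming the cross-setting behavior end to end would still need a check that goes through the real settings path.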

yeazelm mentioned this pull request Dec 31, 2025

Support for CUDA MPS bottlerocket-os/bottlerocket#4673

Open

yeazelm force-pushed the add_mps branch from 7f7aa06 to 97fcbc2 Compare December 31, 2025 22:54

feat: add MPS GPU sharing settings model

db7691c

yeazelm force-pushed the add_mps branch from 97fcbc2 to db7691c Compare January 2, 2026 16:43

piyush-jena approved these changes Jan 2, 2026

yeazelm requested review from bcressey and cbgbt January 6, 2026 16:21

bcressey reviewed Jan 9, 2026
