Skip to content

Conversation

@yeazelm
Copy link
Contributor

@yeazelm yeazelm commented Dec 31, 2025

Issue #: bottlerocket-os/bottlerocket#4673

Description of changes:
Add NvidiaMpsSettings struct and Mps variant to NvidiaDeviceSharingStrategy for NVIDIA Multi-Process Service support. Includes validation to prevent MPS and MIG from being enabled simultaneously.

Testing:
The testing was documented in the related PR: bottlerocket-os/bottlerocket-core-kit#789

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add NvidiaMpsSettings struct and Mps variant to
NvidiaDeviceSharingStrategy for NVIDIA Multi-Process Service support.
Includes validation to prevent MPS and MIG from being enabled
simultaneously.

Signed-off-by: Matthew Yeazel <[email protected]>
Copy link
Contributor

@piyush-jena piyush-jena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from the nitpicky request for an additional unit test. Nice work coding the conflict between MIG and MPS.

let test_json = r#"{"nvidia":{"pass-device-specs":true,"device-id-strategy":"index","device-list-strategy":"volume-mounts","device-sharing-strategy":"mps","mps":{"replicas":4},"device-partitioning-strategy":"none"}}"#;

let device_plugins: KubeletDevicePluginsV1 = serde_json::from_str(test_json).unwrap();
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an assert for replicas? and an unit test for the default number of replicas = 2?

@yeazelm yeazelm requested review from bcressey and cbgbt January 6, 2026 16:21
Comment on lines +27 to +29
// Define the bounds for the `mps.replicas` field
const MPS_REPLICAS_MIN: i32 = 2;
const MPS_REPLICAS_MAX: i32 = i32::MAX;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actual bounds? It's not possible to set only one replica, and the software is fine with 2 billion replicas?

Comment on lines +44 to +58
fn validate(value: Self, _validated_settings: Option<serde_json::Value>) -> Result<()> {
// Validate MPS and MIG are not both enabled
if let Some(ref nvidia) = value.nvidia {
let is_mps = matches!(
nvidia.device_sharing_strategy,
Some(NvidiaDeviceSharingStrategy::Mps)
);
let is_mig = matches!(
nvidia.device_partitioning_strategy,
Some(NvidiaDevicePartitioningStrategy::MIG)
);
if is_mps && is_mig {
return Err(KubeletDevicePluginsError::MpsMigConflict);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this validation logic actually work? AFAIK it's the first time we're attempting this sort of cross-setting validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants