
NVIDIA/k8s-test-infra


k8s-test-infra

Kubernetes test infrastructure for NVIDIA GPU software — mock GPU environments, CI tooling, and testing utilities.

nvml-mock

Turn any Kubernetes cluster into a multi-GPU environment for testing. No physical NVIDIA hardware required.

# 1. Create cluster
kind create cluster --name gpu-test

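# 2. Pull the published image and load it into the Kind node (kind load copies images from the local Docker daemon)
#    Or build locally with: docker build -t nvml-mock:local -f deployments/nvml-mock/Dockerfile .
docker pull ghcr.io/nvidia/nvml-mock:latest
kind load docker-image ghcr.io/nvidia/nvml-mock:latest --name gpu-test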

# 3. Install the Helm chart (from a checkout of this repository)
helm install nvml-mock deployments/nvml-mock/helm/nvml-mock

After installation, deploy one of the following consumers to test against the mock GPUs:

Consumer Guide
NVIDIA Device Plugin Quick Start
NVIDIA DRA Driver Quick Start
NVIDIA GPU Operator Quick Start
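Once a consumer that advertises `nvidia.com/gpu` (such as the NVIDIA device plugin) is running, a quick smoke test is to schedule a pod that requests mock GPUs. The following is a minimal sketch in POSIX shell, assuming such a consumer is already installed. The helper name `gpu_pod_manifest`, the `busybox:1.36` image, and the pod layout are illustrative choices and are not part of nvml-mock.

```shell
# Hypothetical helper: print a Pod manifest that requests mock GPUs.
# Assumes a consumer (e.g. the NVIDIA device plugin) advertises nvidia.com/gpu.
# Usage: gpu_pod_manifest <pod-name> [gpu-count]   (gpu-count defaults to 1)
gpu_pod_manifest() {
  local name gpus
  name="${1:?pod name required}"
  gpus="${2:-1}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: smoke
    image: busybox:1.36
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
EOF
}
```

For example, `gpu_pod_manifest gpu-smoke 2 | kubectl apply -f -` submits the pod. If `kubectl get pod gpu-smoke` shows it Running, the scheduler found a node with at least two allocatable mock GPUs; if it stays Pending, `kubectl describe pod gpu-smoke` usually explains why.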

Full documentation: nvml-mock Helm chart README

E2E Testing

The nvml-mock E2E workflow tests each supported GPU consumer across multiple GPU profiles and node topologies. It runs automatically on pull requests and can also be triggered manually via workflow_dispatch.

| Test Suite | What It Validates | Profiles |
|---|---|---|
| Device Plugin | nvidia.com/gpu allocatable resources | A100, H100, T4 |
| DRA Driver | ResourceSlices via Dynamic Resource Allocation | A100, H100, T4 |
| GPU Operator | Operator components: device plugin + GFD + validator (CDI injection) | A100, H100, T4 |
| Multi-Node Fleet | Cross-node scheduling with heterogeneous GPUs | A100 + T4 |
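The Device Plugin suite's core assertion is that nodes report `nvidia.com/gpu` as an allocatable resource. Below is a minimal sketch of a similar check in POSIX shell; it is not the workflow's actual implementation, and the helper name `sum_allocatable_gpus` is hypothetical. It reads `<node> <count>` pairs on stdin, treats nodes with no GPU entry as zero (for example a GPU-less control-plane node), and fails only if the cluster total is zero.

```shell
# Hypothetical helper: read "<node> <gpu-count>" lines on stdin, print a
# per-node summary plus the cluster total, and return non-zero if no node
# advertises any nvidia.com/gpu.
sum_allocatable_gpus() {
  local total node count
  total=0
  while read -r node count; do
    [ -n "$node" ] || continue       # skip blank lines
    count="${count:-0}"              # missing resource -> empty field -> 0
    echo "${node}: ${count}"
    total=$((total + count))
  done
  echo "total: ${total}"
  [ "$total" -gt 0 ]
}
```

The input can come from a `kubectl` JSONPath query, which prints an empty field for nodes that lack the resource:

`kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | sum_allocatable_gpus`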

Manual dispatch supports all 6 profiles: a100, h100, b200, gb200, l40s, t4.

See .github/workflows/nvml-mock-e2e.yaml for details.
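When dispatching the workflow manually, it can help to reject a misspelled profile before a run is queued. Here is a minimal sketch in POSIX shell: `is_supported_profile` is a hypothetical helper that checks a name against the six profiles listed above. The `profile` input name used with `gh workflow run` in the usage note is an assumption; the workflow's `workflow_dispatch` inputs define the real name.

```shell
# Hypothetical helper: succeed only for the profiles the E2E workflow
# accepts on manual dispatch (a100, h100, b200, gb200, l40s, t4).
# Matching is case-sensitive, mirroring the lowercase names above.
is_supported_profile() {
  case "$1" in
    a100|h100|b200|gb200|l40s|t4) return 0 ;;
    *) echo "unsupported profile: $1" >&2; return 1 ;;
  esac
}
```

For example: `is_supported_profile gb200 && gh workflow run nvml-mock-e2e.yaml -f profile=gb200` (where `profile` is the assumed input name).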

Mock NVML Library

The CGo-based mock libnvidia-ml.so is the library that powers nvml-mock. It can also be used standalone for local development and in CI pipelines.
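NVML clients such as go-nvml commonly load the library at runtime by its soname, `libnvidia-ml.so.1`, and the dynamic loader searches `LD_LIBRARY_PATH` for such names. One generic way to point a single command at a mock build is therefore to prepend its directory to `LD_LIBRARY_PATH` for that command only. This is a minimal sketch, assuming the mock has been built or symlinked as `libnvidia-ml.so.1` in a directory of your choosing. The helper `with_mock_nvml` is hypothetical; the Quick Start and Configuration documents describe the project's supported setup.

```shell
# Hypothetical helper: run a command with a mock NVML directory searched
# first by the dynamic loader. Assumes the directory contains the mock
# built or symlinked as libnvidia-ml.so.1. The variable is set only in the
# environment of the command being run; the caller's value is unchanged.
with_mock_nvml() {
  local mock_dir
  mock_dir="${1:?mock library directory required}"
  shift
  LD_LIBRARY_PATH="${mock_dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" "$@"
}
```

For example, `with_mock_nvml "$PWD/mock-lib" go test ./...` runs a test suite against the mock, where `mock-lib` is a placeholder directory name.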

| Document | Description |
|---|---|
| Overview | Project overview, components, GPU profiles |
| Quick Start | Build and run in 5 minutes |
| Configuration | YAML configuration reference |
| Architecture | System design and components |
| CUDA Mock | Mock CUDA library overview |
| Development | Contributing and extending the library |
| Examples | Usage patterns and scenarios |
| Troubleshooting | Common issues and solutions |

Integrations

| Integration | Description | Guide |
|---|---|---|
| fake-gpu-operator | Run:ai's K8s-level GPU simulation | Integration Guide |

Demos

| Demo | Description |
|---|---|
| Standalone | nvml-mock with FGO-style labels on Kind |
| With fake-gpu-operator | Full FGO + nvml-mock integration |

License

Apache License 2.0 — see LICENSE.
