Background
`internal/discovery/k8s_with_gpu_operator.go` currently has two near-identical vendor-loop node queries:
- `Discover()` (lines 33-99) — queries nodes per vendor, returns `map[nodeName]map[model]AcceleratorModelInfo`
- `discoverNodeGPUTypes()` (lines 147-188) — same vendor loop, returns `map[nodeName]string` (just the model name)

Both loop over vendors, build the same label selector, list nodes, and iterate the results; they differ only in how they project the result. Additionally, neither captures `node.Labels`, which upcoming features will need (see "Motivation" below).
Motivation
Upcoming work needs a third projection of the same cluster state — per-node info including labels, not just accelerator models. Concrete use cases:
- `node.Labels` alongside `node.Status.Allocatable`

Adding a third independent vendor loop for this would lock in the duplication pattern. This PR consolidates all per-node discovery into a single internal helper with multiple public projections.
Non-goals
- No changes to `CapacityDiscovery` or `UsageDiscovery` interface signatures. Existing callers continue to work unchanged.
- No new behavior. This is a pure refactor — same inputs produce the same outputs for existing public methods.
- Not consuming `DiscoverNodes` anywhere yet. That happens in the namespace-scoped-limiter PR.
Behavior preservation checklist
The refactor must preserve:
- `WVA_NODE_SELECTOR` environment variable handling
- Vendor iteration order (`nvidia.com`, `amd.com`, `intel.com`)
- Multi-vendor node handling (a node with both `nvidia.com/gpu.product` and `amd.com/gpu.product-name` labels should appear once in results with both accelerators)
- `node.Status.Allocatable` for `<vendor>/gpu` used for `Count`
- `<vendor>/gpu.memory` label used for `Memory`
- Empty GPU count if `Allocatable` is missing the resource
- `discoverNodeGPUTypes` tie-breaking if a node has multiple vendor labels (preserve existing behavior; likely first-vendor-wins in the current loop order)
Acceptance criteria
- New `NodeInfo` type in `internal/discovery/types.go`
- New `NodeDiscovery` interface in `internal/discovery/interface.go`; added to `FullDiscovery`
- `listGPUNodes` internal helper extracted; `Discover` and `discoverNodeGPUTypes` reimplemented as projections
- New `DiscoverNodes` method on `K8sWithGpuOperator`
- All existing tests in `internal/discovery/*_test.go` pass unchanged
- New unit tests for `DiscoverNodes` covering:
  - Single-vendor node (NVIDIA)
  - Multi-vendor node (both NVIDIA and AMD labels) — one entry in result with both accelerators
  - Node labels captured correctly in `NodeInfo.Labels`
  - `WVA_NODE_SELECTOR` filters the node set
  - Node without GPU resources (labels present but `Allocatable` empty) handled gracefully
- `go build ./...` clean, no linter issues
- No changes outside `internal/discovery/`
Related
- Namespace-scoped-limiter PR: will consume `DiscoverNodes` once this refactor lands