feat: add pdrole package for P/D role discovery and detection #735

Open
ev-shindin wants to merge 2 commits into llm-d:main from ev-shindin:pd-role-discovery

Conversation

@ev-shindin
Collaborator

Summary

  • Add internal/utils/pdrole package that discovers P/D disaggregation configuration from the EPP's EndpointPickerConfig and detects each deployment's P/D role from pod template labels
  • Discovery is per-pool (accepts *pool.EndpointPool), enabling correct behavior when multiple EPPs exist with different P/D label settings
  • Returns PDDiscoveryResult with Disaggregated flag so callers can distinguish "no P/D plugins" (all pods serve both roles) from "P/D plugins found, use label config"

Design decisions

Label-only detection (no deployment name fallback).
The EPP's filter plugins (prefill-filter, decode-filter, by-label from llm-d-inference-scheduler) route traffic based solely on pod labels. A deployment named llama-prefill without a llm-d.ai/role label would still receive decode traffic from EPP (decode-filter has allowsNoLabel=true). Name-based guessing would misclassify it.

BothValues from intersection, not hardcoded.
For by-label custom plugins, values appearing in both prefill and decode profiles' validValues are classified as BothValues. This ensures GetDeploymentPDRole returns RoleBoth (not RolePrefill or RoleDecode) for pods that pass both filters.
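The intersection logic can be sketched as follows (an illustrative sketch only, not the package's actual code; the `splitByProfile` name and signature are assumptions):

```go
package main

import "fmt"

// splitByProfile classifies by-label validValues into prefill-only,
// decode-only, and both, by intersecting the two profiles' value sets.
// A value accepted by both profiles' filters maps to RoleBoth.
func splitByProfile(prefillValues, decodeValues []string) (prefillOnly, decodeOnly, both []string) {
	inPrefill := make(map[string]bool)
	for _, v := range prefillValues {
		inPrefill[v] = true
	}
	inDecode := make(map[string]bool)
	for _, v := range decodeValues {
		inDecode[v] = true
	}
	for _, v := range prefillValues {
		if inDecode[v] {
			both = append(both, v) // appears in both profiles' validValues
		} else {
			prefillOnly = append(prefillOnly, v)
		}
	}
	for _, v := range decodeValues {
		if !inPrefill[v] {
			decodeOnly = append(decodeOnly, v)
		}
	}
	return
}

func main() {
	p, d, b := splitByProfile([]string{"prefill", "worker"}, []string{"decode", "worker"})
	fmt.Println(p, d, b) // [prefill] [decode] [worker]
}
```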

Aligned with EPP's actual filter semantics:

| EPP filter | Label | Valid values | allowsNoLabel |
| --- | --- | --- | --- |
| prefill-filter | llm-d.ai/role | "prefill" | false |
| decode-filter | llm-d.ai/role | "decode", "both" | true |
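These semantics can be sketched in Go (illustrative only; the real detect.go may differ, and the `detectRole` name and constants here are assumptions):

```go
package main

import "fmt"

type PDRole string

const (
	RolePrefill PDRole = "prefill"
	RoleDecode  PDRole = "decode"
	RoleBoth    PDRole = "both"
)

// detectRole mirrors the filter table above for the well-known filters.
func detectRole(labels map[string]string) PDRole {
	switch labels["llm-d.ai/role"] {
	case "prefill":
		// passes only prefill-filter (allowsNoLabel=false)
		return RolePrefill
	case "both":
		// "both" is accepted by decode-filter and maps to RoleBoth
		return RoleBoth
	case "decode", "":
		// decode-filter accepts "decode" and, because allowsNoLabel=true,
		// unlabeled pods; an unlabeled pod therefore receives decode traffic.
		return RoleDecode
	default:
		// unknown values pass neither well-known filter; this sketch
		// falls back to decode as a conservative choice.
		return RoleDecode
	}
}

func main() {
	fmt.Println(detectRole(map[string]string{"llm-d.ai/role": "prefill"})) // prefill
	fmt.Println(detectRole(nil))                                           // decode (unlabeled)
}
```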

Package structure

| File | Purpose |
| --- | --- |
| types.go | PDRole, PDRoleLabelConfig, PDDiscoveryResult, constants |
| detect.go | GetDeploymentPDRole — label-based role detection |
| discover.go | DiscoverPDRoleLabelConfig — EPP config discovery chain |
| *_test.go | 43 Ginkgo specs |

Discovery chain

  1. Pool -> EPP service name/namespace
  2. Service selector -> EPP deployment
  3. Deployment volumes -> mounted ConfigMap
  4. ConfigMap data -> parse EndpointPickerConfig (YAML/JSON)
  5. Plugins -> detect prefill-filter/decode-filter (well-known label) or by-label (custom label from parameters)
  6. Any step fails -> Disaggregated=false, default config
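Steps 5 and 6 of the chain can be sketched without a Kubernetes client (a minimal illustration; `detectPDPlugins` and its return shape are assumptions, not the package's API):

```go
package main

import "fmt"

// detectPDPlugins scans the plugin types parsed from an
// EndpointPickerConfig and reports whether P/D disaggregation
// is configured and which label key drives routing.
func detectPDPlugins(pluginTypes []string) (disaggregated bool, labelKey string) {
	for _, t := range pluginTypes {
		switch t {
		case "prefill-filter", "decode-filter":
			// well-known filters use the llm-d.ai/role label
			return true, "llm-d.ai/role"
		case "by-label":
			// custom label comes from the plugin's parameters;
			// this sketch returns a placeholder instead of parsing them
			return true, "<from parameters>"
		}
	}
	// step 6: no P/D plugins found -> Disaggregated=false, default config
	return false, ""
}

func main() {
	ok, key := detectPDPlugins([]string{"queue-scorer", "decode-filter"})
	fmt.Println(ok, key) // true llm-d.ai/role
}
```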

Intended caller pattern (not wired yet)

```go
pool, _ := datastore.PoolGetFromLabels(deploy.Spec.Template.Labels)
result := pdrole.DiscoverPDRoleLabelConfig(ctx, k8sClient, pool)
if !result.Disaggregated {
    // no P/D plugins — all deployments serve both roles
    role = pdrole.RoleBoth
} else {
    role = pdrole.GetDeploymentPDRole(deploy, result.LabelConfig)
}
```

Test plan

  • go build ./internal/utils/pdrole/... — clean
  • go vet ./internal/utils/pdrole/... — clean
  • go test ./internal/utils/pdrole/... -v -count=1 — 43/43 specs pass
  • No callers yet (infrastructure-only PR), wiring comes in follow-up

@ev-shindin ev-shindin self-assigned this Feb 15, 2026
@ev-shindin ev-shindin linked an issue Feb 15, 2026 that may be closed by this pull request
@ev-shindin ev-shindin requested a review from asm582 February 16, 2026 16:19
```go
// PDRole represents the Prefill/Decode role of a deployment in a P/D disaggregation setup.
type PDRole string

const (
```
Collaborator

Many constants below are already defined in https://github.com/llm-d/llm-d-inference-scheduler/blob/main/pkg/plugins/filter/pd_role.go. No need to redefine.

Collaborator Author

Should we add this heavy dependency?

Collaborator

Yes, it's fine; this is just a code dependency.

Collaborator

@asm582 asm582 left a comment

Address review

@asm582
Collaborator

asm582 commented Feb 17, 2026

At the cost of synchronization, can we add a field in VA CRD and simplify this PR?

```go
type VariantAutoscalingSpec struct {
    // ... existing fields ...

    // Role explicitly defines the P/D role for this variant.
    // +kubebuilder:validation:Enum=prefill;decode;both
    // +optional
    Role string `json:"role,omitempty"`
}
```

@ev-shindin
Collaborator Author

ev-shindin commented Feb 23, 2026

> At the cost of synchronization, can we add a field in VA CRD and simplify this PR?

I would like to simplify the UX. With this PR, users get P/D detection at no cost; otherwise, they would have to keep the P/D definition/detection in sync between the inference scheduler and the autoscaler.

@ev-shindin
Collaborator Author

/ok-to-test

@github-actions
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 13 | 37 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@asm582
Collaborator

asm582 commented Feb 24, 2026

> > At the cost of synchronization, can we add a field in VA CRD and simplify this PR?
>
> I would like to simplify UX. With this PR user get P/D detection with no cost, otherwise he should care about synchronization between P/D definition/detection in the inference scheduler and autoscaler.

To clarify, the comment on synchronization pertains to the implementation; for the user, it would involve adding labels to the VA CR.

@ev-shindin
Collaborator Author

/retest

@github-actions
Contributor

🚀 E2E tests triggered by /retest

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 14 | 36 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@asm582
Collaborator

asm582 commented Mar 3, 2026

/trigger-e2e-full

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

🚀 Full E2E tests triggered by /trigger-e2e-full

View the Kind E2E workflow run

@asm582
Collaborator

asm582 commented Mar 3, 2026

@ev-shindin, can you please take a look at why OpenShift tests are failing to merge this PR?

@ev-shindin
Collaborator Author

/retest

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

🚀 E2E tests triggered by /retest

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 12 | 38 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

1 similar comment

@asm582
Collaborator

asm582 commented Mar 4, 2026

/ok-to-test

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 14 | 36 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@mamy-CS
Collaborator

mamy-CS commented Mar 4, 2026

@ev-shindin please rebase to pick up new changes and openshift e2e passes

@asm582
Collaborator

asm582 commented Mar 4, 2026

> @ev-shindin please rebase to pick up new changes and openshift e2e passes

@ev-shindin, once rebase is done and the test cases pass, we should be able to merge this PR.

Add internal/utils/pdrole package that discovers Prefill/Decode role
configuration from EPP's EndpointPickerConfig and detects each
deployment's P/D role from pod template labels.

Aligned with llm-d-inference-scheduler's actual behavior:
- Label-only detection (no deployment name fallback). EPP's filter
  plugins (prefill-filter, decode-filter, by-label) route traffic
  based solely on pod labels, never deployment names.
- prefill-filter: accepts only "prefill" labeled pods (allowsNoLabel=false)
- decode-filter: accepts "decode"/"both" labeled + unlabeled pods
  (allowsNoLabel=true)
- by-label: values appearing in both prefill and decode profiles'
  validValues are classified as BothValues (intersection logic)

Discovery accepts a per-pool EndpointPool (not a global EPP name),
enabling correct behavior when multiple EPPs have different P/D
label settings. Returns PDDiscoveryResult with Disaggregated flag
so callers can distinguish "no P/D plugins" (treat all as RoleBoth)
from "P/D plugins found, use label config for detection".
- Return error from DiscoverPDRoleLabelConfig instead of silently
  swallowing K8s API failures. Non-error conditions (nil pool, no P/D
  plugins) return nil error with Disaggregated=false.
- Sort ConfigMap data keys for deterministic traversal.
- Document alignment of constants with llm-d-inference-scheduler
  pd_role.go (redefined locally to avoid heavyweight dependency).
@ev-shindin
Collaborator Author

/trigger-e2e-full

@ev-shindin
Collaborator Author

/ok-to-test

@ev-shindin
Collaborator Author

/trigger-e2e-full

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@ev-shindin
Collaborator Author

/ok-to-test

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 11 | 39 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@lionelvillard
Collaborator

This PR makes several assumptions about how the EPP config is made available to the container running the EPP, namely via a ConfigMap volume mount. This is not the only way. For example, the configuration can also be passed as inline text via the EPP command-line argument (--config-text), which can complicate the P/D role discovery even further (kserve uses --config-text). Another example is that the ConfigMap may be stored in a container mounted as a volume.

It also requires read access to all ConfigMaps in the cluster, which is considered a medium security risk. However, WVA already grants read (and update!) access to all ConfigMaps, so it is not really a concern here (at least for now).

I agree that WVA needs to be aware of the EPP configuration, and not only of the P/D roles. Since automatic discovery is quite complex and can potentially fail, at the very least there should be a fallback mechanism, typically in the form of an annotation on the VA object (wva.llmd.ai/endpoint-picker-config: | ...). This annotation can be set automatically by the "upper layers" (i.e. kserve, model service, kustomize, etc.), and WVA may provide a way to set this annotation automatically (this PR), but it's not a priority IMO.

Long story short: this PR is a bit premature. I would start with the wva.llmd.ai/endpoint-picker-config: | ... annotation and decide later on if there is a need for automatic discovery.
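The proposed fallback precedence could look roughly like this (a hedged sketch of the review suggestion, not existing code; the annotation key is the reviewer's proposal and `eppConfigText` is a hypothetical name):

```go
package main

import "fmt"

// eppConfigAnnotation is the annotation key proposed in review;
// it is not part of the PR's code.
const eppConfigAnnotation = "wva.llmd.ai/endpoint-picker-config"

// eppConfigText returns the inline EndpointPickerConfig for a VA:
// the explicit annotation takes precedence, and only when it is
// absent would automatic discovery (this PR's mechanism) run.
func eppConfigText(vaAnnotations map[string]string, discover func() (string, bool)) (string, bool) {
	if cfg, ok := vaAnnotations[eppConfigAnnotation]; ok {
		return cfg, true
	}
	if discover == nil {
		return "", false
	}
	return discover()
}

func main() {
	cfg, ok := eppConfigText(map[string]string{eppConfigAnnotation: "plugins: []"}, nil)
	fmt.Println(ok, cfg) // true plugins: []
}
```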

@asm582
Collaborator

asm582 commented Mar 10, 2026

Valid points; a similar comment was made here: #735 (comment), though that suggestion used fields to get validation at some sync cost.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.

@asm582
Collaborator

asm582 commented Apr 1, 2026

Are we still working on this PR?

Development

Successfully merging this pull request may close these issues.

Support P/D (Prefill/Decode) Disaggregation

4 participants