feat: add pdrole package for P/D role discovery and detection #735

Open
ev-shindin wants to merge 2 commits into llm-d:main from ev-shindin:pd-role-discovery

Conversation

@ev-shindin
Collaborator

Summary

  • Add internal/utils/pdrole package that discovers P/D disaggregation configuration from the EPP's EndpointPickerConfig and detects each deployment's P/D role from pod template labels
  • Discovery is per-pool (accepts *pool.EndpointPool), enabling correct behavior when multiple EPPs exist with different P/D label settings
  • Returns PDDiscoveryResult with Disaggregated flag so callers can distinguish "no P/D plugins" (all pods serve both roles) from "P/D plugins found, use label config"

Design decisions

Label-only detection (no deployment name fallback).
The EPP's filter plugins (prefill-filter, decode-filter, by-label from llm-d-inference-scheduler) route traffic based solely on pod labels. A deployment named llama-prefill without a llm-d.ai/role label would still receive decode traffic from EPP (decode-filter has allowsNoLabel=true). Name-based guessing would misclassify it.

BothValues from intersection, not hardcoded.
For by-label custom plugins, values appearing in both prefill and decode profiles' validValues are classified as BothValues. This ensures GetDeploymentPDRole returns RoleBoth (not RolePrefill or RoleDecode) for pods that pass both filters.
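The intersection logic can be sketched as follows (an illustrative sketch only, not the package's actual code; the `splitByProfile` name and signature are assumptions):

```go
package main

import "fmt"

// splitByProfile classifies by-label validValues into prefill-only,
// decode-only, and both, by intersecting the two profiles' value sets.
// A value accepted by both profiles' filters maps to RoleBoth.
func splitByProfile(prefillValues, decodeValues []string) (prefillOnly, decodeOnly, both []string) {
	inPrefill := make(map[string]bool)
	for _, v := range prefillValues {
		inPrefill[v] = true
	}
	inDecode := make(map[string]bool)
	for _, v := range decodeValues {
		inDecode[v] = true
	}
	for _, v := range prefillValues {
		if inDecode[v] {
			both = append(both, v) // appears in both profiles' validValues
		} else {
			prefillOnly = append(prefillOnly, v)
		}
	}
	for _, v := range decodeValues {
		if !inPrefill[v] {
			decodeOnly = append(decodeOnly, v)
		}
	}
	return
}

func main() {
	p, d, b := splitByProfile([]string{"prefill", "worker"}, []string{"decode", "worker"})
	fmt.Println(p, d, b) // [prefill] [decode] [worker]
}
```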

Aligned with EPP's actual filter semantics:

| EPP filter | Label | Valid values | allowsNoLabel |
| --- | --- | --- | --- |
| prefill-filter | llm-d.ai/role | "prefill" | false |
| decode-filter | llm-d.ai/role | "decode", "both" | true |
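These semantics can be sketched in Go (illustrative only; the real detect.go may differ, and the `detectRole` name and constants here are assumptions):

```go
package main

import "fmt"

type PDRole string

const (
	RolePrefill PDRole = "prefill"
	RoleDecode  PDRole = "decode"
	RoleBoth    PDRole = "both"
)

// detectRole mirrors the filter table above for the well-known filters.
func detectRole(labels map[string]string) PDRole {
	switch labels["llm-d.ai/role"] {
	case "prefill":
		// passes only prefill-filter (allowsNoLabel=false)
		return RolePrefill
	case "both":
		// "both" is accepted by decode-filter and maps to RoleBoth
		return RoleBoth
	case "decode", "":
		// decode-filter accepts "decode" and, because allowsNoLabel=true,
		// unlabeled pods; an unlabeled pod therefore receives decode traffic.
		return RoleDecode
	default:
		// unknown values pass neither well-known filter; this sketch
		// falls back to decode as a conservative choice.
		return RoleDecode
	}
}

func main() {
	fmt.Println(detectRole(map[string]string{"llm-d.ai/role": "prefill"})) // prefill
	fmt.Println(detectRole(nil))                                           // decode (unlabeled)
}
```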

Package structure

| File | Purpose |
| --- | --- |
| types.go | PDRole, PDRoleLabelConfig, PDDiscoveryResult, constants |
| detect.go | GetDeploymentPDRole — label-based role detection |
| discover.go | DiscoverPDRoleLabelConfig — EPP config discovery chain |
| *_test.go | 43 Ginkgo specs |

Discovery chain

  1. Pool -> EPP service name/namespace
  2. Service selector -> EPP deployment
  3. Deployment volumes -> mounted ConfigMap
  4. ConfigMap data -> parse EndpointPickerConfig (YAML/JSON)
  5. Plugins -> detect prefill-filter/decode-filter (well-known label) or by-label (custom label from parameters)
  6. Any step fails -> Disaggregated=false, default config
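Steps 5 and 6 of the chain can be sketched without a Kubernetes client (a minimal illustration; `detectPDPlugins` and its return shape are assumptions, not the package's API):

```go
package main

import "fmt"

// detectPDPlugins scans the plugin types parsed from an
// EndpointPickerConfig and reports whether P/D disaggregation
// is configured and which label key drives routing.
func detectPDPlugins(pluginTypes []string) (disaggregated bool, labelKey string) {
	for _, t := range pluginTypes {
		switch t {
		case "prefill-filter", "decode-filter":
			// well-known filters use the llm-d.ai/role label
			return true, "llm-d.ai/role"
		case "by-label":
			// custom label comes from the plugin's parameters;
			// this sketch returns a placeholder instead of parsing them
			return true, "<from parameters>"
		}
	}
	// step 6: no P/D plugins found -> Disaggregated=false, default config
	return false, ""
}

func main() {
	ok, key := detectPDPlugins([]string{"queue-scorer", "decode-filter"})
	fmt.Println(ok, key) // true llm-d.ai/role
}
```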

Intended caller pattern (not wired yet)

```go
pool, _ := datastore.PoolGetFromLabels(deploy.Spec.Template.Labels)
result := pdrole.DiscoverPDRoleLabelConfig(ctx, k8sClient, pool)
if !result.Disaggregated {
    // no P/D plugins — all deployments serve both roles
    role = pdrole.RoleBoth
} else {
    role = pdrole.GetDeploymentPDRole(deploy, result.LabelConfig)
}
```

Test plan

  • go build ./internal/utils/pdrole/... — clean
  • go vet ./internal/utils/pdrole/... — clean
  • go test ./internal/utils/pdrole/... -v -count=1 — 43/43 specs pass
  • No callers yet (infrastructure-only PR), wiring comes in follow-up

@ev-shindin ev-shindin self-assigned this Feb 15, 2026
@ev-shindin ev-shindin linked an issue Feb 15, 2026 that may be closed by this pull request
@ev-shindin ev-shindin requested a review from asm582 February 16, 2026 16:19
```go
// PDRole represents the Prefill/Decode role of a deployment in a P/D disaggregation setup.
type PDRole string

const (
```
Collaborator

Many constants below are already defined in https://github.com/llm-d/llm-d-inference-scheduler/blob/main/pkg/plugins/filter/pd_role.go. No need to redefine.

Collaborator Author

Should we add this heavy dependency?

Collaborator

Yes, it's fine; this is just a code dependency.

Collaborator

@asm582 asm582 left a comment

Address review

@asm582
Collaborator

asm582 commented Feb 17, 2026

At the cost of synchronization, can we add a field in VA CRD and simplify this PR?

```go
type VariantAutoscalingSpec struct {
    // ... existing fields ...

    // Role explicitly defines the P/D role for this variant.
    // +kubebuilder:validation:Enum=prefill;decode;both
    // +optional
    Role string `json:"role,omitempty"`
}
```

@ev-shindin
Collaborator Author

ev-shindin commented Feb 23, 2026

> At the cost of synchronization, can we add a field in VA CRD and simplify this PR?

I would like to simplify the UX. With this PR, users get P/D detection at no cost; otherwise, they would have to keep the P/D definition/detection in sync between the inference scheduler and the autoscaler.

@ev-shindin
Collaborator Author

/ok-to-test

@github-actions
Contributor

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 13 | 37 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@asm582
Collaborator

asm582 commented Feb 24, 2026

> > At the cost of synchronization, can we add a field in VA CRD and simplify this PR?
>
> I would like to simplify UX. With this PR user get P/D detection with no cost, otherwise he should care about synchronization between P/D definition/detection in the inference scheduler and autoscaler.

To clarify, the comment on synchronization pertains to the implementation; for the user, it would involve adding labels to the VA CR.

@ev-shindin
Collaborator Author

/retest

@github-actions
Contributor

🚀 E2E tests triggered by /retest

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 14 | 36 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@asm582
Collaborator

asm582 commented Mar 3, 2026

/trigger-e2e-full

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

🚀 Full E2E tests triggered by /trigger-e2e-full

View the Kind E2E workflow run

@asm582
Collaborator

asm582 commented Mar 3, 2026

@ev-shindin, can you please take a look at why OpenShift tests are failing to merge this PR?

@ev-shindin
Collaborator Author

/retest

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

🚀 E2E tests triggered by /retest

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 12 | 38 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

1 similar comment

@asm582
Collaborator

asm582 commented Mar 4, 2026

/ok-to-test

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

🚀 E2E tests triggered by /ok-to-test

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 14 | 36 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@mamy-CS
Collaborator

mamy-CS commented Mar 4, 2026

@ev-shindin please rebase to pick up new changes and openshift e2e passes

@asm582
Collaborator

asm582 commented Mar 4, 2026

> @ev-shindin please rebase to pick up new changes and openshift e2e passes

@ev-shindin, once rebase is done and the test cases pass, we should be able to merge this PR.

Add internal/utils/pdrole package that discovers Prefill/Decode role
configuration from EPP's EndpointPickerConfig and detects each
deployment's P/D role from pod template labels.

Aligned with llm-d-inference-scheduler's actual behavior:
- Label-only detection (no deployment name fallback). EPP's filter
  plugins (prefill-filter, decode-filter, by-label) route traffic
  based solely on pod labels, never deployment names.
- prefill-filter: accepts only "prefill" labeled pods (allowsNoLabel=false)
- decode-filter: accepts "decode"/"both" labeled + unlabeled pods
  (allowsNoLabel=true)
- by-label: values appearing in both prefill and decode profiles'
  validValues are classified as BothValues (intersection logic)

Discovery accepts a per-pool EndpointPool (not a global EPP name),
enabling correct behavior when multiple EPPs have different P/D
label settings. Returns PDDiscoveryResult with Disaggregated flag
so callers can distinguish "no P/D plugins" (treat all as RoleBoth)
from "P/D plugins found, use label config for detection".
- Return error from DiscoverPDRoleLabelConfig instead of silently
  swallowing K8s API failures. Non-error conditions (nil pool, no P/D
  plugins) return nil error with Disaggregated=false.
- Sort ConfigMap data keys for deterministic traversal.
- Document alignment of constants with llm-d-inference-scheduler
  pd_role.go (redefined locally to avoid heavyweight dependency).
@ev-shindin
Collaborator Author

/trigger-e2e-full

@ev-shindin
Collaborator Author

/ok-to-test

@ev-shindin
Collaborator Author

/trigger-e2e-full

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@ev-shindin
Collaborator Author

/ok-to-test

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 11 | 39 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

@lionelvillard
Collaborator

This PR makes several assumptions about how the EPP config is made available to the container running the EPP, namely via a ConfigMap volume mount. This is not the only way. For example, the configuration can also be passed as inline text via the EPP command-line argument (--config-text), which can complicate the P/D role discovery even further (kserve uses --config-text). Another example is that the ConfigMap may be stored in a container mounted as a volume.

It also requires read access to all ConfigMaps in the cluster, which is considered a medium security risk. However, WVA already grants read (and update!) access to all ConfigMaps, so it is not really a concern here (at least for now).

I agree that WVA needs to be aware of the EPP configuration, and not only of the P/D roles. Since automatic discovery is quite complex and can potentially fail, at the very least there should be a fallback mechanism, typically in the form of an annotation on the VA object (wva.llmd.ai/endpoint-picker-config: | ...). This annotation can be set automatically by the "upper layers" (i.e. kserve, model service, kustomize, etc.), and WVA may provide a way to set this annotation automatically (this PR), but it's not a priority IMO.

Long story short: this PR is a bit premature. I would start with the wva.llmd.ai/endpoint-picker-config: | ... annotation and decide later on if there is a need for automatic discovery.
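The proposed fallback precedence could look roughly like this (a hedged sketch of the review suggestion, not existing code; the annotation key is the reviewer's proposal and `eppConfigText` is a hypothetical name):

```go
package main

import "fmt"

// eppConfigAnnotation is the annotation key proposed in review;
// it is not part of the PR's code.
const eppConfigAnnotation = "wva.llmd.ai/endpoint-picker-config"

// eppConfigText returns the inline EndpointPickerConfig for a VA:
// the explicit annotation takes precedence, and only when it is
// absent would automatic discovery (this PR's mechanism) run.
func eppConfigText(vaAnnotations map[string]string, discover func() (string, bool)) (string, bool) {
	if cfg, ok := vaAnnotations[eppConfigAnnotation]; ok {
		return cfg, true
	}
	if discover == nil {
		return "", false
	}
	return discover()
}

func main() {
	cfg, ok := eppConfigText(map[string]string{eppConfigAnnotation: "plugins: []"}, nil)
	fmt.Println(ok, cfg) // true plugins: []
}
```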

@asm582
Collaborator

asm582 commented Mar 10, 2026

Valid points; a similar comment was made here: #735 (comment), though that suggestion used fields to get validation at some sync cost.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.

@asm582
Collaborator

asm582 commented Apr 1, 2026

Are we still working on this PR?

Development

Successfully merging this pull request may close these issues.

Support P/D (Prefill/Decode) Disaggregation

4 participants