feat: add pdrole package for P/D role discovery and detection #735
ev-shindin wants to merge 2 commits into llm-d:main from
// PDRole represents the Prefill/Decode role of a deployment in a P/D disaggregation setup.
type PDRole string

const (
Many of the constants below are already defined in https://github.com/llm-d/llm-d-inference-scheduler/blob/main/pkg/plugins/filter/pd_role.go. There is no need to redefine them.
Should we add this heavy dependency?
Yes, it's fine; this is just a code dependency.
At the cost of synchronization, can we add a field to the VA CRD and simplify this PR?
I would like to simplify the UX. With this PR, the user gets P/D detection at no cost; otherwise, they would have to keep the P/D definition and detection synchronized between the inference scheduler and the autoscaler.
/ok-to-test
🚀 E2E tests triggered by /ok-to-test
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
To clarify, the comment on synchronization pertains to the implementation; for the user, it would involve adding labels to the VA CR.
/retest
🚀 E2E tests triggered by /retest
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
/trigger-e2e-full
🚀 Full E2E tests triggered by /trigger-e2e-full
@ev-shindin, can you please take a look at why the OpenShift tests are failing, so that we can merge this PR?
/retest
🚀 E2E tests triggered by /retest
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
/ok-to-test
🚀 E2E tests triggered by /ok-to-test
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
@ev-shindin, please rebase to pick up the new changes so that the OpenShift e2e tests pass.
@ev-shindin, once the rebase is done and the test cases pass, we should be able to merge this PR.
Add internal/utils/pdrole package that discovers Prefill/Decode role configuration from EPP's EndpointPickerConfig and detects each deployment's P/D role from pod template labels.

Aligned with llm-d-inference-scheduler's actual behavior:
- Label-only detection (no deployment name fallback). EPP's filter plugins (prefill-filter, decode-filter, by-label) route traffic based solely on pod labels, never deployment names.
- prefill-filter: accepts only "prefill" labeled pods (allowsNoLabel=false)
- decode-filter: accepts "decode"/"both" labeled + unlabeled pods (allowsNoLabel=true)
- by-label: values appearing in both prefill and decode profiles' validValues are classified as BothValues (intersection logic)

Discovery accepts a per-pool EndpointPool (not a global EPP name), enabling correct behavior when multiple EPPs have different P/D label settings. Returns PDDiscoveryResult with Disaggregated flag so callers can distinguish "no P/D plugins" (treat all as RoleBoth) from "P/D plugins found, use label config for detection".
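The by-label intersection logic described in this commit message can be illustrated with a minimal sketch. The function name `classifyValues` is hypothetical and does not come from the PR; assigning the non-shared values to prefill-only and decode-only groups is also an assumption of this sketch, since the commit message only states how shared values are classified.

```go
package main

import (
	"fmt"
	"sort"
)

// classifyValues is a hypothetical helper, not the PR's code. Values listed in
// both profiles' validValues go to both (the intersection). Treating the
// remaining values as prefill-only or decode-only is an assumption.
func classifyValues(prefillValid, decodeValid []string) (both, prefillOnly, decodeOnly []string) {
	inDecode := make(map[string]bool, len(decodeValid))
	for _, v := range decodeValid {
		inDecode[v] = true
	}

	inBoth := make(map[string]bool)
	for _, v := range prefillValid {
		if inDecode[v] {
			both = append(both, v)
			inBoth[v] = true
		} else {
			prefillOnly = append(prefillOnly, v)
		}
	}
	for _, v := range decodeValid {
		if !inBoth[v] {
			decodeOnly = append(decodeOnly, v)
		}
	}

	// Sort each group so the result does not depend on input order.
	sort.Strings(both)
	sort.Strings(prefillOnly)
	sort.Strings(decodeOnly)
	return both, prefillOnly, decodeOnly
}

func main() {
	// "hybrid" is an illustrative custom label value, not one defined by the PR.
	both, prefillOnly, decodeOnly := classifyValues(
		[]string{"prefill", "hybrid"},
		[]string{"decode", "hybrid"},
	)
	fmt.Println("both:", both)
	fmt.Println("prefill only:", prefillOnly)
	fmt.Println("decode only:", decodeOnly)
}
```

A pod whose label value lands in the shared group passes both profiles' filters, which is why the commit message classifies it under BothValues rather than under a single role.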
- Return error from DiscoverPDRoleLabelConfig instead of silently swallowing K8s API failures. Non-error conditions (nil pool, no P/D plugins) return nil error with Disaggregated=false.
- Sort ConfigMap data keys for deterministic traversal.
- Document alignment of constants with llm-d-inference-scheduler pd_role.go (redefined locally to avoid heavyweight dependency).
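The point about sorting ConfigMap data keys matters because Go does not guarantee map iteration order, so walking a ConfigMap's data map directly could pick a different entry on each run. Below is a minimal sketch of the idea; `sortedKeys` and the sample file names are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of a ConfigMap-style data map in lexical order,
// so that callers visit entries in the same order on every run.
// Hypothetical helper, not the PR's code.
func sortedKeys(data map[string]string) []string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Illustrative data only; real ConfigMap keys will differ.
	data := map[string]string{
		"b-config.yaml": "second",
		"a-config.yaml": "first",
		"c-config.json": "third",
	}
	for _, k := range sortedKeys(data) {
		fmt.Printf("%s => %s\n", k, data[k])
	}
}
```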
51c8825 to 789cb83
/trigger-e2e-full
/ok-to-test
/trigger-e2e-full
🚀 Kind E2E (full) triggered by
/ok-to-test
🚀 OpenShift E2E: approve and run (
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
This PR makes several assumptions about how the EPP config is made available to the container running the EPP, namely via a ConfigMap volume mount. This is not the only way. For example, the configuration can also be passed as in-line text in the EPP command-line argument (
It also requires read access to all ConfigMaps in the cluster, which is considered a medium security risk. However, WVA already grants read (and update!) access to all ConfigMaps, so it is not really a concern here (at least for now).
I agree that WVA needs to be aware of the EPP configuration, and not only the P/D roles. Since automatic discovery is quite complex and can potentially fail, at the very least there should be a fallback mechanism, typically in the form of an annotation on the VA object (
Long story short: this PR is a bit premature. I would start with the
Valid points. A similar comment was made here, #735 (comment), though it proposed using fields to get validation, at some synchronization cost.
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the
Are we still working on this PR? |
Summary
- Add internal/utils/pdrole package that discovers P/D disaggregation configuration from the EPP's EndpointPickerConfig and detects each deployment's P/D role from pod template labels
- Discovery accepts a per-pool EndpointPool (*pool.EndpointPool), enabling correct behavior when multiple EPPs exist with different P/D label settings
- Returns PDDiscoveryResult with Disaggregated flag so callers can distinguish "no P/D plugins" (all pods serve both roles) from "P/D plugins found, use label config"
Design decisions
Label-only detection (no deployment name fallback).
The EPP's filter plugins (prefill-filter, decode-filter, by-label from llm-d-inference-scheduler) route traffic based solely on pod labels. A deployment named llama-prefill without an llm-d.ai/role label would still receive decode traffic from EPP (decode-filter has allowsNoLabel=true). Name-based guessing would misclassify it.

BothValues from intersection, not hardcoded.
For by-label custom plugins, values appearing in both prefill and decode profiles' validValues are classified as BothValues. This ensures GetDeploymentPDRole returns RoleBoth (not RolePrefill or RoleDecode) for pods that pass both filters.

Aligned with EPP's actual filter semantics:
| Plugin | Label | Valid values | allowsNoLabel |
| --- | --- | --- | --- |
| prefill-filter | llm-d.ai/role | "prefill" | false |
| decode-filter | llm-d.ai/role | "decode", "both" | true |
Package structure
| File | Contents |
| --- | --- |
| types.go | PDRole, PDRoleLabelConfig, PDDiscoveryResult, constants |
| detect.go | GetDeploymentPDRole: label-based role detection |
| discover.go | DiscoverPDRoleLabelConfig: EPP config discovery chain |
| *_test.go | |
Discovery chain
- EndpointPickerConfig (YAML/JSON)
- prefill-filter / decode-filter (well-known label) or by-label (custom label from parameters)
- Disaggregated=false, default config
Intended caller pattern (not wired yet)
Test plan
- go build ./internal/utils/pdrole/...: clean
- go vet ./internal/utils/pdrole/...: clean
- go test ./internal/utils/pdrole/... -v -count=1: 43/43 specs pass