diff --git a/keps/165-colocated-placement/README.md b/keps/165-colocated-placement/README.md
new file mode 100644
index 00000000..9ee8ce91
--- /dev/null
+++ b/keps/165-colocated-placement/README.md
@@ -0,0 +1,268 @@
+# KEP-165: Colocated placement support
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories (Optional)](#user-stories-optional)
+    - [Story 1](#story-1)
+  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [LeaderWorkerSet API](#leaderworkerset-api)
+  - [SubGroup Policy Support](#subgroup-policy-support)
+  - [Test Plan](#test-plan)
+      - [Unit tests](#unit-tests)
+      - [Integration tests](#integration-tests)
+      - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+- [Implementation History](#implementation-history)
+- [Alternatives](#alternatives)
+
+## Summary
+
+This KEP adds `spec.leaderWorkerTemplate.topologyPlacementPolicy` to support both exclusive and colocated placement of pod groups.
+
+## Motivation
+
+Leaders and workers need finer-grained topology scheduling.
+Pods in the same topology domain often communicate at lower cost, so a leader and its workers should land in the same domain, especially when the workers span multiple hosts for distributed inference. Today, enabling [Exclusive placement](https://github.com/kubernetes-sigs/lws/blob/main/docs/examples/sample/README.md#exclusive-placement) schedules a pod group into a single topology domain, but each domain can then hold only one pod group. That restriction is not needed in every case, so we should offer a less restrictive option.
+
+This KEP proposes that LeaderWorkerSet support colocated topology placement: each pod group lands in a single topology domain, and a domain may host more than one pod group.
+
+### Goals
+
+- Add a new topology scheduling mode that allows multiple pod groups to land in the same domain
+- Cover exclusive placement with the new field while maintaining backward compatibility
+
+### Non-Goals
+
+## Proposal
+
+### User Stories (Optional)
+
+#### Story 1
+Each pod group (leader and workers) is colocated in one domain, and more than one group may land in the same domain.
+
+### Notes/Constraints/Caveats (Optional)
+
+### Risks and Mitigations
+
+Exclusive placement is currently enabled by setting an annotation. We want to unify exclusive and colocated topology scheduling into one field. Backward compatibility can be maintained, but when both the annotation and the field are set we have to validate that their topology keys match.
+
+## Design Details
+
+The implementation is similar to exclusive placement: colocated placement only needs to omit the podAntiAffinity part that the pod webhook injects. Exclusive and colocated placement are mutually exclusive. If colocated placement were also exposed as an annotation, setting both annotations at once would be ambiguous. They should therefore be unified into one field, which requires an API change.
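The difference between the two modes described above can be sketched in a few lines of Go. This is a minimal illustration, not the webhook's actual code: `Requirement`, `Term`, `Affinity`, and `buildAffinity` are hypothetical stand-ins for the Kubernetes affinity types, and the `groupKeyLabel` value is an assumption about the label that identifies a pod group.

```go
package main

import "fmt"

// PolicyType mirrors the TopologyPlacementPolicyType values proposed in this KEP.
type PolicyType string

const (
	Exclusive PolicyType = "Exclusive"
	Colocated PolicyType = "Colocated"
	None      PolicyType = "None"
)

// groupKeyLabel is an assumed label identifying the pod group a pod belongs to.
const groupKeyLabel = "leaderworkerset.sigs.k8s.io/group-key"

// Requirement is a simplified, hypothetical label selector requirement.
type Requirement struct {
	Key      string
	Operator string // "In", "NotIn", or "Exists"
	Values   []string
}

// Term is a simplified, hypothetical pod (anti-)affinity term.
// All requirements must match (logical AND).
type Term struct {
	TopologyKey  string
	Requirements []Requirement
}

// Affinity holds the required affinity and anti-affinity terms for one pod.
type Affinity struct {
	PodAffinity     []Term
	PodAntiAffinity []Term
}

// buildAffinity returns the terms a webhook could inject for the given policy.
func buildAffinity(policy PolicyType, topologyKey, groupKey string) Affinity {
	var a Affinity
	if policy != Exclusive && policy != Colocated {
		return a // None (or unknown): inject nothing.
	}
	// Both modes keep the pod in the same domain as the rest of its group.
	a.PodAffinity = []Term{{
		TopologyKey: topologyKey,
		Requirements: []Requirement{
			{Key: groupKeyLabel, Operator: "In", Values: []string{groupKey}},
		},
	}}
	if policy == Exclusive {
		// Only exclusive placement also repels pods that belong to other groups.
		a.PodAntiAffinity = []Term{{
			TopologyKey: topologyKey,
			Requirements: []Requirement{
				{Key: groupKeyLabel, Operator: "Exists"},
				{Key: groupKeyLabel, Operator: "NotIn", Values: []string{groupKey}},
			},
		}}
	}
	return a
}

func main() {
	for _, p := range []PolicyType{Exclusive, Colocated, None} {
		a := buildAffinity(p, "example.com/rack", "group-0")
		fmt.Printf("%s: affinity terms=%d, anti-affinity terms=%d\n",
			p, len(a.PodAffinity), len(a.PodAntiAffinity))
	}
}
```

Keeping the affinity term shared between the two modes and adding the anti-affinity term only for `Exclusive` reflects the statement that colocated placement differs from exclusive placement only in the podAntiAffinity part.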
+
+### LeaderWorkerSet API
+
+```go
+type TopologyPlacementPolicyType string
+
+const (
+	ExclusiveTopologyPlacementPolicyType TopologyPlacementPolicyType = "Exclusive"
+	ColocatedTopologyPlacementPolicyType TopologyPlacementPolicyType = "Colocated"
+	NoneTopologyPlacementPolicyType      TopologyPlacementPolicyType = "None"
+)
+
+type LeaderWorkerTemplate struct {
+	// Existing fields are omitted here.
+
+	// TopologyPlacementPolicy selects how pod groups are placed across topology domains.
+	// +optional
+	TopologyPlacementPolicy TopologyPlacementPolicy `json:"topologyPlacementPolicy,omitempty"`
+}
+
+type TopologyPlacementPolicy struct {
+	// Type is the placement mode: Exclusive, Colocated, or None.
+	// +kubebuilder:default=None
+	// +kubebuilder:validation:Enum=Exclusive;Colocated;None
+	Type TopologyPlacementPolicyType `json:"type"`
+	// TopologyKey is the topology key that the placement policy applies to.
+	// +optional
+	TopologyKey *string `json:"topologyKey,omitempty"`
+}
+```
+
+### SubGroup Policy Support
+Colocated placement can support the subgroup policy as well.
+The workflow is almost identical to that of [exclusive support for subgroup policy](https://github.com/kubernetes-sigs/lws/assets/86417275/ff9fc93d-c738-4c09-abc8-50a7b16d49df), as follows:
+![](./workflow.jpg)
+
+### Test Plan
+
+[X] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Unit tests
+
+Unit tests will cover all newly introduced functions and add test cases for the existing functions that are modified.
+
+##### Integration tests
+
+- Integration tests of the pod webhook should cover:
+  - pod affinity/anti-affinity is set correctly when the topology placement policy type is `Exclusive`
+- Integration tests of the pod controller should cover:
+  - the pod nodeSelector is injected properly when the topology placement policy type is `Colocated`
+  - the controller waits until the pod is scheduled when the topology placement policy type is `Exclusive` or `Colocated`
+
+##### e2e tests
+
+When the topology placement policy type is set, the LWS deployment should have the correct pod placement for each value (`Exclusive`, `Colocated`, and `None`). It should also work with other features enabled, such as subgroup policy, failure handling, and rolling update.
+
+### Graduation Criteria
+
+## Implementation History
+
+## Alternatives
+
+We could instead expose this feature through a new annotation, "leaderworkerset.sigs.k8s.io/colocated-topology". However, it would conflict with the exclusive topology annotation when both are set at the same time, so adding a new field to the API spec is recommended.
\ No newline at end of file
diff --git a/keps/165-colocated-placement/kep.yaml b/keps/165-colocated-placement/kep.yaml
new file mode 100644
index 00000000..01d4307e
--- /dev/null
+++ b/keps/165-colocated-placement/kep.yaml
@@ -0,0 +1,28 @@
+title: Colocated Placement Support
+kep-number: 165
+authors:
+  - "vie-serendipity"
+status: provisional
+creation-date: 2024-07-01
+reviewers:
+  - "@kerthcet"
+  - "@liurupeng"
+  - "@ahg-g"
+approvers:
+  - "@kerthcet"
+  - "@liurupeng"
+  - "@ahg-g"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v0.3.0"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v0.5.0"
+  beta: "v0.5.0"
+  stable: "v0.5.0"
\ No newline at end of file
diff --git a/keps/165-colocated-placement/workflow.jpg b/keps/165-colocated-placement/workflow.jpg
new file mode 100644
index 00000000..b724b330
Binary files /dev/null and b/keps/165-colocated-placement/workflow.jpg differ