Conversation
✅ Deploy Preview for kubernetes-sigs-kueue ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: PBundyra. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
One concern I didn't see addressed explicitly is the coupling between a Workload and the set of ResourceFlavors once Options are created. Today, ResourceFlavor lifecycle is largely decoupled from workloads: a non-admitted workload is naturally retried against whatever flavors exist at scheduling time, and RF add/remove events are implicitly reflected in subsequent scheduling attempts. With Concurrent Admission, each Option appears to bind a workload to a specific set of ResourceFlavors. Is this coupling intentional (i.e., RF membership is effectively snapshotted per Option), or is the expectation that the Option lifecycle controller reconciles Options in response to RF add/remove events? Either behavior seems reasonable, but the KEP currently doesn't spell this out, and it feels like an observable semantic change compared to today's behavior. It would be helpful to make this assumption explicit in the KEP, both for operator expectations and to avoid ambiguity around RF lifecycle handling.
One additional case I didn't see called out explicitly is how Concurrent Admission is expected to behave for workloads with multiple PodSets. In Kueue today, ResourceFlavor assignment happens per PodSet, and it is possible for different PodSets within the same Workload to be assigned different ResourceFlavors. With Concurrent Admission introducing Option Workloads that appear to model "attempts" against specific flavors or flavor sets, it's not clear whether the intent is primarily whole-workload flavor placement (all PodSets landing on the same RF tier), or whether mixed PodSet → ResourceFlavor assignments within a single Option are an expected and supported outcome. The KEP seems to inherit the existing per-PodSet flavor assignment behavior implicitly, but does not discuss this case explicitly or provide examples. It would be helpful to clarify whether this scenario is in scope for Concurrent Admission, and how it is expected to interact with upgrade semantics.
| 1) Narrowing selection of ResourceFlavors for a given Workload. This can, however, also be used outside of the Concurrent Admission feature, creating more flexibility for Kueue.
| 2) Preempting sibling Options when admitting more preferable ones.
|
| ### Risks and Mitigations
One risk to me is debuggability for the time-based options. It is sometimes hard to know when the accounting period started, etc.
I think this is quite a deep and useful feature. Would you like to present it at the next wg-batch meeting?
💯 in agreement - this is super cool! Would love to see a demo.
| To achieve that, I configure my ClusterQueue to use Concurrent Admission with the `ExplicitOptions` policy.
| I create a configuration for the Reservation Option with `AllowedResourceFlavors=["Reservation", "Default-CPU"]`
| and for the On-Demand Option with `AllowedResourceFlavors=["On-Demand", "Default-CPU"]`.
This story is a good setup to flesh out the parent Workload vs. WorkloadOption cardinality, especially in the presence of multiple PodSets.
Consider a small extension of the example:
- GPU: two flavors, as in the story
  - Reservation
  - On-Demand
- CPU: two flavors
  - Default-CPU
  - Special-CPU
Given a Workload with two PodSets (GPU and CPU), what does the resulting WorkloadOption list look like?
Is it the cross product of flavors across PodSets, for example:
- Reservation-GPU / Default-CPU
- Reservation-GPU / Special-CPU
- On-Demand-GPU / Default-CPU
- On-Demand-GPU / Special-CPU
If so, this would further amplify option fan-out, since the number of WorkloadOptions would grow as the product of flavor choices per PodSet rather than just the number of GPU flavors. It would be helpful to clarify whether this cross-product behavior is intended, and if not, how option generation is constrained in multi-PodSet scenarios.
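To make the fan-out concrete, here is a minimal sketch of the cross-product enumeration described above; the flavor names and the enumeration helper are purely illustrative and not part of the KEP's API:

```go
package main

import "fmt"

// enumerateOptions returns the cross product of allowed flavors per PodSet.
// Each element of the result is one hypothetical WorkloadOption, expressed
// as an ordered flavor choice per PodSet.
func enumerateOptions(flavorsPerPodSet [][]string) [][]string {
	options := [][]string{{}}
	for _, flavors := range flavorsPerPodSet {
		var next [][]string
		for _, opt := range options {
			for _, f := range flavors {
				choice := append(append([]string{}, opt...), f)
				next = append(next, choice)
			}
		}
		options = next
	}
	return options
}

func main() {
	// Two PodSets: GPU with two flavors, CPU with two flavors.
	opts := enumerateOptions([][]string{
		{"Reservation-GPU", "On-Demand-GPU"},
		{"Default-CPU", "Special-CPU"},
	})
	for _, o := range opts {
		fmt.Println(o) // 2 x 2 = 4 combinations
	}
}
```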
Well, it depends on the admin's intention. If we assume that the GPU is the dominant resource—which is often true in real-world setups—and that the migration of jobs is only relevant on the GPU axis (as it is the most expensive and scarce resource), then the number of WorkloadOptions would remain the same. An admin could simply add Special-CPU to the list of AllowedResourceFlavors to allow the workload to be scheduled on it.
If we wanted to migrate on the CPU axis as well, then in this example it would indeed result in a cross-product of RFs. However, I consider this a less likely setup for real-world use cases.
> Well, it depends on the admin's intention.

Here, and in the above comments with a "scalability" context, I am considering the worst-case scenario, i.e., Big-O.
Still, I'd argue the worst-case scenario depends on the use case. With Concurrent Admission, the number of Options per Job is a product over all migration dimensions. If the only dimension we want to migrate along is GPU, and we treat the different CPU flavors as equally good, then the product equals 1 (CPU dimension) x #GPU-flavors. If we want to migrate along both the GPU and CPU dimensions, then indeed the product is #CPU-flavors x #GPU-flavors.
Theoretically, this API allows creating 2^#RF - 2 different Options, because that is the number of all RF subsets excluding the empty one and the one containing all RFs. However, in an environment with a lot of different flavors I treat that more as a misconfiguration than a real-world worst case. Misconfiguration is already one of the points in the Risks section.
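For concreteness, the counting argument above can be summarized as (the numbers are purely illustrative):

$$
\#\text{Options} = \prod_{d \in \text{migration dimensions}} \#\text{flavors}_d,
\qquad
\#\text{Options}_{\max} = 2^{\#RF} - 2 .
$$

For example, migrating only on the GPU axis with 2 GPU flavors gives $1 \times 2 = 2$ Options, migrating on both axes with 2 CPU and 2 GPU flavors gives $2 \times 2 = 4$, and with $\#RF = 4$ the theoretical maximum is $2^4 - 2 = 14$.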
I think I understand your point, and it may be partially related to an earlier issue I reported around the runtime complexity of the flavor assignment logic in Kueue: #6121.
That issue focused on the nested-loop structure in assignFlavor, where flavor resolution scales with the number of PodSets, resources per PodSet, and flavors per resource group. While the report itself may be somewhat outdated, I believe the underlying complexity analysis still holds and is worth keeping in mind as workloads and flavor configurations grow more complex.
From that perspective, the complexity already exists today. Splitting admission into multiple Workload objects (for example, per ResourceFlavor) does not introduce a new class of complexity, but instead helps scope and contain the existing evaluation work. Each Option operates over a narrower flavor set, which can make the admission logic easier to reason about and potentially reduce per-attempt cost.
> Theoretically, this API allows creating 2^#RF - 2 different Options, because that is the number of all RF subsets excluding the empty one and the one containing all RFs. However, in an environment with a lot of different flavors I treat that more as a misconfiguration than a real-world worst case. Misconfiguration is already one of the points in the Risks section.

Yes, but I think it is preferable to guide users away from such misconfigurations. Adding validation to prevent complexity from blowing up is the usual strategy in Kueue (say, with the number of flavors capped at 64, or the number of resources). Similarly, we cap the number of levels in Topology and the number of clusters in MultiKueue.
This allows us to reason about the complexity. Sure, sometimes it means that we need to relax the limits if use cases prove higher numbers to be useful, but it is much easier to relax validation than to strengthen it.
So, what about limiting the feature so that the number of flavors is <= 8, or the number of ExplicitOptions is <= 16?
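As an illustration only, such a cap could be enforced with a simple validation step along these lines; the constants and the function name are hypothetical placeholders, not a proposed API:

```go
package validation

import "fmt"

// Hypothetical caps, mirroring how Kueue bounds other dimensions
// (flavor counts, topology levels, MultiKueue clusters). The values are placeholders.
const (
	maxFlavorsForConcurrentAdmission = 8
	maxExplicitOptions               = 16
)

// validateConcurrentAdmission is an illustrative check; the arguments stand in
// for whatever counts the ClusterQueue webhook would derive from the spec.
func validateConcurrentAdmission(numFlavors, numExplicitOptions int) error {
	if numFlavors > maxFlavorsForConcurrentAdmission {
		return fmt.Errorf("concurrent admission supports at most %d resource flavors, got %d",
			maxFlavorsForConcurrentAdmission, numFlavors)
	}
	if numExplicitOptions > maxExplicitOptions {
		return fmt.Errorf("at most %d explicit options are allowed, got %d",
			maxExplicitOptions, numExplicitOptions)
	}
	return nil
}
```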
| 2) Option Workload: A cloned view of the Parent Workload with specific scheduling constraints. Most notably, an Option is restricted to a subset of ResourceFlavors.
|
| ### Architecture & Cardinality
| The relationship between a Parent and its Options follows a parent–child model with 1:N cardinality (where $N \ge 1$). While the number of Options is typically determined by the variety of PodSets and ClusterQueue ResourceFlavors, each remains a distinct Kubernetes object persisted in etcd.
Thank you for expanding on this construct. I see clear value in the Parent/Option split, not only for flavor-specific admission and migration, but also as a way to reduce pressure on the current Workload object, which today is mutated by multiple concurrent Kueue controllers and serves several roles at once.
The Parent/Option model provides better scope isolation: the Parent acts as a stable definition and aggregation point, while Options encapsulate admission and scheduling context, without requiring changes to the scheduler or quota logic. This separation also hints at benefits beyond flavor-related scenarios, for example around clearer mutability boundaries and reduced update contention.
Looking ahead, this pattern suggests a possible Phase-2 evolution toward a more explicit admission-focused construct with a reduced and well-defined mutability surface, similar to other Kubernetes parent/child models. For now, treating Parent and Option as the same Workload type feels like a pragmatic choice, as long as the design keeps the door open for such an evolution and doesn’t lock us into this specific representation long-term.
| OptionStatePending = "Pending"
|
| // OptionStateAdmitted means the Option has been admitted
| OptionStateAdmitted = "Admitted"
Couldn't the option mechanism be used with AdmissionChecks? If so, we would also need to distinguish the "QuotaReserved" state. Does the design consider such Options as "Pending"? I'm not sure whether we need to make that distinction, but this information seems useful for decision making.
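To illustrate the question, one possible (purely hypothetical) extension of the state set would be an explicit QuotaReserved value between Pending and Admitted; whether that distinction is worth exposing is exactly the open point:

```go
// Sketch only: the KEP proposes Pending and Admitted; QuotaReserved is a
// hypothetical addition that would make the AdmissionCheck phase observable.
const (
	OptionStatePending       = "Pending"
	OptionStateQuotaReserved = "QuotaReserved" // quota held, AdmissionChecks still in progress
	OptionStateAdmitted      = "Admitted"
)
```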
| At any given point in time, only one Option per Parent may be admitted by Kueue.
|
| To support this, we will introduce a new controller and extend the ClusterQueue API with a new `.spec` field to manage Option activation and deactivation.
What is meant by "activation" and "deactivation" here? I'm confused because the policy options below talk about "Remove", while deactivation is a distinct technical term. I think it makes sense to actually use deactivation rather than removal in some cases.
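For the distinction between the two terms, a rough sketch (assuming Options are ordinary kueue Workload objects, as the KEP currently suggests; the helper names are illustrative): deactivation keeps the object but flips `spec.active`, while removal deletes it.

```go
package optionlifecycle

import (
	"context"

	"k8s.io/utils/ptr"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// deactivateOption keeps the Option in etcd (better debuggability) but marks it
// inactive so it stops competing for quota.
func deactivateOption(ctx context.Context, c ctrlclient.Client, opt *kueue.Workload) error {
	opt.Spec.Active = ptr.To(false)
	return c.Update(ctx, opt)
}

// removeOption deletes the Option object outright, offloading the API server at
// the cost of losing the record of the attempt.
func removeOption(ctx context.Context, c ctrlclient.Client, opt *kueue.Workload) error {
	return c.Delete(ctx, opt)
}
```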
| RemoveLower OnSuccessPolicy = "RemoveLower"
|
| // Stop all attempts below a defined target RF.
| RemoveBelowTarget OnSuccessPolicy = "RemoveBelowTarget"
I'm wondering about the naming here. Even if removal is preferred (to offload the API server), I think deactivation should give analogous effects, but could provide better debuggability. So, I'm wondering if we could keep the naming more flexible and use-case oriented, like:
- NoMigration
- AllowUpgrades
- AllowUpgradesAboveTarget
Then it would be a secondary decision whether we use removal or deactivation. It also seems more natural to an admin, who cares about the effect rather than the technical details.
wdyt?
Also, I feel it is unclear what "lower" or "below" means for an option that spans multiple flavors, say one below and one above the selected target. And here maybe the naming also makes a difference.
For example:
- "RemoveLower" would intuitively mean to me: "remove the option as it contains target flavors lower than selected".
- "AllowUpgrades" would mean "keep the option, because it allows an upgrade"
| RemoveBelowTargetConfig *ConcurrentAdmissionRemoveBelowTargetConfig
| }
|
| type OptionCreationCustomization struct {
nit, align the naming with ConcurrentAdmissionExplicitOption
| ### Graduation Criteria
|
| #### Alpha
Is there going to be a feature gate reflecting the maturity level? I know the API is opt-in, but without a feature gate indicating the level, users may have wrong expectations about the maturity and stability of the new feature. Especially for complex features like this one, some indication via a feature gate is usually preferred.
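As an illustration of what that could look like, a gate could be registered through the component-base feature gate machinery; the gate name `ConcurrentAdmission` and its placement are placeholders, not part of the KEP:

```go
package features

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

// ConcurrentAdmission is a placeholder gate name for this feature.
const ConcurrentAdmission featuregate.Feature = "ConcurrentAdmission"

func init() {
	// Alpha: disabled by default, signalling limited maturity and stability.
	utilruntime.Must(utilfeature.DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		ConcurrentAdmission: {Default: false, PreRelease: featuregate.Alpha},
	}))
}
```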
| type WorkloadType string
| const (
| Default WorkloadType = "Default"
| ResourceFlavorOption WorkloadType = "ResourceFlavorOption"
| Parent WorkloadType = "Parent"
| ... // possibly more like WorkloadSlice, PrebuiltWorkload
| )
Let's move this to Alternatives and just link to it from here, to offload the technical details from this section. For large KEPs this section tends to grow.
| ${original_workload_name}-option-${explicit_option_name}
| ```
|
| Note: Option names are designed to be deterministic. If a name collision occurs (due to long Workload/RF names), standard Kubernetes suffix truncation logic will be applied while maintaining the -option- identifier.
What is "standard Kubernetes suffix truncation logic" here? I'm not sure, we implement it in Kueue this ourselves in pkg/controller/jobframework/workload_names.go, so if we don't put an additional truncation it may very well exceed the limit. Not just nit picking - I just don't understand what / how will happen.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Part of #8691
Special notes for your reviewer:
Does this PR introduce a user-facing change?