KEP #8826: Uber Cluster Queues #8864

Open

mwielgus wants to merge 1 commit into kubernetes-sigs:main from mwielgus:uber

Conversation

@mwielgus
Contributor

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

Introduces support for hero (huge) jobs.

Which issue(s) this PR fixes:

Fixes #8826

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 28, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mwielgus
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 28, 2026
@netlify

netlify bot commented Jan 28, 2026

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 4b19785
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/697a7714bad3d000084f000d

Benefit: Compliance requirements are met without complex operational runbooks to "clear the
cluster."

### Risks and Mitigations
Contributor

One major issue would be abuse of the "Hero" queue.

Sorta related to Resource Starvation but I think it's more if someone uses this to effectively skip fair sharing.

Contributor Author

What type of abuse do you have in mind?

Contributor

I guess to me every user I have ever worked with treats their jobs as super important and would love to skip the line.

So I guess the main protection would be that admins would only create a LocalQueue pointing to this Hero Queue for users they trust not to just submit blindly to this Queue.

We don't really have any enforcement, but it is sorta the Spider-Man analogy: "With great power comes great responsibility".

Contributor Author

Access to a UCQ can be restricted to only those individuals (namespaces/LQs) whom the corresponding part of the organization trusts. And yes, these people get great power over that set of quotas.

* Nominal quotas are no longer guaranteed.
* Well-understood rules start to have exceptions.

## Alternatives
Contributor

It isn't clear to me why you can't model this as a dedicated ClusterQueue and a WorkloadPriorityClass.

Could you create a ClusterQueue that only special "heroes" submit to and mark the workload priority class as critical?
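For concreteness, a minimal sketch of the alternative being suggested here, written against Kueue's v1beta1 Go API (exact field names can differ slightly between Kueue versions; all object names below are placeholders):

package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// heroAlternative builds the two objects suggested above: a dedicated
// ClusterQueue that only trusted "heroes" get a LocalQueue for, plus a
// high-value WorkloadPriorityClass for their workloads.
func heroAlternative() (*kueue.ClusterQueue, *kueue.WorkloadPriorityClass) {
	cq := &kueue.ClusterQueue{
		ObjectMeta: metav1.ObjectMeta{Name: "hero-cq"},
		Spec: kueue.ClusterQueueSpec{
			Cohort: "team-cohort",
			ResourceGroups: []kueue.ResourceGroup{{
				CoveredResources: []corev1.ResourceName{corev1.ResourceCPU},
				Flavors: []kueue.FlavorQuotas{{
					Name: "default-flavor",
					Resources: []kueue.ResourceQuota{{
						Name: corev1.ResourceCPU,
						// Almost no nominal quota of its own: hero jobs
						// mostly borrow from the rest of the cohort.
						NominalQuota: resource.MustParse("1"),
					}},
				}},
			}},
			Preemption: &kueue.ClusterQueuePreemption{
				// Reclaim borrowed capacity from siblings when admitting.
				ReclaimWithinCohort: kueue.PreemptionPolicyAny,
			},
		},
	}

	wpc := &kueue.WorkloadPriorityClass{
		ObjectMeta:  metav1.ObjectMeta{Name: "hero-critical"},
		Value:       1000000,
		Description: "Highest priority, reserved for hero workloads",
	}
	return cq, wpc
}

As the reply below points out, though, even with ReclaimWithinCohort: Any such a queue can only reclaim capacity that sibling ClusterQueues are borrowing beyond their nominal quota; it cannot preempt workloads admitted within their own nominal quota, regardless of priority.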

Contributor Author

No matter how high the priority is, it doesn't get into nominal quota. The whole trick is how to get into nominal quota with an outside workload and prevent its reclamation.

### Limitations of Current Quota Models

The necessity for the UberClusterQueue stems from specific rigidities in the current preemption logic.

* Guaranteed Quota Immunity: In the standard model, a workload running within its ClusterQueue's
Contributor

With Uber ClusterQueues, nominal quota is no longer a hard protection boundary but a best-effort guarantee subject to administrative override. That’s a significant change in the guarantees Kueue provides to ClusterQueue owners. I think we should be explicit about this shift, including what guarantees remain, what guarantees are weakened, and how operators are expected to communicate and govern the use of Uber CQs.

Uber CQ doesn’t just add power, it changes the trust model. ClusterQueue owners now have to trust that the override mechanism will be used sparingly and responsibly, because the system itself can no longer enforce absolute protection.

Contributor Author

I agree that this is a significant change; however, the alternatives work almost the same and the end result is the same.

In order to allow hero jobs, users cannot have absolute, everlasting guarantees, because at some point in time that hero job might be started. And then their workloads will be interrupted, either because of a UCQ, or because someone temporarily changed quotas, or because they used Fair Sharing with weights and nothing was ever guaranteed.

@ichekrygin
Contributor

One concern I have with the Uber ClusterQueue approach is transparency. For the “wartime” scenarios this KEP targets, I actually want disruption to be explicit and explainable. With Hold / HoldAndDrain, it’s immediately visible why workloads are evicted or not admitted, and the intent is clearly declared by an operator action.

With Uber CQ, the override is implicit: peacetime configuration remains unchanged, yet workloads in other ClusterQueues can be evicted or lose capacity due to someone else’s configuration. From a CQ owner’s perspective, guarantees effectively change without any change to their own spec, which makes reasoning, debugging, and accountability harder. Even if the behavior is correct, it is much less discoverable why it happened.

If this proposal moves forward, I think we need first-class signals (status, conditions, events) that explicitly indicate when a cohort and impacted CQs are effectively in a “wartime” mode and clearly attribute preemptions to an Uber CQ override.
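To make the "first-class signals" suggestion concrete, a minimal sketch of a status condition that could be set on each impacted ClusterQueue; the condition type and reason are invented for illustration, and only the apimachinery helper is an existing API:

package sketch

import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical condition type; not part of Kueue's API or this KEP today.
const ConditionQuotaOverriddenByUberCQ = "QuotaOverriddenByUberClusterQueue"

// markUberOverride records on an impacted ClusterQueue's status conditions
// that an Uber CQ is currently consuming part of its nominal quota, so the
// "wartime" state and its cause are discoverable by the CQ owner.
func markUberOverride(conditions *[]metav1.Condition, uberCQ string, observedGeneration int64) {
	apimeta.SetStatusCondition(conditions, metav1.Condition{
		Type:               ConditionQuotaOverriddenByUberCQ,
		Status:             metav1.ConditionTrue,
		Reason:             "UberClusterQueueActive",
		Message:            "part of the nominal quota is currently lent to UberClusterQueue " + uberCQ,
		ObservedGeneration: observedGeneration,
	})
}

An Event emitted on the impacted ClusterQueue and a preemption metric attributed to the Uber CQ could cover the other signal channels mentioned above.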

@ichekrygin
Contributor

Another concern I have is precedent and escalation. Once we introduce a first-class mechanism that allows a ClusterQueue to override nominal quota guarantees, it becomes hard to draw a principled line against “hero’s hero” or multiple Uber ClusterQueues within the same cohort. Even if the intent is rare, one-off use, the API is permanent and will predictably attract requests for additional tiers, ordering between Uber CQs, or broader access over time.

At that point, we risk turning a narrowly scoped exception into an implicit hierarchy of dominance between ClusterQueues, which feels at odds with Kueue’s current model where guarantees are explicit, local, and stable. If this proposal moves forward, I think we need very strong guardrails (for example, enforcing a single Uber CQ per cohort subtree, time-bounded activation, or similar) to prevent this kind of escalation.

@kannon92
Contributor

I seem to remember @dgrove-oss discussing with me how they implement something similar in IBM.

Maybe you have some thoughts here?

@mwielgus
Contributor Author

> One concern I have with the Uber ClusterQueue approach is transparency. For the “wartime” scenarios this KEP targets, I actually want disruption to be explicit and explainable. With Hold / HoldAndDrain, it’s immediately visible why workloads are evicted or not admitted, and the intent is clearly declared by an operator action.
>
> With Uber CQ, the override is implicit: peacetime configuration remains unchanged, yet workloads in other ClusterQueues can be evicted or lose capacity due to someone else’s configuration. From a CQ owner’s perspective, guarantees effectively change without any change to their own spec, which makes reasoning, debugging, and accountability harder. Even if the behavior is correct, it is much less discoverable why it happened.
>
> If this proposal moves forward, I think we need first-class signals (status, conditions, events) that explicitly indicate when a cohort and impacted CQs are effectively in a “wartime” mode and clearly attribute preemptions to an Uber CQ override.

I see your point. The reason why there is no hard Hold/HoldAndDrain is that a CQ may be only partially affected. I try to distribute the impact of the UCQ workload across many CQs, so this is not a binary scenario. A hero job may only need 50% of the subtree/CQ.

We can definitely add some status information to the CQ, indicating that a UCQ is around and running workloads, so things may look weird. The KEP already mentions workload observability and metrics.

@mwielgus
Contributor Author

> Another concern I have is precedent and escalation. Once we introduce a first-class mechanism that allows a ClusterQueue to override nominal quota guarantees, it becomes hard to draw a principled line against “hero’s hero” or multiple Uber ClusterQueues within the same cohort. Even if the intent is rare, one-off use, the API is permanent and will predictably attract requests for additional tiers, ordering between Uber CQs, or broader access over time.
>
> At that point, we risk turning a narrowly scoped exception into an implicit hierarchy of dominance between ClusterQueues, which feels at odds with Kueue’s current model where guarantees are explicit, local, and stable. If this proposal moves forward, I think we need very strong guardrails (for example, enforcing a single Uber CQ per cohort subtree, time-bounded activation, or similar) to prevent this kind of escalation.

I can imagine use cases where there is a top-level UCQ and more local, inferior UCQs; it would work kind of the same. Everything below or next to a UCQ is treated as expendable, even if it is another UCQ below it. Two UCQs in the same cohort is weird and brings no benefit; it can be banned.

Time-bounded: I would give full control of the execution to the users selected by the organization, someone vetted to know what they are doing. If they need to run UCQ workloads for 5h, so be it.

@ichekrygin
Contributor

I wanted to add a quick note on intent. I didn’t want to spam or monopolize this KEP discussion with a deep dive into alternatives, which is why I moved the detailed exploration of a different approach into a separate issue.

To be clear, I’m not trying to block this KEP. I think the motivating use case is real and important. My comments here are mostly about surfacing concerns around guarantees, precedent, and transparency, and exploring whether there might be other ways to address the same operational problem while preserving some of Kueue’s existing invariants.

Happy to continue feedback on this proposal on its own merits, and equally happy to discuss alternatives in parallel without derailing the main thread.

* Borrowing Complexity: The current borrowing logic is constrained by fair sharing weights. A Hero
job should not be constrained by "fairness"—it is inherently unfair.

### Goals
Contributor

For Hero workloads, what do we do if there is a queue of Hero workloads submitted to the same CQ?

Would it be treated as FIFO?

Contributor Author

Regular rules apply. You can have either FIFO or BestEffort. A superhero job may preempt a regular hero job based on priority. "Uberness" applies between hero and regular workloads.
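As a simplified illustration of "regular rules apply" inside the Uber CQ itself (this is a sketch, not Kueue's scheduler code):

package sketch

import (
	"sort"
	"time"
)

// heroWorkload is a stand-in for a workload pending in the Uber CQ.
type heroWorkload struct {
	name     string
	priority int32     // from the workload's priority class
	created  time.Time // queueing timestamp
}

// orderHeroes orders pending hero workloads the way any ClusterQueue would:
// higher priority first, ties broken FIFO by creation time. A "superhero"
// job simply carries a higher priority than a regular hero job.
func orderHeroes(ws []heroWorkload) {
	sort.SliceStable(ws, func(i, j int) bool {
		if ws[i].priority != ws[j].priority {
			return ws[i].priority > ws[j].priority
		}
		return ws[i].created.Before(ws[j].created)
	})
}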

## Design Details


### API
Contributor

One area I would like to see discussed a bit more in the KEP, maybe in the Notes section, is how we plan to extend the API with new configuration options. For example, if we have use cases for excluding a certain CQ from the mechanism, or for having some weights between the CQs which balance the quota taken from the queues.

I find this one of the main advantages of the alternative ResourceQuotaLease KEP proposal by @ichekrygin, which introduces a dedicated CRD: the place for configuration is natural, and the lifetime of the custom configuration is nicely managed. Here the lifetime of the custom configuration is bound to the CQ, so I think we should think ahead about how we make the configuration intuitive to users.

@ichekrygin
Contributor

> We can definitely add some status information to the CQ, indicating that a UCQ is around and running workloads, so things may look weird. The KEP already mentions workload observability and metrics.

Flushing out UCQ notifications to users could be a good mechanism to validate this KEP.

By making UCQ impact explicit and user-visible, especially at the ClusterQueue level, we can test whether the proposed behavior is understandable, discoverable, and actionable for workload owners who only have CQ-scoped visibility. If users can clearly see when and why their nominal capacity is affected by a UCQ, it becomes much easier to reason about guarantees, debug unexpected disruption, and build trust in the mechanism.

In that sense, well-defined UCQ notifications are not just an observability detail; they are a validation tool for whether the model itself is sound and usable in practice.

- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories (Optional)](#user-stories-optional)
Contributor

I'd be curious if you've seen examples where users are mixing inference and training workloads on a cluster, where in some circumstances inference workloads need to act as an "uber" workload that preempts training workloads. So, for example, maybe there's spillover from the normal inference clusters during high-traffic times.

Contributor Author

That's an interesting use case :). I haven't heard about that need, but sure, one can put whatever they like into a UCQ, inference included.

@amy
Contributor

amy commented Feb 5, 2026

+1 to @ichekrygin

> Flushing out UCQ notifications to users could be a good mechanism to validate this KEP.

An overall concern I have for Kueue is instrumentation for scheduling decisions, given that in the past we've seen various bugs where it's difficult to validate what the expected behavior should be. With the more explicit API contract @ichekrygin is proposing with leases, as an in-between step that uber queues could interact with to manipulate, we don't have to wonder as much about Kueue's internal in-memory state.

Between this thread and the Lease thread... just want to confirm: it looks like lease is a pre-req for uber queues?

@mwielgus
Contributor Author

mwielgus commented Feb 5, 2026

> +1 to @ichekrygin
>
> > Flushing out UCQ notifications to users could be a good mechanism to validate this KEP.
>
> An overall concern I have for Kueue is instrumentation for scheduling decisions, given that in the past we've seen various bugs where it's difficult to validate what the expected behavior should be. With the more explicit API contract @ichekrygin is proposing with leases, as an in-between step that uber queues could interact with to manipulate, we don't have to wonder as much about Kueue's internal in-memory state.
>
> Between this thread and the Lease thread... just want to confirm: it looks like lease is a pre-req for uber queues?

Yes, some form of lease is required. UCQ does it automatically and kind of behind the scenes, but sure, we could make it more explicit and visible.

@ichekrygin
Contributor

> Yes, some form of lease is required. UCQ does it automatically and kind of behind the scenes, but sure, we could make it more explicit and visible.

It would be very useful to flesh those details out explicitly in the KEP.

@amy
Contributor

amy commented Feb 5, 2026

> Yes, some form of lease is required. UCQ does it automatically and kind of behind the scenes, but sure, we could make it more explicit and visible.

@mwielgus More specifically: can leases be used independently of UCQ? That's what I mean by pre-req.

Like @mimowo is mentioning in this comment for the interaction of UCQ & Lease: #8869 (comment)

> I think we could somehow achieve it in the ResourceQuotaLease model. Maybe there is room to combine the two ideas: UberCQ configuration can say "Create ResourceQuotaLease when there is a workload pending in the CQ".

// +optional
// +kubebuilder:default=DefaultPreemptionRules
// +kubebuilder:validation:Enum=DefaultPreemptionRules;UberClusterQueueRules
Rules PreemptionRules `json:"rules,omitempty"`
Contributor

I find this "PreemptionRules" a bit misleading, because it dictates not just preemption, but also scheduling more broadly, for example this paragraph shows that also borrowing works different: https://github.com/kubernetes-sigs/kueue/pull/8864/changes#diff-55ae50e78b080bbdc5104cae08ba81a13caa5853e0094fae8ce10237bc3bec9eR126-R128

Maybe this is about calling the options "SchedulingRules"
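For reference, a minimal Go sketch of how the enum behind this field might be declared; the type and value names come from the snippet above, while the doc comments are illustrative rather than taken from the KEP:

package sketch

// PreemptionRules selects which admission and preemption semantics a
// ClusterQueue uses. (As noted above, a broader name such as SchedulingRules
// may fit better, since the setting also changes borrowing behavior.)
type PreemptionRules string

const (
	// DefaultPreemptionRules keeps today's behavior: nominal quota remains a
	// hard protection boundary for the other queues in the cohort.
	DefaultPreemptionRules PreemptionRules = "DefaultPreemptionRules"

	// UberClusterQueueRules marks the queue as an Uber CQ whose workloads may
	// take capacity from the nominal quota of sibling queues.
	UberClusterQueueRules PreemptionRules = "UberClusterQueueRules"
)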

@mimowo
Contributor

mimowo commented Feb 6, 2026

Yeah, the more I think about it the more I see some need for a CRD, because:

  1. there is the "UCQ feels too magical" sentiment
  2. it is unclear to me where the extra configuration will be added, and I imagine the configuration can grow over time, but there is no "natural place", so I'm worried about overloading the spec.
  3. with a CRD we could, for example, activate/deactivate the mode easily. The UCQ requires deleting the CQ, or at least unpinning it.
  4. the extra CRD could evolve to naturally cover the use cases of other users mentioned in Support for Temporary Quota Overrides in ClusterQueue #8654, because it is a more generic model.
  5. I think the "automatic" mode could also exist based on the CRD. The CRD, when present, could appoint a selected CQ as "uber" (inversion of control). The appointed CQ could have exactly the same "automatic" rights as in this KEP. (A rough sketch of such a CRD is shown below.)
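To illustrate point 5 above, a rough Go sketch of what such an appointing CRD could look like; the kind and field names are hypothetical and not part of this KEP or the ResourceQuotaLease proposal:

package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// UberGrant is a hypothetical cluster-scoped CRD that appoints an existing
// ClusterQueue as "uber" for as long as the grant exists (inversion of
// control: the grant, not the CQ spec, carries the override).
type UberGrant struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec UberGrantSpec `json:"spec"`
}

type UberGrantSpec struct {
	// ClusterQueue names the CQ that receives Uber rights while the grant
	// exists. Deleting the grant deactivates the mode without touching the
	// CQ itself.
	ClusterQueue string `json:"clusterQueue"`

	// ExpiresAt optionally time-bounds the activation, one of the guardrails
	// discussed earlier in this thread.
	ExpiresAt *metav1.Time `json:"expiresAt,omitempty"`
}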

@mwielgus
Contributor Author

mwielgus commented Feb 6, 2026

> Yeah, the more I think about it the more I see some need for a CRD, because:
>
> 1. there is the "UCQ feels too magical" sentiment
> 2. it is unclear to me where the extra configuration will be added, and I imagine the configuration can grow over time, but there is no "natural place", so I'm worried about overloading the spec.
> 3. with a CRD we could, for example, activate/deactivate the mode easily. The UCQ requires deleting the CQ, or at least unpinning it.
> 4. the extra CRD could evolve to naturally cover the use cases of other users mentioned in Support for Temporary Quota Overrides in ClusterQueue #8654, because it is a more generic model.
> 5. I think the "automatic" mode could also exist based on the CRD. The CRD, when present, could appoint a selected CQ as "uber" (inversion of control). The appointed CQ could have exactly the same "automatic" rights as in this KEP.

What do you want the CRD for?
