Conversation

@SarCode

SarCode commented Oct 27, 2025

Adds nodeSelector field to HighNodeUtilizationArgs and LowNodeUtilizationArgs to allow filtering nodes per plugin.

This enables different plugins in the same profile to target different sets of nodes based on labels, solving the limitation where only global nodeSelector was available.

Use case: Run HighNodeUtilization only on nodes blocked by cluster-autoscaler while other plugins run on all nodes.

Changes:

  • Add NodeSelector field to both plugin Args structs
  • Implement node filtering logic in Balance() methods
  • Add nodeMatchesSelector helper function
  • Generate deepcopy code for new fields
  • Add comprehensive tests for nodeSelector logic

Addresses: #1486
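
For illustration, a minimal sketch of what the nodeMatchesSelector helper mentioned above could look like, assuming the per-plugin field is a plain label-selector string parsed with k8s.io/apimachinery/pkg/labels (the actual implementation in this PR may differ):

// Illustrative sketch only; the package location and signature are assumptions.
package nodeutilization

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// nodeMatchesSelector reports whether the node's labels satisfy the given
// label-selector string (e.g. "node-type=worker"). An empty selector matches
// every node, preserving today's behavior when the field is unset.
func nodeMatchesSelector(node *v1.Node, selector string) (bool, error) {
	if selector == "" {
		return true, nil
	}
	sel, err := labels.Parse(selector)
	if err != nil {
		return false, err
	}
	return sel.Matches(labels.Set(node.Labels)), nil
}

The idea in the Balance() methods is then to skip any node that fails this check before utilization is computed.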

Description

Checklist

Please ensure your pull request meets the following criteria before submitting it for review; these items will be used by reviewers to assess the quality and completeness of your changes:

  • Code Readability: Is the code easy to understand, well-structured, and consistent with project conventions?
  • Naming Conventions: Are variable, function, and struct names descriptive and consistent?
  • Code Duplication: Is there any repeated code that should be refactored?
  • Function/Method Size: Are functions/methods short and focused on a single task?
  • Comments & Documentation: Are comments clear, useful, and not excessive? Were comments updated where necessary?
  • Error Handling: Are errors handled appropriately?
  • Testing: Are there sufficient unit/integration tests?
  • Performance: Are there any obvious performance issues or unnecessary computations?
  • Dependencies: Are new dependencies justified?
  • Logging & Monitoring: Is logging used appropriately (not too verbose, not too silent)?
  • Backward Compatibility: Does this change break any existing functionality or APIs?
  • Resource Management: Are resources (files, connections, memory) managed and released properly?
  • PR Description: Is the PR description clear, providing enough context and explaining the motivation for the change?
  • Documentation & Changelog: Are README and docs updated if necessary?

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 27, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

k8s-ci-robot added the cncf-cla: no label (Indicates the PR's author has not signed the CNCF CLA) on Oct 27, 2025
@k8s-ci-robot
Contributor

Welcome @SarCode!

It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files) labels on Oct 27, 2025
@k8s-ci-robot
Contributor

Hi @SarCode. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

SarCode force-pushed the feature/per-plugin-node-selector branch from 65eef0b to 5fb5911 on October 27, 2025 at 13:57
k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA) and removed the cncf-cla: no label (Indicates the PR's author has not signed the CNCF CLA) on Oct 27, 2025
@SarCode
Author

SarCode commented Oct 28, 2025

/ok-to-test

@k8s-ci-robot
Contributor

@SarCode: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@SarCode
Author

SarCode commented Oct 28, 2025

Apologies if I am being too eager, but is there a way I could get a review on this? It's my first open source contribution, so I'm really excited about it 😁

@googs1025 or @ingvagabund maybe?

@googs1025
Member

/ok-to-test

k8s-ci-robot added the ok-to-test label (Indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test) on Oct 29, 2025
@SarCode
Author

SarCode commented Oct 29, 2025

/retest

@SarCode
Author

SarCode commented Oct 29, 2025

/retest

@SarCode
Author

SarCode commented Oct 29, 2025

Formatting issues fixed 😃

@SarCode
Author

SarCode commented Oct 30, 2025

@googs1025, the tests went through 👌

@ingvagabund
Contributor

@SarCode thank you for your first contribution here and welcome to the open source community.

The current approach is less practical since only two plugins will benefit from it. Also, each plugin would need to specify the node selector, which is more suitable as a profile-level field. That said, I recommend taking the more generic approach and defining a node selector field at the profile level.

@SarCode
Author

SarCode commented Oct 30, 2025

Thank you for the kind words, @ingvagabund!

I was thinking the same thing, but I thought it would be easier if I contained it to just the Utilization use case. I will make some changes and make it more generic 👌

@SarCode
Author

SarCode commented Oct 30, 2025

What if we add optional profile-level nodeSelector while keeping existing plugin-level support:

type DeschedulerProfile struct {
    Name          string            `json:"name"`
    PluginConfigs []PluginConfig    `json:"pluginConfig"`
    Plugins       Plugins           `json:"plugins"`
    NodeSelector  map[string]string `json:"nodeSelector,omitempty"` // NEW
}

It will look something like this:

profiles:
- name: default
  nodeSelector:           # Applies to ALL plugins in profile
    node-type: worker
  pluginConfig:
  - name: LowNodeUtilization
    args: {}              # No plugin-level nodeSelector needed

@ingvagabund
Contributor

NodeSelector needs to be a string to keep in line with the current globally scoped node selector.
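
A minimal sketch of that string form, mirroring the existing globally scoped field (the JSON tags here are assumptions, not the final API):

type DeschedulerProfile struct {
    Name          string         `json:"name"`
    PluginConfigs []PluginConfig `json:"pluginConfig"`
    Plugins       Plugins        `json:"plugins"`
    NodeSelector  string         `json:"nodeSelector,omitempty"` // e.g. "node-type=worker"
}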

@ingvagabund
Contributor

What if we add optional profile-level nodeSelector while keeping existing plugin-level support

Would there be any use for the plugin-level node selector?

@songtao98

Thank you for your contribution, since I barely have time to fix it myself... I will review it as soon as possible, too.

@songtao98

songtao98 commented Oct 31, 2025

What if we add optional profile-level nodeSelector while keeping existing plugin-level support

Would there be any use for the plugin-level node selector?

Agree. Just a profile-level node selector seems to be OK, because from my perspective, if someone wants to configure LowNodeUtilization with "no" nodeSelector while configuring another plugin with "some" nodeSelector, it's better divided into 2 separate profiles.

@SarCode
Author

SarCode commented Oct 31, 2025

If I understand this correctly, the descheduler should run with separate profiles if they want different nodeSelector?

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
# Profile 1: For worker nodes only
- name: "worker-nodes"
  nodeSelector: "node-type=worker"  # Profile-level nodeSelector
  pluginConfig:
  - name: "LowNodeUtilization"
    args:
      thresholds:
        cpu: 20
        memory: 20
      targetThresholds:
        cpu: 50
        memory: 50
  plugins:
    balance:
      enabled:
      - "LowNodeUtilization"

# Profile 2: For master nodes only  
- name: "master-nodes"
  nodeSelector: "node-type=master"  # Different nodeSelector
  pluginConfig:
  - name: "HighNodeUtilization"
    args:
      thresholds:
        cpu: 80
        memory: 80
  plugins:
    balance:
      enabled:
      - "HighNodeUtilization"

Does descheduler support this right now? Because if I remember correctly, nodeSelector is global for all profiles.

@SarCode
Author

SarCode commented Oct 31, 2025

Looking at the docs it seems like profile right now does have a nodeSelector config

create your own Evictor Plugin or use the Default one provided by Descheduler. Other uses for the Evictor plugin can be to sort, filter, validate or group pods by different criteria, and that's why this is handled by a plugin and not configured in the top level config

https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#evictor-plugin-configuration-default-evictor

@songtao98

songtao98 commented Oct 31, 2025

If I understand this correctly, the descheduler should run with separate profiles if they want different nodeSelector?

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
# Profile 1: For worker nodes only
- name: "worker-nodes"
  nodeSelector: "node-type=worker"  # Profile-level nodeSelector
  pluginConfig:
  - name: "LowNodeUtilization"
    args:
      thresholds:
        cpu: 20
        memory: 20
      targetThresholds:
        cpu: 50
        memory: 50
  plugins:
    balance:
      enabled:
      - "LowNodeUtilization"

# Profile 2: For master nodes only  
- name: "master-nodes"
  nodeSelector: "node-type=master"  # Different nodeSelector
  pluginConfig:
  - name: "HighNodeUtilization"
    args:
      thresholds:
        cpu: 80
        memory: 80
  plugins:
    balance:
      enabled:
      - "HighNodeUtilization"

Does descheduler support this right now? Because if I remember correctly, nodeSelector is global for all profiles.

Yes, your example is just what I wanted to implement and proposed in #1486.
I don't think descheduler supports this right now. That's why we're implementing it here :)

@songtao98

songtao98 commented Oct 31, 2025

Looking at the docs it seems like profile right now does have a nodeSelector config

create your own Evictor Plugin or use the Default one provided by Descheduler. Other uses for the Evictor plugin can be to sort, filter, validate or group pods by different criteria, and that's why this is handled by a plugin and not configured in the top level config

https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#evictor-plugin-configuration-default-evictor

If I understand it right, this is a plugin-level nodeSelector only for DefaultEvictor.

@SarCode
Author

SarCode commented Oct 31, 2025

Looking at the docs it seems like profile right now does have a nodeSelector config

create your own Evictor Plugin or use the Default one provided by Descheduler. Other uses for the Evictor plugin can be to sort, filter, validate or group pods by different criteria, and that's why this is handled by a plugin and not configured in the top level config

https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#evictor-plugin-configuration-default-evictor

If I understand it right, this is a plugin-level nodeSelector only for default evictor

I tried the two profile config with different nodeSelector and it worked

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  # Profile 1: HighNodeUtilization targeting only worker nodes
  - name: high-utilization-workers
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "type=worker"  # Only process worker nodes
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: HighNodeUtilization
        args:
          thresholds:
            cpu: 50
            memory: 50
    plugins:
      balance:
        enabled:
          - HighNodeUtilization

  # Profile 2: Other plugins targeting all non-control-plane nodes
  - name: general-policies
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "!node-role.kubernetes.io/control-plane"  # All nodes except control-plane
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: RemovePodsViolatingNodeAffinity
        args:
          nodeAffinityType:
            - requiredDuringSchedulingIgnoredDuringExecution
      - name: RemovePodsViolatingNodeTaints
        args:
          includePreferNoSchedule: true
      - name: RemovePodsViolatingInterPodAntiAffinity
      - name: RemovePodsViolatingTopologySpreadConstraint
    plugins:
      balance:
        enabled:
          - RemovePodsViolatingTopologySpreadConstraint
      deschedule:
        enabled:
          - RemovePodsViolatingNodeTaints
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingInterPodAntiAffinity

The logs confirmed:

Profile 1 correctly processed only worker nodes for HighNodeUtilization
Profile 2 correctly processed all non-control-plane nodes for other plugins

@songtao98

Looking at the docs it seems like profile right now does have a nodeSelector config

create your own Evictor Plugin or use the Default one provided by Descheduler. Other uses for the Evictor plugin can be to sort, filter, validate or group pods by different criteria, and that's why this is handled by a plugin and not configured in the top level config

https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#evictor-plugin-configuration-default-evictor

If I understand it right, this is a plugin-level nodeSelector only for default evictor

I tried the two profile config with different nodeSelector and it worked

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  # Profile 1: HighNodeUtilization targeting only worker nodes
  - name: high-utilization-workers
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "type=worker"  # Only process worker nodes
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: HighNodeUtilization
        args:
          thresholds:
            cpu: 50
            memory: 50
    plugins:
      balance:
        enabled:
          - HighNodeUtilization

  # Profile 2: Other plugins targeting all non-control-plane nodes
  - name: general-policies
    pluginConfig:
      - name: DefaultEvictor
        args:
          nodeSelector: "!node-role.kubernetes.io/control-plane"  # All nodes except control-plane
          evictLocalStoragePods: true
          ignorePvcPods: true
          nodeFit: true
      - name: RemovePodsViolatingNodeAffinity
        args:
          nodeAffinityType:
            - requiredDuringSchedulingIgnoredDuringExecution
      - name: RemovePodsViolatingNodeTaints
        args:
          includePreferNoSchedule: true
      - name: RemovePodsViolatingInterPodAntiAffinity
      - name: RemovePodsViolatingTopologySpreadConstraint
    plugins:
      balance:
        enabled:
          - RemovePodsViolatingTopologySpreadConstraint
      deschedule:
        enabled:
          - RemovePodsViolatingNodeTaints
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingInterPodAntiAffinity

The logs confirmed:

Profile 1 correctly processed only worker nodes for HighNodeUtilization
Profile 2 correctly processed all non-control-plane nodes for other plugins

@SarCode Thanks for your verification! But just as I described, this use case is trying to configure DefaultEvictor in 2 profiles. The nodeSelector is DefaultEvictor's argument. It's a tricky way to meet the initial requirement, because right now only DefaultEvictor implements the EvictorPlugin interface. If someday we have a "FooEvictor" that implements EvictorPlugin, then the "FooEvictor" must support nodeSelector as well.

So what I hope to do is add nodeSelector to DeschedulerProfile, maybe like this:

type DeschedulerProfile struct {
	Name          string
	PluginConfigs []PluginConfig
	Plugins       Plugins
	NodeSelector  *metav1.LabelSelector // or *string
}

This can prevent every plugin from needing to support nodeSelector one by one.
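
For illustration, a rough sketch of how the descheduler loop could apply such a profile-level selector once, before any of the profile's plugins run. The helper name and wiring here are hypothetical; it assumes the *string form and the labels parser from k8s.io/apimachinery:

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// filterNodesForProfile is a hypothetical helper that narrows the node list
// once per profile, so individual plugins never need their own nodeSelector.
func filterNodesForProfile(nodes []*v1.Node, selector *string) ([]*v1.Node, error) {
	if selector == nil || *selector == "" {
		return nodes, nil // no profile-level selector: keep all nodes
	}
	sel, err := labels.Parse(*selector)
	if err != nil {
		return nil, err
	}
	filtered := make([]*v1.Node, 0, len(nodes))
	for _, node := range nodes {
		if sel.Matches(labels.Set(node.Labels)) {
			filtered = append(filtered, node)
		}
	}
	return filtered, nil
}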

@SarCode
Author

SarCode commented Nov 5, 2025

My intention behind all this is to run HighNodeUtilization only on nodes which can't be removed by cluster-autoscaler, maybe due to local storage or whatever.

Meanwhile all other plugins can run on all nodes.

@ingvagabund
Contributor

NodeSelector on the profile level can skip many nodes, whereas if it is used through the DefaultEvictor, all nodes still get processed, which will not be that effective in big clusters.

