feat: Node Repair implementation #1793

Merged

Conversation

engedaam
Contributor

@engedaam engedaam commented Oct 30, 2024

Fixes #N/A

Description

  • RFC: RFC: Node Auto Repair #1768
  • This PR implements the recommended solution defined in the node repair RFC.
  • It defines a cloud provider interface, RepairPolicy, that lists the node conditions for which Karpenter will forcefully terminate a node. Each cloud provider policy names an unhealthy condition a node can enter and how long Karpenter waits before reacting.

How was this change tested?

  • make presubmit

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 30, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 30, 2024
@coveralls

coveralls commented Oct 30, 2024

Pull Request Test Coverage Report for Build 11886352800

Details

  • 70 of 95 (73.68%) changed or added relevant lines in 6 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.07%) to 81.028%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/controllers/nodeclaim/lifecycle/controller.go | 4 | 7 | 57.14%
pkg/controllers/controllers.go | 0 | 8 | 0.0%
pkg/controllers/node/health/controller.go | 51 | 65 | 78.46%

Files with Coverage Reduction | New Missed Lines | %
pkg/test/expectations/expectations.go | 2 | 94.73%
pkg/scheduling/requirements.go | 2 | 98.01%

Totals
Change from base Build 11886240983: -0.07%
Covered Lines: 8734
Relevant Lines: 10779

💛 - Coveralls

@engedaam engedaam changed the title from "feat: Node Auto Repair implementation" to "feat: Node Repair implementation" Nov 7, 2024
@engedaam engedaam force-pushed the node-repair-implementation branch from 4635c80 to 192984f Compare November 7, 2024 23:39
@engedaam engedaam marked this pull request as ready for review November 7, 2024 23:53
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 7, 2024
@k8s-ci-robot k8s-ci-robot requested a review from jmdeal November 7, 2024 23:53
@engedaam engedaam force-pushed the node-repair-implementation branch 2 times, most recently from 2338123 to 8cefba7 Compare November 8, 2024 00:07
@engedaam engedaam force-pushed the node-repair-implementation branch 9 times, most recently from c8bed26 to 390c056 Compare November 8, 2024 16:14
@engedaam engedaam force-pushed the node-repair-implementation branch from 390c056 to 562ed1f Compare November 10, 2024 02:27
@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 10, 2024
@engedaam engedaam force-pushed the node-repair-implementation branch from 7275033 to 03e110a Compare November 10, 2024 16:07
@engedaam engedaam force-pushed the node-repair-implementation branch 4 times, most recently from 445cb11 to be92bc0 Compare November 17, 2024 01:58
@engedaam engedaam force-pushed the node-repair-implementation branch 3 times, most recently from f0186f9 to cab6157 Compare November 17, 2024 03:36
@engedaam engedaam force-pushed the node-repair-implementation branch from cab6157 to 200da87 Compare November 17, 2024 15:07
@engedaam engedaam force-pushed the node-repair-implementation branch 3 times, most recently from 14778f1 to a51bbba Compare November 17, 2024 23:01
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 17, 2024
@engedaam engedaam force-pushed the node-repair-implementation branch from a51bbba to 399a096 Compare November 18, 2024 02:10
@engedaam engedaam force-pushed the node-repair-implementation branch from 399a096 to 66b9eda Compare November 18, 2024 05:06
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 18, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2024
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 18, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: engedaam, jonathan-innis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8ce869c into kubernetes-sigs:main Nov 18, 2024
12 checks passed
@engedaam engedaam deleted the node-repair-implementation branch November 18, 2024 05:47
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants