chore: Wait for instance termination before deleting nodeclaim #1195

Merged

Conversation

jigisha620
Contributor

@jigisha620 jigisha620 commented Apr 19, 2024

Fixes #N/A

Description
Finalizers on the nodeClaim and node should not be removed until the underlying instance is deleted, so that no resources are leaked. The current approach relies on a retryable error being emitted by cloudProvider.Delete() and continues reconciliation until this error is received. However, that is not an ideal approach: when it was tested with the AWS provider, we found that some instances could take too long to delete. This PR instead adds a terminating status condition on the nodeClaim. If the condition exists, we call cloudProvider.Get() to check whether the instance is terminated. If it is terminated, we can remove the finalizer from the nodeClaim.
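
The flow described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `CloudProvider` interface, the `NodeClaim` struct, the `ErrNotFound` sentinel, the `Reconcile` function, and `fakeProvider` are all hypothetical stand-ins, and it assumes that `Get()` reports a terminated instance through a not-found error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel standing in for the provider's
// "instance not found" error.
var ErrNotFound = errors.New("instance not found")

// CloudProvider is a hypothetical, trimmed-down interface for this sketch.
type CloudProvider interface {
	Delete(id string) error
	Get(id string) error // assumed to return ErrNotFound once the instance is gone
}

// NodeClaim is a hypothetical stand-in for the real API type.
type NodeClaim struct {
	ProviderID  string
	Terminating bool // models the terminating status condition
	Finalizer   bool // true while the termination finalizer is present
}

// Reconcile performs one pass of the termination flow and reports whether
// the caller should requeue.
func Reconcile(cp CloudProvider, nc *NodeClaim) (requeue bool, err error) {
	if !nc.Terminating {
		// First pass: request deletion and record that termination started.
		if err := cp.Delete(nc.ProviderID); err != nil && !errors.Is(err, ErrNotFound) {
			return false, fmt.Errorf("deleting instance: %w", err)
		}
		nc.Terminating = true
		return true, nil
	}
	// Later passes: poll the provider until the instance is gone.
	if err := cp.Get(nc.ProviderID); err != nil {
		if errors.Is(err, ErrNotFound) {
			nc.Finalizer = false // instance is gone, so the finalizer can be removed
			return false, nil
		}
		return false, fmt.Errorf("getting instance: %w", err)
	}
	return true, nil // instance still exists; check again later
}

// fakeProvider reports the instance as present for a fixed number of Get calls.
type fakeProvider struct{ getsUntilGone int }

func (f *fakeProvider) Delete(string) error { return nil }

func (f *fakeProvider) Get(string) error {
	if f.getsUntilGone <= 0 {
		return ErrNotFound
	}
	f.getsUntilGone--
	return nil
}

func main() {
	nc := &NodeClaim{ProviderID: "i-example", Finalizer: true}
	cp := &fakeProvider{getsUntilGone: 2}
	for pass := 1; ; pass++ {
		requeue, err := Reconcile(cp, nc)
		fmt.Printf("pass %d: requeue=%v finalizer=%v err=%v\n", pass, requeue, nc.Finalizer, err)
		if !requeue {
			break
		}
	}
}
```

The point of the sketch is that the finalizer is only dropped after a `Get()` call confirms the instance is gone, rather than as soon as `Delete()` returns.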

How was this change tested?
Tested on my local cluster and ran unit tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 19, 2024
@k8s-ci-robot k8s-ci-robot requested review from engedaam and jmdeal April 19, 2024 22:36
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 19, 2024
@k8s-ci-robot
Contributor

Hi @jigisha620. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 19, 2024
@coveralls

coveralls commented Apr 19, 2024

Pull Request Test Coverage Report for Build 9487162594

Details

  • 75 of 95 (78.95%) changed or added relevant lines in 7 files are covered.
  • 9 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.06%) to 81.336%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/utils/node/node.go | 4 | 6 | 66.67% |
| pkg/controllers/nodeclaim/termination/controller.go | 17 | 21 | 80.95% |
| pkg/utils/termination/termination.go | 24 | 29 | 82.76% |
| pkg/controllers/node/termination/controller.go | 13 | 22 | 59.09% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/test/expectations/expectations.go | 2 | 93.69% |
| pkg/cloudprovider/types.go | 7 | 85.8% |

| Totals | |
|---|---|
| Change from base Build 9486831375 | -0.06% |
| Covered Lines | 8306 |
| Relevant Lines | 10212 |

💛 - Coveralls

Member

@jonathan-innis jonathan-innis left a comment

Really nice work! I didn't look at the testing, but overall the core code looks good. This is also going to be amazing for measuring how long instance terminations take, if we can measure the time from the status transition to the delete call. Does it make sense to add a metric to measure this before actually removing the finalizer?

pkg/apis/v1beta1/nodeclaim_status.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch 4 times, most recently from 90c7dc3 to 67b8124 Compare April 25, 2024 04:45
Contributor

@engedaam engedaam left a comment

Nice work!

pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/metrics.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch 3 times, most recently from 8c83b5c to 5925061 Compare April 27, 2024 02:55
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/metrics/constants.go (outdated, resolved)
pkg/utils/node/node.go (resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/metrics.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from 5925061 to 3d1d85b Compare April 30, 2024 00:11
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 30, 2024
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from 3d1d85b to b60d4eb Compare April 30, 2024 23:31
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/utils/node/suite_test.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from b60d4eb to 2d26023 Compare May 2, 2024 17:36
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from 2d26023 to 6ac7eaa Compare May 6, 2024 16:26
pkg/metrics/constants.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/nodeclaim/nodeclaim.go (outdated, resolved)
pkg/utils/node/node.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/metrics.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/cloudprovider/fake/cloudprovider.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch 3 times, most recently from 67adb7e to 9ce1415 Compare May 9, 2024 12:11
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2024
pkg/utils/termination/suite_test.go (outdated, resolved)
pkg/utils/termination/suite_test.go (resolved)
pkg/utils/termination/suite_test.go (outdated, resolved)
pkg/utils/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2024
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from b92e786 to e997a44 Compare June 11, 2024 16:43
Member

@jonathan-innis jonathan-innis left a comment

I think for all of the tests, I would tidy up the test descriptions. I should be able to read the description of a test and immediately reason about what it should be doing and validating, without having to read through the details of the test itself.

pkg/controllers/nodeclaim/termination/suite_test.go (outdated, resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch 4 times, most recently from 1f6cd79 to 54c39e7 Compare June 11, 2024 20:52
pkg/controllers/nodeclaim/termination/controller.go (outdated, resolved)
pkg/utils/termination/suite_test.go (outdated, resolved)
pkg/utils/termination/suite_test.go (outdated, resolved)
pkg/utils/termination/suite_test.go (resolved)
pkg/utils/termination/suite_test.go (resolved)
pkg/utils/termination/suite_test.go (resolved)
pkg/utils/termination/suite_test.go (resolved)
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from 54c39e7 to d28a984 Compare June 12, 2024 01:10
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 12, 2024
@jonathan-innis
Member

/hold Waiting for the E2E tests in the downstream repo in the AWS Provider to pass with these new changes

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2024
@jonathan-innis
Member

/unhold Tests passed and are running in a reasonable time. This should be GTG

Contributor Author

@jigisha620 jigisha620 left a comment

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2024
@jigisha620 jigisha620 force-pushed the feat-nodoeclaim-terminating branch from 84087a7 to 2e5f91e Compare June 12, 2024 17:35
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2024
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jigisha620, jonathan-innis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 24c9761 into kubernetes-sigs:main Jun 12, 2024
13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants