opcap fails to delete namespaces created for operators. #329

Closed
acmenezes opened this issue Dec 9, 2022 · 4 comments · Fixed by #336

@acmenezes
Contributor

Bug Description

opcap fails to delete the namespaces it creates between individual operator audits.

Version and Command Invocation

Version: v0.2.0. Command: opcap check

Steps to Reproduce:

  1. Run opcap check against a full-size cluster.

Expected Result

All resources created by opcap are deleted after each operator audit.

Actual Result

Multiple audits for individual operators throw the following error:

{"level":"error","ts":1670553181.705024,"caller":"logger/logger.go:62","message":"cleanup failed: could not delete namespace: opcap-infoscale-licensing-operator-allnamespaces: Internal error occurred: admission plugin \"ValidatingAdmissionWebhook\" failed to complete validation in 13s","stacktrace":"github.com/opdev/opcap/internal/logger.Errorf\n\t/home/alex/go/src/github.com/acmenezes/opcap/internal/logger/logger.go:62\ngithub.com/opdev/opcap/internal/capability.cleanup\n\t/home/alex/go/src/github.com/acmenezes/opcap/internal/capability/auditor.go:162\ngithub.com/opdev/opcap/internal/capability.RunAudits\n\t/home/alex/go/src/github.com/acmenezes/opcap/internal/capability/auditor.go:226\ngithub.com/opdev/opcap/cmd.runAudits\n\t/home/alex/go/src/github.com/acmenezes/opcap/cmd/check.go:76\ngithub.com/opdev/opcap/cmd.checkRunE\n\t/home/alex/go/src/github.com/acmenezes/opcap/cmd/check.go:71\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/alex/go/pkg/mod/github.com/spf13/[email protected]/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/alex/go/pkg/mod/github.com/spf13/[email protected]/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/alex/go/pkg/mod/github.com/spf13/[email protected]/command.go:968\ngithub.com/spf13/cobra.(*Command).ExecuteContext\n\t/home/alex/go/pkg/mod/github.com/spf13/[email protected]/command.go:961\ngithub.com/opdev/opcap/cmd.Execute\n\t/home/alex/go/src/github.com/acmenezes/opcap/cmd/root.go:44\nmain.main\n\t/home/alex/go/src/github.com/acmenezes/opcap/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

Additional Context

The cause may be a timing issue, such as deleting or creating resources too quickly, or finalizers that for an unknown reason are not removed and so prevent the cluster from finishing the delete operation.
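
If the timing hypothesis holds, one way to probe it is to retry the delete with a growing pause whenever the API server returns a transient failure such as the webhook timeout above. This is only a sketch under assumptions: retryDelete, errTransient, and demoRetry are hypothetical names, not part of opcap, and the delete call is simulated rather than sent to a cluster.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical marker for failures worth retrying,
// for example an admission webhook that timed out.
var errTransient = errors.New("transient error")

// retryDelete calls del up to attempts times, doubling the pause between
// tries. It stops early on success or on an error that is not transient.
func retryDelete(attempts int, initial time.Duration, del func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		err = del()
		if err == nil || !errors.Is(err, errTransient) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// demoRetry simulates a delete that fails twice with a transient error
// and then succeeds. It returns how many times the delete was attempted.
func demoRetry() (int, error) {
	calls := 0
	err := retryDelete(4, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("admission webhook timed out: %w", errTransient)
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demoRetry()
	fmt.Printf("attempts: %d, err: %v\n", calls, err)
}
```

A retry like this would only mask the problem if the namespace is held by a finalizer, so it is better suited to confirming or ruling out the timing theory than to serving as a fix.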

@acmenezes acmenezes added the kind/bug Categorizes issue or PR as related to a bug. label Dec 9, 2022
@acmenezes acmenezes self-assigned this Dec 9, 2022
@madorn
Contributor

madorn commented Dec 9, 2022

We are relying on the deletion of the Namespace to clean up the lingering Operator CSV and the associated Operator controller Deployment. This can often leave the Namespace stuck in Terminating status while the Namespace controller attempts resource cleanup.

Per discussion with @acmenezes and @bcrochet, let's add an explicit deletion of the Operator CSV immediately after options.client.DeleteSubscription in operator_cleanup.go.
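
A minimal sketch of that ordering, to make the proposal concrete. Only DeleteSubscription is named in this thread; the operatorClient interface, the DeleteCSV and DeleteNamespace methods, every signature, and the example resource names are assumptions for illustration and will not match the real opcap client.

```go
package main

import (
	"context"
	"fmt"
)

// operatorClient is a hypothetical stand-in for opcap's cluster client.
// The method set and signatures are assumptions, not the real API.
type operatorClient interface {
	DeleteSubscription(ctx context.Context, name, namespace string) error
	DeleteCSV(ctx context.Context, name, namespace string) error
	DeleteNamespace(ctx context.Context, name string) error
}

// cleanupOperator deletes the Subscription, then the CSV explicitly, and
// only then the Namespace, so the Namespace controller is not left to
// clean up the CSV and the operator Deployment by itself.
func cleanupOperator(ctx context.Context, c operatorClient, sub, csv, ns string) error {
	if err := c.DeleteSubscription(ctx, sub, ns); err != nil {
		return fmt.Errorf("could not delete subscription %s: %w", sub, err)
	}
	if err := c.DeleteCSV(ctx, csv, ns); err != nil {
		return fmt.Errorf("could not delete csv %s: %w", csv, err)
	}
	if err := c.DeleteNamespace(ctx, ns); err != nil {
		return fmt.Errorf("could not delete namespace %s: %w", ns, err)
	}
	return nil
}

// recordingClient is an in-memory fake that records the order of calls.
type recordingClient struct{ calls []string }

func (r *recordingClient) DeleteSubscription(_ context.Context, name, _ string) error {
	r.calls = append(r.calls, "subscription/"+name)
	return nil
}

func (r *recordingClient) DeleteCSV(_ context.Context, name, _ string) error {
	r.calls = append(r.calls, "csv/"+name)
	return nil
}

func (r *recordingClient) DeleteNamespace(_ context.Context, name string) error {
	r.calls = append(r.calls, "namespace/"+name)
	return nil
}

// demoCleanupOrder runs the cleanup against the fake client and returns
// the recorded call order.
func demoCleanupOrder() ([]string, error) {
	c := &recordingClient{}
	err := cleanupOperator(context.Background(), c,
		"example-sub", "example-operator.v0.0.1", "opcap-example")
	return c.calls, err
}

func main() {
	calls, err := demoCleanupOrder()
	if err != nil {
		fmt.Println("cleanup failed:", err)
		return
	}
	for _, call := range calls {
		fmt.Println(call)
	}
}
```

Keeping the client behind a small interface makes the ordering easy to check with a fake, without needing a cluster.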

@acmenezes
Contributor Author

Right @madorn, I'll investigate that option, although it looks like an intermittent problem. I was able to run it in full this afternoon with all namespaces cleared correctly and all resources cleaned up.

@bcrochet
Contributor

bcrochet commented Dec 9, 2022

We should also check that operands are actually being deleted. Currently, the deletion is fire and forget. We could launch a goroutine for each custom resource and wait for completion or a timeout.
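
Roughly what that could look like, as a sketch rather than a proposal for the actual code. The names waitForDeletion, existsFunc, and demoWait are hypothetical, and the existence check is simulated in memory; real code would query the cluster for each custom resource instead.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// existsFunc reports whether a resource is still present. It is a
// hypothetical hook and must be safe to call from several goroutines.
type existsFunc func(name string) bool

// waitForDeletion starts one goroutine per resource and polls until the
// resource is gone or the shared timeout expires. It returns the names
// that were still present when the timeout hit.
func waitForDeletion(names []string, exists existsFunc, interval, timeout time.Duration) []string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		pending []string
	)
	for _, name := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			ticker := time.NewTicker(interval)
			defer ticker.Stop()
			for exists(name) {
				select {
				case <-ctx.Done():
					mu.Lock()
					pending = append(pending, name)
					mu.Unlock()
					return
				case <-ticker.C:
				}
			}
		}(name)
	}
	wg.Wait()
	return pending
}

// demoWait simulates two operands that disappear after a few polls and
// one that never goes away, as if held back by a finalizer.
func demoWait() []string {
	var mu sync.Mutex
	pollsLeft := map[string]int{"operand-a": 1, "operand-b": 3}
	exists := func(name string) bool {
		mu.Lock()
		defer mu.Unlock()
		if name == "operand-stuck" {
			return true
		}
		if pollsLeft[name] > 0 {
			pollsLeft[name]--
			return true
		}
		return false
	}
	names := []string{"operand-a", "operand-b", "operand-stuck"}
	return waitForDeletion(names, exists, 5*time.Millisecond, 200*time.Millisecond)
}

func main() {
	fmt.Println("still present after timeout:", demoWait())
}
```

Returning the names that outlived the timeout gives the audit something specific to log before it moves on to deleting the namespace.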

@madorn
Contributor

madorn commented Dec 17, 2022

Opened up #337 per @bcrochet's suggestion.
