
[Bug]: vm deletion when vmid is empty #434

Open
DvdChe opened this issue Feb 6, 2025 · 5 comments · May be fixed by #440
Labels
bug Something isn't working

Comments


DvdChe commented Feb 6, 2025

What happened

When we create a faulty MachineDeployment, CAPO tries to create the related OscMachine, but the OscMachine remains stuck with no status, even when we try to delete it manually.

It seems that ReconcileDeleteVm falls into a condition where no decision is made, whereas it should remove the OscMachine.
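To make the expected behavior concrete, here is a minimal Go sketch of a delete step that always reaches a decision. It assumes a simplified model: the `OscMachine` struct, its `VMResourceID` and `Finalizers` fields, the finalizer name and the `deleteVM` callback are all hypothetical and do not reflect the actual CAPOSC types or ReconcileDeleteVm code. When the VM ID is empty, no VM was ever created, so the finalizer is removed without any API call.

```go
package main

// OscMachine is a hypothetical, simplified stand-in for the real resource.
// VMResourceID stays empty when no VM was ever created.
type OscMachine struct {
	VMResourceID string
	Finalizers   []string
}

// machineFinalizer is a hypothetical finalizer name used for illustration.
const machineFinalizer = "oscmachine.example.com/finalizer"

// reconcileDeleteVM always reaches a decision. With no VM ID there is
// nothing to delete, so the finalizer is dropped without calling the API.
// deleteVM is a hypothetical callback standing in for the cloud call.
func reconcileDeleteVM(m *OscMachine, deleteVM func(id string) error) error {
	if m.VMResourceID != "" {
		if err := deleteVM(m.VMResourceID); err != nil {
			return err // keep the finalizer so the deletion is retried
		}
	}
	removeFinalizer(m, machineFinalizer)
	return nil
}

// removeFinalizer filters name out of m.Finalizers in place.
func removeFinalizer(m *OscMachine, name string) {
	kept := m.Finalizers[:0]
	for _, f := range m.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	m.Finalizers = kept
}
```

Keeping the finalizer when `deleteVM` fails means the controller retries the deletion instead of orphaning a VM that does exist.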

Steps to reproduce

  1. Create a faulty MachineDeployment with a non-existent Kubernetes version, e.g. `spec.template.version: "yolo"`.

     CAPO will create the related OscMachine, which remains stuck with no status and no `vm.resourceId`.

  2. Fix the MachineDeployment.

  3. The MachineSet will be marked Deleting, but the OscMachine stays stuck and is never deleted.

Expected to happen

The MachineDeployment should roll out a new MachineSet, and the OscMachines attached to the previous MachineSet should be destroyed.

Add anything

cluster-api output

Environment

- Kubernetes version: (use `kubectl version`): 1.29
- OS (e.g. from `/etc/os-release`):
- Kernel (e.g. `uname -a`):
- cluster-api-provider-outscale version: main
- cluster-api version: v1.9.4
- Install tools:
- Kubernetes Distribution:
- Kubernetes Distribution version:

DvdChe added the bug label on Feb 6, 2025
jfbus (Contributor) commented Feb 7, 2025

Do you have logs from the caposc controller? The reconciler is probably stuck in an error loop.

(As a side note: the official name is caposc, capo is the OpenStack provider)

DvdChe (Author) commented Feb 7, 2025

I no longer have the logs, but we saw that the controller makes an HTTP GET on the OSC API to fetch VM information with an empty vmid parameter, and the API continuously responds with a 404.

After the fix mentioned in #435, the faulty OscMachine is removed.

jfbus (Contributor) commented Feb 7, 2025

The 404 error might come from https://github.com/outscale/cluster-api-provider-outscale/blob/main/controllers/oscmachine_keypair_controller.go#L122

reconcileDeleteKeypair does not properly handle not-found errors, for which the result is not `nil, nil` but `nil, [an error]`.

DvdChe (Author) commented Feb 7, 2025

We're using a pre-existing keypair, so this function is not called, or no deletion is supposed to happen.

jfbus (Contributor) commented Feb 7, 2025

The function is always called. The flow is:

  1. get the key pair from the VM
  2. check whether the key pair needs to be deleted
  3. delete it

Steps 1 and 2 could be swapped, which would fix your edge case where the VM does not exist (and therefore has no keypair).
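The reordered flow could look roughly like the Go sketch below. `KeypairSpec`, its `DeleteKeypair` field, and the `getVMKeypair`/`deleteKeypair` callbacks are hypothetical stand-ins, not the actual CAPOSC API, and the empty-ID guard is an extra assumption. Because the "does it need deleting" check comes first, a pre-existing keypair never triggers a VM lookup at all.

```go
package main

// KeypairSpec is a hypothetical stand-in: DeleteKeypair is false for a
// pre-existing keypair that the controller must leave alone.
type KeypairSpec struct {
	DeleteKeypair bool
}

// reconcileDeleteKeypair follows the reordered flow:
//  1. check whether the keypair needs to be deleted at all;
//  2. only then look up the VM to find its keypair;
//  3. delete that keypair.
//
// getVMKeypair returns "" when the VM has no keypair; both callbacks are
// hypothetical stand-ins for the real cloud calls.
func reconcileDeleteKeypair(
	spec KeypairSpec,
	vmID string,
	getVMKeypair func(vmID string) (string, error),
	deleteKeypair func(name string) error,
) error {
	if !spec.DeleteKeypair {
		return nil // step 1: keypair is not ours to delete, no API call
	}
	if vmID == "" {
		return nil // assumption: no VM was created, so it has no keypair
	}
	name, err := getVMKeypair(vmID) // step 2
	if err != nil {
		return err
	}
	if name == "" {
		return nil // VM has no keypair
	}
	return deleteKeypair(name) // step 3
}
```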

I'll prepare a PR.
