-
-
Notifications
You must be signed in to change notification settings - Fork 225
fix(vm): detach and delete network device when its config block is removed #2260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Closes #2259 |
|
@bpg I'm not sure removing |
|
Hey @hsnprsd 👋🏼 Could you pls check and sign DCO on your commits? About |
|
Hi @bpg 👋🏻 I'm not sure how to keep the clone functionality working. I need to set the field to |
|
@hsnprsd could you give me an example of a test config you have issues with? |
|
I cannot reproduce the original issue with this config: When I apply this, the VM is created successfully. When I comment the second network device block, the network device is removed as expected from the VM in proxmox and when I uncomment it again, it is added again as expected. EDIT: Removing a network device in proxmox also leads opentofu to re-add it again on the next plan |
|
You should try removing both devices at the same time. @TobiPeterG |
Oh wow, you are right. Deleting them both at the same time leads to no visible changes. Interesting. I'll test your branch. |
|
Can confirm, your branch fixes the issue. |
Can you provide an opentofu config for this case where it fails with a cloned VM? |
c506fb0 to
45747bd
Compare
…moved Signed-off-by: Ehsan Poursaeed <[email protected]>
Signed-off-by: Pavel Boldyrev <[email protected]>
Signed-off-by: Pavel Boldyrev <[email protected]>
45747bd to
56e2053
Compare
| }), | ||
| ), | ||
| }}}, | ||
| {"clone with network devices", []resource.TestStep{{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a failing test for cloned VM scenario @TobiPeterG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--- FAIL: TestAccResourceVMClone/clone_with_network_devices (5.15s)
resource_vm_test.go:1019: Step 1/1 error: After applying this test step, the non-refresh plan was not empty.
stdout:
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# proxmox_virtual_environment_vm.clone will be updated in-place
~ resource "proxmox_virtual_environment_vm" "clone" {
id = "110"
~ ipv4_addresses = [] -> (known after apply)
~ ipv6_addresses = [] -> (known after apply)
~ network_interface_names = [] -> (known after apply)
# (25 unchanged attributes hidden)
- network_device {
- bridge = "vmbr0" -> null
- disconnected = false -> null
- enabled = true -> null
- firewall = false -> null
- mac_address = "BC:24:11:57:93:1C" -> null
- model = "virtio" -> null
- mtu = 0 -> null
- queues = 0 -> null
- rate_limit = 0 -> null
- vlan_id = 0 -> null
# (1 unchanged attribute hidden)
}
# (1 unchanged block hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
FAIL
FAIL github.com/bpg/terraform-provider-proxmox/fwprovider/test 10.549s
FAIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate why the behaviour is wrong. If I understand it correctly, you create a VM with one network interface and then clone it to a VM without one. I would expect opentofu to remove the network interface from the VM when it's not in the config for the new VM. If you wanted the new VM to have the interface, you would add it in the config. What happens in this case?
More interesting is the question what happens when you clone a VM with multiple network interfaces and configure the cloned VM to also have these interfaces. Are they correctly mapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether we want to keep the inherited network devices in the cloned VM or we expect the user to specify them. That is not my decision to make as I don't know what the intent was. If network devices should be specified whether or not they are cloned from a source VM, this PR solves the issue of removing extra network devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that we want to define the state of the infrastructure through opentofu, everything that is in the opentofu config should be there in reality and everything that is there in reality should be configured through opentofu. As a user, I would be confused when my opentofu config doesn't mention a network interface, but it actually exists on my infrastructure. But I guess it's bpg's call to make.
I would also wonder: How would you remove a network interface from a VM you clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but this PR introduces a breaking change — anyone updating the provider would suddenly see Terraform trying to remove all network devices from cloned VMs that don’t have network devices specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template. There are a few tests that capture this behavior, and it’s also somewhat documented here: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/guides/clone-vm
And I don't think this behaviour should be changed.
So we can’t just delete a device if the clone doesn’t have it. Unfortunately, there’s no good workaround to capture this behaviour in the legacy VM resource, since devices are represented as a list, not a map as they should be.
In an ideal world (FWK provider, v1.+), we’d be able to define something like:
network {
"net0" = {
// network device attributes
}
}in the template, and then:
network {
"net0" = {} // or null
}in the clone to explicitly remove it.
I understand the desire to have a “universal” template that includes everything, and then trim the excess when cloning to a specific VM. However, complex three-way merging of configurations (template + clone + defaults) when adding, changing, or deleting things is a non-trivial and error-prone task.
My go-to approach is to use cloud-init to configure the base OS image the way I need, skipping cloning altogether. I know there are tons of cases where this may not work, but I just wanted to provide some rationale for why cloning is, at best, a bit dicey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My go-to approach is to use cloud-init to configure the base OS image the way I need, skipping cloning altogether.
I do this too. As a result, I'm not really qualified to jump in on this discussion since I don't use cloning, nor have I looked into how the Proxmox API exposes cloning, but I've run into similar minor issues when importing resources from public cloud providers....
Occasionally when importing a public cloud resource, I find that running a plan immediately after results in a proposed "change". Most of the time I don't care and just run apply to commit that setting into state, but other times I find that I don't actually want that setting and I need to edit my Terraform config to explicitly add that setting so I can change it to something else. Since the import succeeded with the config I had written, I've been assuming these minor anomalies are just due to added return values from the API call, or perhaps a mismatch in default values stored in the provider vs. the API vs. the cloud provider's web UI.
To me, Proxmox cloning feels like a server-side import of a VM resource. I wonder if it's worth considering cloning in this way? Ideally, the clone block would just signal to the provider to call the appropriate clone API to initially create the cloned vm, but the end result is a vm resource with all the required config a vm resource requires. Should some config be missing, modified, or added, those differences can either be applied at time of cloning, or on the next apply.
The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template.
I don't use cloning so I'm not really sure how the functionality works, but I suspect this matches how Proxmox works. But from a Terraform provider prospective, I think this inheritance is problematic. It seems tricky for both the provider and humans to reason about inherited config vs. explicit config and how to make changes to inherited config in the future. I've seen this issue come up occasionally in bug reports such as #1988. I think it's worth considering failing a cloned vm resource attempt if the vm resource block doesn't exactly match the source template (or with whatever minor changes that can be easily applied like a change in memory, or dropping a network interface).
Obviously this has massive backwards compatibility problems (and there are certainly nuances I'm unaware of), but I thought I'd throw out my idea that cloning could be considered a server-side resource "import". It certainly adds some friction when creating a cloned resource, but the end result would be a proper explicitly-defined vm resource that behaves like any other non-cloned vm resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template. There are a few tests that capture this behavior, and it’s also somewhat documented here: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/guides/clone-vm
And I don't think this behaviour should be changed.
Could you please elaborate why this shouldn't change? Why wouldn't you just create a opentofu config matching the setup of the cloned VM to have an exact copy?
So we can’t just delete a device if the clone doesn’t have it. Unfortunately, there’s no good workaround to capture this behaviour in the legacy VM resource, since devices are represented as a list, not a map as they should be.
In an ideal world (FWK provider, v1.+), we’d be able to define something like:
network { "net0" = { // network device attributes } }in the template, and then:
network { "net0" = {} // or null }in the clone to explicitly remove it.
Aren't explicit removings defined by not specifying an attribute? That's how it work with other providers I use as well. When it's not defined to be there, it's removed.
…te VM Signed-off-by: Ehsan Poursaeed <[email protected]>
c00b9a6 to
be2fd0c
Compare
Contributor's Note
/docsfor any user-facing features or additions./fwprovider/testsfor any new or updated resources / data sources.make exampleto verify that the change works as expected.Proof of Work
Community Note