Skip to content

Conversation

@hsnprsd
Copy link
Contributor

@hsnprsd hsnprsd commented Oct 20, 2025

Contributor's Note

  • I have added / updated documentation in /docs for any user-facing features or additions.
  • I have added / updated acceptance tests in /fwprovider/tests for any new or updated resources / data sources.
  • I have ran make example to verify that the change works as expected.

Proof of Work

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

@hsnprsd
Copy link
Contributor Author

hsnprsd commented Oct 20, 2025

Closes #2259

@hsnprsd
Copy link
Contributor Author

hsnprsd commented Oct 21, 2025

@bpg I'm not sure removing computed: true is the right move. From what I understand, the network device list should exactly match the network_device blocks defined by the user—no extra or missing entries. The list itself shouldn’t be computed, but its inner fields can be.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 21, 2025
@hsnprsd hsnprsd marked this pull request as ready for review October 21, 2025 11:54
@bpg
Copy link
Owner

bpg commented Oct 22, 2025

Hey @hsnprsd 👋🏼
Thanks for contributing this!!

Could you pls check and sign DCO on your commits?

About Computed -- the list has to be computed as it can be updated by the provider when cloning from a template.

@hsnprsd
Copy link
Contributor Author

hsnprsd commented Oct 24, 2025

Hi @bpg 👋🏻

I'm not sure how to keep the clone functionality working. I need to set the field to Computed: false so Terraform will trigger an update when there aren't any network_device blocks in the config. Curious if you have any ideas on how to handle this.

@bpg
Copy link
Owner

bpg commented Oct 25, 2025

@hsnprsd could you give me an example of a test config you have issues with?

@TobiPeterG
Copy link
Contributor

TobiPeterG commented Oct 27, 2025

I cannot reproduce the original issue with this config:

resource "proxmox_virtual_environment_vm" "vm" {
  name         = "NAME"
  protection   = false
  node_name    = "NODE"
  pool_id      = "POOL"

  agent {
    # read 'Qemu guest agent' section, change to true only when ready
    enabled = false
  }

  cpu {
    cores   = 4
    type    = "host"
  }

  memory {
    dedicated = 2048
  }

  clone {
    node_name = "NODE"
    vm_id     = VMID

  }

  network_device {
      model   = "virtio"
      bridge  = "vmbr1"
      vlan_id = ID1
      mac_address = "02:6c:2f:73:ce:d4"
  }

  network_device {
      model   = "virtio"
      bridge  = "vmbr0"
      vlan_id = ID2
      mac_address = "02:6d:2a:74:cf:d5"
  }

  operating_system {
    type = "l26"
  }

  disk {
    datastore_id = "pool_nvme"
    interface    = "scsi0"
  }

  lifecycle {
    ignore_changes = [
      node_name,
      clone
    ]
  }
}

When I apply this, the VM is created successfully. When I comment the second network device block, the network device is removed as expected from the VM in proxmox and when I uncomment it again, it is added again as expected.

EDIT: Removing a network device in proxmox also leads opentofu to re-add it again on the next plan

@hsnprsd
Copy link
Contributor Author

hsnprsd commented Oct 27, 2025

You should try removing both devices at the same time. @TobiPeterG

@TobiPeterG
Copy link
Contributor

You should try removing both devices at the same time. @TobiPeterG

Oh wow, you are right. Deleting them both at the same time leads to no visible changes. Interesting. I'll test your branch.

@TobiPeterG
Copy link
Contributor

Can confirm, your branch fixes the issue.

@TobiPeterG
Copy link
Contributor

Hi @bpg 👋🏻

I'm not sure how to keep the clone functionality working. I need to set the field to Computed: false so Terraform will trigger an update when there aren't any network_device blocks in the config. Curious if you have any ideas on how to handle this.

Can you provide an opentofu config for this case where it fails with a cloned VM?
Or instructions on what to do with the cloned VM (or how to configure it beforehand)?

@bpg bpg added the ⌛ pending author's response Requested additional information from the reporter label Oct 28, 2025
@hsnprsd hsnprsd force-pushed the remove-network-device branch from c506fb0 to 45747bd Compare October 29, 2025 11:12
hsnprsd and others added 3 commits October 29, 2025 14:50
@hsnprsd hsnprsd force-pushed the remove-network-device branch from 45747bd to 56e2053 Compare October 29, 2025 11:20
}),
),
}}},
{"clone with network devices", []resource.TestStep{{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a failing test for cloned VM scenario @TobiPeterG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    --- FAIL: TestAccResourceVMClone/clone_with_network_devices (5.15s)
        resource_vm_test.go:1019: Step 1/1 error: After applying this test step, the non-refresh plan was not empty.
            stdout:
            
            
            Terraform used the selected providers to generate the following execution
            plan. Resource actions are indicated with the following symbols:
              ~ update in-place
            
            Terraform will perform the following actions:
            
              # proxmox_virtual_environment_vm.clone will be updated in-place
              ~ resource "proxmox_virtual_environment_vm" "clone" {
                    id                      = "110"
                  ~ ipv4_addresses          = [] -> (known after apply)
                  ~ ipv6_addresses          = [] -> (known after apply)
                  ~ network_interface_names = [] -> (known after apply)
                    # (25 unchanged attributes hidden)
            
                  - network_device {
                      - bridge       = "vmbr0" -> null
                      - disconnected = false -> null
                      - enabled      = true -> null
                      - firewall     = false -> null
                      - mac_address  = "BC:24:11:57:93:1C" -> null
                      - model        = "virtio" -> null
                      - mtu          = 0 -> null
                      - queues       = 0 -> null
                      - rate_limit   = 0 -> null
                      - vlan_id      = 0 -> null
                        # (1 unchanged attribute hidden)
                    }
            
                    # (1 unchanged block hidden)
                }
            
            Plan: 0 to add, 1 to change, 0 to destroy.
FAIL
FAIL    github.com/bpg/terraform-provider-proxmox/fwprovider/test       10.549s
FAIL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate why the behaviour is wrong. If I understand it correctly, you create a VM with one network interface and then clone it to a VM without one. I would expect opentofu to remove the network interface from the VM when it's not in the config for the new VM. If you wanted the new VM to have the interface, you would add it in the config. What happens in this case?

More interesting is the question what happens when you clone a VM with multiple network interfaces and configure the cloned VM to also have these interfaces. Are they correctly mapped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we want to keep the inherited network devices in the cloned VM or we expect the user to specify them. That is not my decision to make as I don't know what the intent was. If network devices should be specified whether or not they are cloned from a source VM, this PR solves the issue of removing extra network devices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that we want to define the state of the infrastructure through opentofu, everything that is in the opentofu config should be there in reality and everything that is there in reality should be configured through opentofu. As a user, I would be confused when my opentofu config doesn't mention a network interface, but it actually exists on my infrastructure. But I guess it's bpg's call to make.

I would also wonder: How would you remove a network interface from a VM you clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but this PR introduces a breaking change — anyone updating the provider would suddenly see Terraform trying to remove all network devices from cloned VMs that don’t have network devices specified.

Copy link
Owner

@bpg bpg Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template. There are a few tests that capture this behavior, and it’s also somewhat documented here: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/guides/clone-vm

And I don't think this behaviour should be changed.

So we can’t just delete a device if the clone doesn’t have it. Unfortunately, there’s no good workaround to capture this behaviour in the legacy VM resource, since devices are represented as a list, not a map as they should be.

In an ideal world (FWK provider, v1.+), we’d be able to define something like:

  network {
     "net0" = {
        // network device attributes
     }
  }

in the template, and then:

  network {
     "net0" = {} // or null
  }

in the clone to explicitly remove it.

I understand the desire to have a “universal” template that includes everything, and then trim the excess when cloning to a specific VM. However, complex three-way merging of configurations (template + clone + defaults) when adding, changing, or deleting things is a non-trivial and error-prone task.

My go-to approach is to use cloud-init to configure the base OS image the way I need, skipping cloning altogether. I know there are tons of cases where this may not work, but I just wanted to provide some rationale for why cloning is, at best, a bit dicey.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My go-to approach is to use cloud-init to configure the base OS image the way I need, skipping cloning altogether.

I do this too. As a result, I'm not really qualified to jump in on this discussion since I don't use cloning, nor have I looked into how the Proxmox API exposes cloning, but I've run into similar minor issues when importing resources from public cloud providers....

Occasionally when importing a public cloud resource, I find that running a plan immediately after results in a proposed "change". Most of the time I don't care and just run apply to commit that setting into state, but other times I find that I don't actually want that setting and I need to edit my Terraform config to explicitly add that setting so I can change it to something else. Since the import succeeded with the config I had written, I've been assuming these minor anomalies are just due to added return values from the API call, or perhaps a mismatch in default values stored in the provider vs. the API vs. the cloud provider's web UI.

To me, Proxmox cloning feels like a server-side import of a VM resource. I wonder if it's worth considering cloning in this way? Ideally, the clone block would just signal to the provider to call the appropriate clone API to initially create the cloned vm, but the end result is a vm resource with all the required config a vm resource requires. Should some config be missing, modified, or added, those differences can either be applied at time of cloning, or on the next apply.

The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template.

I don't use cloning so I'm not really sure how the functionality works, but I suspect this matches how Proxmox works. But from a Terraform provider prospective, I think this inheritance is problematic. It seems tricky for both the provider and humans to reason about inherited config vs. explicit config and how to make changes to inherited config in the future. I've seen this issue come up occasionally in bug reports such as #1988. I think it's worth considering failing a cloned vm resource attempt if the vm resource block doesn't exactly match the source template (or with whatever minor changes that can be easily applied like a change in memory, or dropping a network interface).

Obviously this has massive backwards compatibility problems (and there are certainly nuances I'm unaware of), but I thought I'd throw out my idea that cloning could be considered a server-side resource "import". It certainly adds some friction when creating a cloned resource, but the end result would be a proper explicitly-defined vm resource that behaves like any other non-cloned vm resource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was that if a clone has nothing except a reference to the template, it should create an exact copy with all devices inherited from the template. There are a few tests that capture this behavior, and it’s also somewhat documented here: https://registry.terraform.io/providers/bpg/proxmox/latest/docs/guides/clone-vm

And I don't think this behaviour should be changed.

Could you please elaborate why this shouldn't change? Why wouldn't you just create a opentofu config matching the setup of the cloned VM to have an exact copy?

So we can’t just delete a device if the clone doesn’t have it. Unfortunately, there’s no good workaround to capture this behaviour in the legacy VM resource, since devices are represented as a list, not a map as they should be.

In an ideal world (FWK provider, v1.+), we’d be able to define something like:

  network {
     "net0" = {
        // network device attributes
     }
  }

in the template, and then:

  network {
     "net0" = {} // or null
  }

in the clone to explicitly remove it.

Aren't explicit removings defined by not specifying an attribute? That's how it work with other providers I use as well. When it's not defined to be there, it's removed.

@hsnprsd hsnprsd force-pushed the remove-network-device branch from c00b9a6 to be2fd0c Compare October 29, 2025 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⌛ pending author's response Requested additional information from the reporter size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants