@raaizik
Member

@raaizik raaizik commented Nov 7, 2025

Update flannel CNI plugin version from v0.24.0 to v0.27.0 to align with upstream Lima changes.

Resolves #1704

  • Test

@raaizik raaizik force-pushed the ra-1704 branch 2 times, most recently from 6a6b3ea to c1113c5 Compare November 7, 2025 01:22
Member

@nirs nirs left a comment

Looks trivial, but did you test it? You need macOS for testing.

@raaizik
Member Author

raaizik commented Nov 9, 2025

> Looks trivial, but did you test it? You need macOS for testing.

I've tested with all of the latest versions installed in the main branch. (Regardless of this PR. It failed for the same reasons on this branch so I decided to check main.) There seems to be an issue with starting one of the clusters via limactl. I have the socket_vmnet launchd service installed. The start process hangs for several minutes and then errors on the limactl start command. It occurs while being stuck on the "kubeadm completed" optional requirement:

2025-11-09 19:46:17,111 DEBUG   [ex2] [hostagent] Waiting for the optional requirement 4 of 5: "kubeadm completed"
2025-11-09 19:51:23,787 DEBUG   [ex1] [hostagent] Waiting for the optional requirement 4 of 5: "kubeadm completed"

There's also this, which might indicate an issue with binding to the SSH port (not sure it's related, though):

2025-11-09 18:34:35,870 DEBUG   [ex2] [hostagent] Failed to detect SSH server on vsock port, falling back to usernet forwarder {'error': 'Error Domain=NSPOSIXErrorDomain Code=54 Description="The operation couldn’t be completed. Connection reset by peer" UserInfo={\n}'}
2025-11-09 18:34:35,960 DEBUG   [ex1] [hostagent] Failed to detect SSH server on vsock port, falling back to usernet forwarder {'error': 'Error Domain=NSPOSIXErrorDomain Code=54 Description="The operation couldn’t be completed. Connection reset by peer" UserInfo={\n}'}
2025-11-09 18:34:36,377 DEBUG   [ex2] SSH Local Port: 56904
2025-11-09 18:34:36,384 DEBUG   [ex2] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
2025-11-09 18:34:36,474 DEBUG   [ex1] SSH Local Port: 56906

Thoughts?

@nirs
Member

nirs commented Nov 24, 2025

> > Looks trivial, but did you test it? You need macOS for testing.
>
> I've tested with all of the latest versions installed in the main branch. (Regardless of this PR. It failed for the same reasons on this branch so I decided to check main.) There seems to be an issue with starting one of the clusters via limactl. I have the socket_vmnet launchd service installed. The start process hangs for several minutes and then errors on the limactl start command. It occurs while being stuck on the "kubeadm completed" optional requirement:
> ...

When we time out waiting for kubeadm/ssh it usually means we don't have an IP.

To debug this issue, it is useful to use the small envs/vm.yaml environment. Once it works, we can delete it and start the full regional-dr.yaml environment.

To identify the issue, list the vms after starting drenv:

limactl list

This should show all the vms in the Running state.
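
A minimal sketch for spotting vms that are not running. It assumes that `limactl list` accepts a Go template via `--format` with `.Name` and `.Status` fields; that is an assumption about the installed lima version, not something stated in this thread. The `not_running` helper name is hypothetical.

```shell
# Print the names of vms whose status is not "Running".
# Reads "<name> <status>" lines from stdin (hypothetical helper).
not_running() {
    awk '$2 != "Running" { print $1 }'
}

# Assumed usage (requires limactl on the host):
#   limactl list --format '{{.Name}} {{.Status}}' | not_running
```

If the helper prints nothing, every listed vm reported the Running state.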

To check if a vm has an IP, use:

limactl shell <vm name> ip a

This should show:

$ limactl shell cluster ip a
...
3: lima0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:55:55:df:b6:28 brd ff:ff:ff:ff:ff:ff
    inet 192.168.105.2/24 metric 100 brd 192.168.105.255 scope global dynamic lima0
       valid_lft 3591sec preferred_lft 3591sec
    inet6 fdc1:b4b6:9262:daea:5055:55ff:fedf:b628/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2591994sec preferred_lft 604794sec
    inet6 fe80::5055:55ff:fedf:b628/64 scope link 
       valid_lft forever preferred_lft forever

When the vm has no IP, you will not see an IPv4 address on interface lima0:

$ limactl shell cluster ip a
...
3: lima0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:55:55:df:b6:28 brd ff:ff:ff:ff:ff:ff
    inet6 fdc1:b4b6:9262:daea:5055:55ff:fedf:b628/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2591961sec preferred_lft 604761sec
    inet6 fe80::5055:55ff:fedf:b628/64 scope link 
       valid_lft forever preferred_lft forever
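
The comparison between the two outputs above can be scripted. This is a rough sketch, not part of drenv: the `has_ipv4` helper name is hypothetical, and it only looks for an `inet` line under the named interface in `ip a` output.

```shell
# Succeed (exit 0) if the given interface has an IPv4 ("inet") address
# in `ip a` output read from stdin; fail (exit 1) otherwise.
has_ipv4() {
    awk -v ifname="$1" '
        # Interface header, e.g. "3: lima0: <BROADCAST,...>"
        /^[0-9]+: / { current = $2; sub(/:$/, "", current) }
        # "inet" is IPv4; "inet6" lines do not match.
        current == ifname && $1 == "inet" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Assumed usage, with the vm name "cluster" taken from the example above:
#   limactl shell cluster ip a | has_ipv4 lima0 || echo "no IPv4 on lima0"
```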

We don't know why we don't get an IP in some cases, but we know it happens much more often on managed machines, or after connecting to a VPN (which typically breaks networking for drenv).

The workarounds we use are:

  • terminate socketfilterfw process

    sudo killall socketfilterfw
    
  • terminate bootpd

    sudo killall bootpd
    

The typical flow is:

  1. Start the vms
  2. The vms get no IP
  3. Interrupt drenv start, delete the vms
  4. Terminate socketfilterfw and bootpd
  5. Start the vms again

In most cases this fixes the issue.
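
The flow above could be wrapped in a small script once `drenv start` has been interrupted manually. This is only a sketch under assumptions: it assumes `drenv delete` and `drenv start` take the environment file as their argument and are run from the directory that contains `envs/`. The names `run`, `recover_network`, `DRY_RUN` and `ENV_FILE` are hypothetical.

```shell
# Environment file to recreate; envs/vm.yaml is the small environment
# suggested for debugging.
ENV_FILE="${ENV_FILE:-envs/vm.yaml}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Delete the vms, terminate the two processes, then start again.
recover_network() {
    run drenv delete "$ENV_FILE"
    # killall exits non-zero when the process is not running; ignore that.
    run sudo killall socketfilterfw || true
    run sudo killall bootpd || true
    run drenv start "$ENV_FILE"
}

# Assumed usage:
#   DRY_RUN=1 recover_network   # print the commands only
#   recover_network             # run them (asks for the sudo password)
```

Running with `DRY_RUN=1` first makes it easy to review the commands before anything is killed or deleted.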

If this does not help, I had some success by also flushing the firewall rules and reloading all rules:

sudo pfctl -F all
sudo pfctl -f /etc/pf.conf

If nothing helps, a reboot usually fixes the issue.

On non-managed machines this issue is very rare, but it usually happens after you connect to a VPN using Tunnelblick.

This is not a lima/socket_vmnet issue; it also happens with minikube/vmnet-helper. This is likely an Apple bug.

@nirs
Member

nirs commented Nov 24, 2025

@raaizik you probably used lima 2.x - I upgraded today to lima 2.0 and found that drenv is broken with it. Fixed by #2346.

@nirs
Member

nirs commented Nov 25, 2025

@raaizik Please rebase on main for testing this change.

@raaizik
Member Author

raaizik commented Nov 25, 2025

> @raaizik Please rebase on main for testing this change.

Yeah, I know. Thanks

drenv: Update flannel version to v0.27.0

Update flannel CNI plugin version from v0.24.0 to v0.27.0 to align
with upstream Lima changes. This follows the update in lima-vm/lima#2991
which updated the flannel version in the upstream k8s.yaml template.

Resolves RamenDR#1704

Signed-off-by: raaizik <[email protected]>
Development

Successfully merging this pull request may close these issues.

drenv macOS: Update lima k8s template

3 participants