Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm preflight support for nftables kube-proxy / hosts with no iptables #3132

Closed
danwinship opened this issue Dec 6, 2024 · 18 comments · Fixed by kubernetes/kubernetes#129131
Labels
area/kube-proxy area/preflight priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
Milestone

Comments

@danwinship
Copy link

The kubeadm preflight checks currently consider the iptables binary to be mandatory. This is not correct if you are running kube-proxy in nftables mode, and while it mostly doesn't cause problems now (since pretty much everyone has iptables installed anyway), it will eventually be a problem on RHEL10, where there won't be any iptables binaries.

It appears that kubeadm has a standard way of overriding the kube-proxy configuration, so it should be able to parse the provided config to see if it overrides the mode to be nftables, and in that case, it should require the nft binary and not require the iptables binary.

(For extra credit: ipvs mode requires the iptables and ipset binaries. It doesn't actually require any IPVS-specific binaries because it uses the netlink API to program IPVS.)

/sig cluster-lifecycle
/area kubeadm

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. area/kubeadm labels Dec 6, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 6, 2024
@neolit123
Copy link
Member

/transfer kubeadm

@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/kubernetes Dec 6, 2024
@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/kube-proxy area/preflight and removed kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 6, 2024
@neolit123
Copy link
Member

It appears that kubeadm has a standard way of overriding the kube-proxy configuration, so it should be able to parse the provided config to see if it overrides the mode to be nftables, and in that case, it should require the nft binary and not require the iptables binary.

parsing the kube-proxy config and then making a choice sounds like the correct thing to do, but it might be a bit entangled code wise. maybe we can go for a simpler solution. how about we throw a warning when both iptables and nft don't exists and ignore the kube-proxy choices?

@neolit123 neolit123 added this to the v1.33 milestone Dec 6, 2024
@neolit123
Copy link
Member

neolit123 commented Dec 6, 2024

(For extra credit: ipvs mode requires the iptables and ipset binaries. It doesn't actually require any IPVS-specific binaries because it uses the netlink API to program IPVS.)

yet, given these different modes, perhaps we might want to pipe the kube-proxy config to preflight and check for the various diff binaries. problem is that on kubeadm join the kube-proxy config must be fetched from the cluster first. i think, it's done before preflight runs but there might be caveats.

@neolit123
Copy link
Member

alternatively, all this be documented at the website and we can remove these preflight checks.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

@pacoxu
Copy link
Member

pacoxu commented Dec 9, 2024

Like swap preflight, we can warn if swap is enabled.

But things here are different. How about

  • warning for either iptables or nftables is missing (generally, iptables will not be removed IIUC?)
  • error for both iptables and nftables are missing

The current workaround for v1.32 is kubeadm init --ignore-preflight-errors=FileExisting-iptables

@neolit123
Copy link
Member

warning for either iptables or nftables is missing (generally, iptables will not be removed IIUC?)
error for both iptables and nftables are missing

yes, but IPVS mode is also in the picture.

i think kube-proxy should document it requirements for various modes at k/website and kubeadm can link to that page.
i think we don't want to maintain such preflight in kubeadm and my vote is to remove the iptables check too.

happy to hear more votes on this topic, though.

@danwinship @SataQiu @carlory

@danwinship
Copy link
Author

Wait, wait, sorry, I'm confused; kube-proxy needs iptables/ipset/nft in the kube-proxy image, but the user/kubeadm doesn't need to worry about that.

kubeadm is checking for iptables because kubelet uses it, not because of kube-proxy. In particular, kubelet uses it to create the KUBE-IPTABLES-HINT chain which kube-proxy (and maybe other components) will look for to figure out whether to use iptables-legacy or iptables-nft.

Even if you're running kube-proxy in nftables mode, it's good for kubelet to create the KUBE-IPTABLES-HINT chain if it can, because there might be other components on the system that check for it.

So, the iptables binary is required if kube-proxy is using iptables or ipvs mode, and recommended even when using nftables mode (or when not running kube-proxy at all).

But OTOH, if you don't have iptables installed, then that necessarily means you aren't using an iptables-based firewall or other things that kube-proxy would need to worry about, so it doesn't really matter whether kube-proxy decides to use iptables-legacy or itpables-nft, so it doesn't really matter whether kubelet creates the KUBE-IPTABLES-HINT chain, so it doesn't really matter that you didn't have iptables installed.

So yeah, maybe no errors no warnings?

@pacoxu
Copy link
Member

pacoxu commented Dec 10, 2024

@carlory
Copy link
Member

carlory commented Dec 10, 2024

+1 for removing it. but we need to consider how to handle the error message. kubelet may print 1 time per 1 minute.

@neolit123
Copy link
Member

@danwinship

thanks for the additional context. sounds messy...
i think the kubelet and kube-proxy should start documenting what "net tools" they exec into.

@neolit123
Copy link
Member

+1 for removing it. but we need to consider how to handle the error message. kubelet may print 1 time per 1 minute.

that sounds like a kubelet issue. there seems like 2 steps:

  • kubelet starts documenting what it needs.
  • frequencies are adjusted, e.g. print errors more often or at the start of the kubelet.

@pacoxu @carlory also, we need to audit what of these tools are no longer needed
https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.2/cmd/kubeadm/app/preflight/checks_linux.go#L86-L92

@danwinship do you have context on ip, ethtool, tc and if we can stop checking for these too?

@danwinship
Copy link
Author

+1 for removing it. but we need to consider how to handle the error message. kubelet may print 1 time per 1 minute.

The iptables.Monitor is only set up if the initial syncIPTablesRules succeeds. So if you don't have iptables installed, it will print an error once and then never think about iptables again. (And you can run kubelet with --make-iptables-util-chains=false to avoid even getting the error message once, though probably I should make the code distinguish between "iptables is not installed on the system" (log at Info, since the user probably expects iptables to not be used in that case) and "iptables is installed but doesn't work right" (log at Error because that's weird).

@danwinship
Copy link
Author

@danwinship do you have context on ip, ethtool, tc and if we can stop checking for these too?

I'm pretty sure no part of core kubernetes or any of the stock CNI plugins currently use any of those. (They may have in the past, but these days pretty much everybody has switched to netlink-based APIs rather than running commands.)

@carlory
Copy link
Member

carlory commented Dec 13, 2024

Can we stop checking for nsenter? kubelet's --containerized flag was deprecated in 1.14 and was removed in 1.16. And this PR will remove k8s.io/utils/nsenter from the codebase. kubernetes/kubernetes#122016

@pacoxu
Copy link
Member

pacoxu commented Dec 13, 2024

BTW, do we need to update the kinder dockerfiler

conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod aufs-tools \

@neolit123
Copy link
Member

@danwinship do you have context on ip, ethtool, tc and if we can stop checking for these too?

I'm pretty sure no part of core kubernetes or any of the stock CNI plugins currently use any of those. (They may have in the past, but these days pretty much everybody has switched to netlink-based APIs rather than running commands.)

thanks

Can we stop checking for nsenter? kubelet's --containerized flag was deprecated in 1.14 and was removed in 1.16. And this PR will remove k8s.io/utils/nsenter from the codebase. kubernetes/kubernetes#122016

sounds like we can have one (or two) PRs for kubeadm that cleans up most of the checked binaries.

@neolit123
Copy link
Member

neolit123 commented Dec 13, 2024

BTW, do we need to update the kinder dockerfiler

conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod aufs-tools \

that base image is not actively used. we use the upstream kind official kinder/base images.
but yes, you can PR that kinder base image Dockerfile if you'd like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kube-proxy area/preflight priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants