SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] #2403

jluhrsen · 2024-12-19T19:33:45Z

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation
if so, I have updated the documentation as required
My code requires tests
if so, I have added and/or updated the tests as required
All the tests have passed in the CI

How to verify it

Signed-off-by: Surya Seetharaman <[email protected]>

The gateway init is only ever invoked for primary UDNs if the feature gate is enabled. We don't need to be checking this on every if inside the gatewayInit. Signed-off-by: Surya Seetharaman <[email protected]>

This reverts commit 8b94231. Signed-off-by: Surya Seetharaman <[email protected]>

- LocalNet support removed, as it's not supported at the moment. - Introduce ClusterUserDefinedNetwork CRD and types This commit changes are result of the following command: crd-ref-docs --source-path ./go-controller/pkg/crd/userdefinednetwork --config=crd-docs-config.yaml --renderer=markdown --output-path=./docs/api-reference/userdefinednetwork-api-spec.md Signed-off-by: Or Mergi <[email protected]>

By using maxUnavailable, we can make the redeploy faster. Signed-off-by: Or Shoval <[email protected]>

Signed-off-by: Or Mergi <[email protected]>

Routes via mp0 were being deleted on every ovnkube-node restart: [root@ovn-worker ~]# ip monitor route Deleted 192.72.3.0/24 dev ovn-k8s-mp0 proto kernel scope link src 192.72.3.2 Deleted broadcast 192.72.3.255 dev ovn-k8s-mp0 table local proto kernel scope link src 192.72.3.2 Deleted local 192.72.3.2 dev ovn-k8s-mp0 table local proto kernel scope host src 192.72.3.2 local 192.72.3.2 dev ovn-k8s-mp0 table local proto kernel scope host src 192.72.3.2 broadcast 192.72.3.255 dev ovn-k8s-mp0 table local proto kernel scope link src 192.72.3.2 This causes traffic outage during upgrade, as well as other unwanted side effects when pod-destined traffic is routed via default gateway route in the host. This is especially disruptive in local gateway mode. This patch removes the teardown, and then makes the synchronization of addresses and routes more robust, so that we can safely handle changes to MTU or mp0 addresses. Signed-off-by: Tim Rozet <[email protected]>

Fixes unexpected mp0 route removal during start up

Since in L2 UDNs, multiple networks are set on the rtos-/stor- ports when lb_force_snat_ip is set to router-ip it picks the lexicographically sorted subnet IP to perform the SNAT which breaks services on L2. In this fix we will start using the joinIP to explicitly SNAT traffic to. This is what we used to do before except we had to use router-ip to deal with the following case: ext->ovn-workerIP:NodePort backed by ovn-worker2 host networked pod: ext->ovn-worker->br-ex->GR(DNAT to ovn-worker2)-> GR(SNAT to joinIP)->send it back to br-ex->underlay ->ovn-worker2 .. here traffic goes out with joinIP which will break this flow. But in UDNs we don't support host-networked pods yet, much less host-networked-pod backends to services. So when that support gets added we must ensure to add flows to br-ex similar to the per network masqueradeIP flows we have but for joinIP to do another SNAT to nodeIP before sending it out. Signed-off-by: Surya Seetharaman <[email protected]>

There is a bug in OVN where LB templates do not work when lb_force_snat is not specified as router_ip. Additionally, services on UDNs are only supported in single node IC right now. Temporarily set LB templates to false until OVN is fixed. OVN Bug: https://issues.redhat.com/browse/FDP-988 Signed-off-by: Tim Rozet <[email protected]>

In a scenario ClusterUserDefinedNetwork object exist with role=primary, a primary NAD may exist in the subject namespace should exist, similar to UserDefinedNetwork object case. When looking for active network in the subject namespace, check if primary CUDN exist and refers to the subject namespace. Signed-off-by: Or Mergi <[email protected]>

In case the ovn control plane pod is restarted, (for example, node restart, upgrade, or any pod restart reason), the ipam claim Sync function created an array of all the ipams belonging to a specific network, and used named allocator AllocateIPs method to allocate them, so the allocator will reflect the current cluster state. The problem is AllocateIPs allows to allocate only one IP per given subnet, so once it is used for all the IPs in the network at once, it will fail, causing the control plane to have a crash loop. Fix it by calling AllocateIPs per each claim on its own. The claims were already created correctly, so each claim on its own is safe to call AllocateIPs. Add unit test to show the relevant assertion. Signed-off-by: Or Shoval <[email protected]>

…rks-case UDN: L2: Fix router ip on multiple networks set on LRP case

Signed-off-by: Or Shoval <[email protected]>

Since hostname is populated now via DHCP, we can always use the vmi name in the console expecter. Signed-off-by: Or Shoval <[email protected]>

Signed-off-by: Dan Winship <[email protected]>

ipam: Fix init flow in case there are sticky ips in the system

UDN: Patch Kubevirt CR to support managedTap binding

port management-port-related iptables code to nftables

The `AllocateIPs` method name is misleading; it receives a slice of IPs as parameter, which leads the user to think that it can allocate the IPs in the slice in a single subnet - i.e. for subnet 10.0.0.0/24, it could allocate IPs 10.0.0.2 and 10.0.0.3. That is not the case: that function allocates a **single** IP address in each of the subnets the IP allocator for said network manages. For instance, on a dual-stack network, it could handle the following subnets: 10.0.0.0/24 and fd10::/64. The function could then the invoked with ips=10.0.0.2,fd10::2 to allocate **one** IP in each of the two subnets managed by the IP allocator. Signed-off-by: Miguel Duarte Barroso <[email protected]>

NAD validations: Fix typo in error message

CreateOrReplaceLogicalRouterStaticRouteWithPredicate already does duplicate detection on static routes based on the predicate it gets as input, but the code in CreateDefaultRouteToExternal calls it with a predicate that is too restrictive: it checks against next hop, IP prefix and policy (src/dst). In corner cases where the gateway IP has changed (https://issues.redhat.com/browse/OCPBUGS-32754, for unknown reason), CreateDefaultRouteToExternal will add its static route to the gateway IP; the existing route, pointing to the incorrect gateway router IP, will stay in nbdb. Simply removing the check against the next hop in the predicate is not enough, since we would wipe out the existing route for the node subnet via mp0. Also, we should take into account that a cluster subnet can be expanded, however rare that is, by choosing a lower mask value, as long as the network address stays the same and the host prefix length is also unchanged. The new predicate skips the routes it identifies as being node subnet routes for the local node and matches against any existing cluster subnet route which might also have been expanded. Adding unit tests to capture these scenarios. Signed-off-by: Riccardo Ravaioli <[email protected]>

Signed off by: fossabot <[email protected]> Signed-off-by: Surya Seetharaman <[email protected]>

Add unit tests to document the behavior for localnet networks (with or without subnets defined) on IC/non-IC deployments. Signed-off-by: Miguel Duarte Barroso <[email protected]>

Add unit tests to assert the syncAll operation tolerates the `ErrNetworkControllerTopologyNotManaged` error, which was crash-looping the cluster manager when processing localnet networks without subnets defined. Signed-off-by: Miguel Duarte Barroso <[email protected]>

…it-tests cluster manager, tests: ignore topology not managed errors

The previous condition was incorrect and was ignoring host-networked pods only when `addPort` was false. This means that the controller tried to configure the logical port for new host-networked pods. Fix the condition by exiting when the pod is host networked or `addPort` is false. Signed-off-by: Patryk Diak <[email protected]>

Signed-off-by: Dmitry Porokh <[email protected]>

Today ovnkube-controller relies on ovnkube-node to annotate the mac address so that it can properly program the OVN mgmt port. In practice with UDN, this means cluster manager creates the node and allocates some subnet info to the node, both the node and the ovnkube-controller side get updates, but ovnkube-controller fails to program the first time because it is waiting for the node side to configure the mac address. This patch changes the behavior of ovn-kubernetes to calculate the MAC address for the management port from the mgmt port IP address of the first subnet for the network. For backwards compatibility, it will first attempt to read the node annotation, and if it no mgmt MAC exists, it will derive it from the mgmt IP of the subnet. Signed-off-by: Tim Rozet <[email protected]>

With the change to no longer depend on mgmt port mac address annotation on the node, the egress firewall test would unexpectedly now correctly sync the management port during the test. This would cause addAllowACLFromNode to execute and add an ACL for network policy to the node switch. That in turn would cause this egress firewall test to fail because it would see there are more than 0 ACLs on the switch. This commit fixes it by properly scoping the lookup in the egress firewall test for ACLs only applicable to egress firewall. Signed-off-by: Tim Rozet <[email protected]>

Signed-off-by: Tim Rozet <[email protected]>

L3 UDN: EgressIP hosted by primary interface (`breth0`)

This change may add 5ms delay on each informer start (on controller start). It shouldn't be an issue because this delay is negligible in compare to 100ms cache status polling interval. Signed-off-by: Dmitry Porokh <[email protected]>

Instead of failing ignore EndpointSlices without a service label set. These custom, user created slices should be ignored. Signed-off-by: Patryk Diak <[email protected]>

Calculates mgmt port MAC rather than storing

Improves pod deletion with user defined networks

UDN Gateway: Ignore EndpointSlices without a service label

this is a d/s only fix. it's needed u/s because the feature needs a kernel fix in the u/s test runners. Signed-off-by: Jamo Luhrsen <[email protected]>

OCPBUGS-41156, SDN-4930, OCPBUGS-44794, OCPBUGS-43354, OCPBUGS-43519, OCPBUGS-32754: Downstream Merge [12-04-2024]

…19-12-19-2024

openshift-ci-robot · 2024-12-19T19:33:48Z

@jluhrsen: This pull request references SDN-5297 which is a valid jira issue.

In response to this:

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation

if so, I have updated the documentation as required

My code requires tests

if so, I have added and/or updated the tests as required

All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2024-12-19T19:34:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign jcaamano for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-12-20T00:02:56Z

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview	`eb51905`	link	false	`/test e2e-metal-ipi-ovn-ipv6-techpreview`
ci/prow/okd-scos-e2e-aws-ovn	`eb51905`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview	`eb51905`	link	false	`/test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview`
ci/prow/e2e-aws-ovn-upgrade-local-gateway	`eb51905`	link	true	`/test e2e-aws-ovn-upgrade-local-gateway`
ci/prow/security	`eb51905`	link	false	`/test security`
ci/prow/e2e-metal-ipi-ovn-techpreview	`eb51905`	link	false	`/test e2e-metal-ipi-ovn-techpreview`
ci/prow/e2e-vsphere-ovn	`eb51905`	link	false	`/test e2e-vsphere-ovn`
ci/prow/e2e-metal-ipi-ovn-dualstack	`eb51905`	link	true	`/test e2e-metal-ipi-ovn-dualstack`
ci/prow/e2e-aws-ovn-kubevirt	`eb51905`	link	false	`/test e2e-aws-ovn-kubevirt`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jluhrsen · 2024-12-20T00:22:50Z

/payload 4.18 ci blocking
/payload 4.18 nightly blocking
/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade

openshift-ci · 2024-12-20T00:22:58Z

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.18

periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8c5399a0-be68-11ef-80a2-9e6442786538-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.18

periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.18-fips-payload-scan
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6
periodic-ci-openshift-microshift-release-4.18-periodics-e2e-aws-ovn-ocp-conformance
periodic-ci-openshift-microshift-release-4.18-periodics-e2e-aws-ovn-ocp-conformance-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8c5399a0-be68-11ef-80a2-9e6442786538-1

pperiyasamy · 2024-12-20T16:17:17Z

/retitle SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024]

openshift-ci-robot · 2024-12-20T16:17:44Z

@jluhrsen: This pull request references SDN-5297 which is a valid jira issue.

This pull request references Jira Issue OCPBUGS-46527, which is valid.

7 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.18.0) matches configured target version for branch (4.18.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
release note type set to "Release Note Not Required"
dependent bug Jira Issue OCPBUGS-43519 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
dependent Jira Issue OCPBUGS-43519 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0
bug has dependents

Requesting review from QA contact:
/cc @asood-rh

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation

if so, I have updated the documentation as required

My code requires tests

if so, I have added and/or updated the tests as required

All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

tssurya and others added 30 commits October 29, 2024 12:40

nit fixes in docs

2c2f868

Signed-off-by: Surya Seetharaman <[email protected]>

nit: Remove unnecessary checks

a6247a2

The gateway init is only ever invoked for primary UDNs if the feature gate is enabled. We don't need to be checking this on every if inside the gatewayInit. Signed-off-by: Surya Seetharaman <[email protected]>

Revert "HACK: Update podSubnet for lexocographical correctness"

f4941d4

This reverts commit 8b94231. Signed-off-by: Surya Seetharaman <[email protected]>

kind,ic: Improve redeploy speed

9cfeb1f

By using maxUnavailable, we can make the redeploy faster. Signed-off-by: Or Shoval <[email protected]>

NAD validations: Fix typo in error message

bfb1672

Signed-off-by: Or Mergi <[email protected]>

Merge pull request #4825 from trozet/fix_mp0_flush_on_restart

9695576

Fixes unexpected mp0 route removal during start up

Merge pull request #4819 from tssurya/fix-router-ip-on-mulitple-netwo…

75f86a3

…rks-case UDN: L2: Fix router ip on multiple networks set on LRP case

UDN: Patch Kubevirt CR to support managedTap binding

debbd1f

Signed-off-by: Or Shoval <[email protected]>

UDN: Adapt tests to use managedTap instead passt

5c674d9

Since hostname is populated now via DHCP, we can always use the vmi name in the console expecter. Signed-off-by: Or Shoval <[email protected]>

Make sure nftables binaries are available everywhere

01eaae1

Signed-off-by: Dan Winship <[email protected]>

Import knftables package, add pkg/node/nftables/

0258a3e

Signed-off-by: Dan Winship <[email protected]>

Port management-port-related iptables rules to nftables.

4f5b176

Signed-off-by: Dan Winship <[email protected]>

Merge pull request #4823 from oshoval/fix_sync

ad62a1f

ipam: Fix init flow in case there are sticky ips in the system

Merge pull request #4773 from oshoval/mantap

7b50659

UDN: Patch Kubevirt CR to support managedTap binding

Merge pull request #3709 from danwinship/nftables

00dcad8

port management-port-related iptables code to nftables

Merge remote-tracking branch 'ovn-org/master' into merge-11-15-24

74c0816

Merge pull request #4828 from ormergi/nad-valid-error-typo-fix

bb06f37

NAD validations: Fix typo in error message

Add license scan report and status

e326408

Signed off by: fossabot <[email protected]> Signed-off-by: Surya Seetharaman <[email protected]>

cluster manager, tests: ignore topology not managed errors

1606de0

Add unit tests to document the behavior for localnet networks (with or without subnets defined) on IC/non-IC deployments. Signed-off-by: Miguel Duarte Barroso <[email protected]>

Merge pull request #4815 from maiqueb/add-cluster-manager-localnet-un…

f369f78

…it-tests cluster manager, tests: ignore topology not managed errors

poroh and others added 14 commits December 11, 2024 17:04

unit tests: unify NoRoot checking

7d86e41

Signed-off-by: Dmitry Porokh <[email protected]>

Fix ovnkube-trace to not rely on MAC annotation

4214d11

Signed-off-by: Tim Rozet <[email protected]>

Merge pull request #4530 from martinkennelly/eip-net-seg

63b6b57

L3 UDN: EgressIP hosted by primary interface (`breth0`)

UDN Gateway: Ignore EndpointSlices without a service label

0bc2c51

Instead of failing ignore EndpointSlices without a service label set. These custom, user created slices should be ignored. Signed-off-by: Patryk Diak <[email protected]>

Merge pull request #4882 from trozet/set_mac_by_cm

55df139

Calculates mgmt port MAC rather than storing

Merge pull request #4771 from trozet/fix_udn_delete

b40cf21

Improves pod deletion with user defined networks

Merge pull request #4902 from kyrtapz/ignore_unlabeled_services

bd7bdb5

UDN Gateway: Ignore EndpointSlices without a service label

Merge remote-tracking branch 'ovn-org/master' into merge-11-15-24

fb4e12b

hard code disable-udn-host-isolation to "true"

1243011

this is a d/s only fix. it's needed u/s because the feature needs a kernel fix in the u/s test runners. Signed-off-by: Jamo Luhrsen <[email protected]>

Merge pull request openshift#2357 from jluhrsen/merge-11-15-24

038aea1

OCPBUGS-41156, SDN-4930, OCPBUGS-44794, OCPBUGS-43354, OCPBUGS-43519, OCPBUGS-32754: Downstream Merge [12-04-2024]

Merge remote-tracking branch 'upstream/master' into 4.18-sync-from-4.…

eb51905

…19-12-19-2024

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 19, 2024

openshift-ci bot requested review from JacobTanenbaum and trozet December 19, 2024 19:34

JacobTanenbaum mentioned this pull request Dec 20, 2024

[release-4.18] OCPBUGS-37193: fixes overzealous deletion of SNAT in egressIP #2398

Closed

5 tasks

pperiyasamy mentioned this pull request Dec 20, 2024

[release-4.18] OCPBUGS-46527: Network policies in UDN do not work #2401

Closed

openshift-ci bot changed the title ~~SDN-5297: DownStream Merge Sync from 4.19 [12-19-2024]~~ SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] Dec 20, 2024

openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Dec 20, 2024

openshift-ci bot requested a review from asood-rh December 20, 2024 16:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] #2403

SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] #2403

jluhrsen commented Dec 19, 2024

openshift-ci-robot commented Dec 19, 2024 •

edited by openshift-ci bot

Loading

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

openshift-ci bot commented Dec 19, 2024

openshift-ci bot commented Dec 20, 2024

jluhrsen commented Dec 20, 2024

openshift-ci bot commented Dec 20, 2024

pperiyasamy commented Dec 20, 2024 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Dec 20, 2024 •

edited by openshift-ci bot

Loading

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] #2403

Are you sure you want to change the base?

SDN-5297,OCPBUGS-46527: DownStream Merge Sync from 4.19 [12-19-2024] #2403

Conversation

jluhrsen commented Dec 19, 2024

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

openshift-ci-robot commented Dec 19, 2024 • edited by openshift-ci bot Loading

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

openshift-ci bot commented Dec 19, 2024

openshift-ci bot commented Dec 20, 2024

jluhrsen commented Dec 20, 2024

openshift-ci bot commented Dec 20, 2024

pperiyasamy commented Dec 20, 2024 • edited by openshift-ci bot Loading

openshift-ci-robot commented Dec 20, 2024 • edited by openshift-ci bot Loading

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

openshift-ci-robot commented Dec 19, 2024 •

edited by openshift-ci bot

Loading

pperiyasamy commented Dec 20, 2024 •

edited by openshift-ci bot

Loading

openshift-ci-robot commented Dec 20, 2024 •

edited by openshift-ci bot

Loading