SDN-4930: Downstream Merge [12-19-2024] #2402
Hoping to bring improvement to the naming and interfacing of the nad controller package to reduce confusion. No functional changes but, obviously, broad code impact. On to the details...
Rename package network-attach-def-controller to networkmanager. The controller struct itself has been unexported and instead two interfaces are provided: Controller to start and stop, and NetworkManager to provide (for now) read-only information about networks. A default implementation is provided that assumes the default network is the only ever existing network, to be used when multi-network capabilities are disabled or for testing. Other unexported structs have also been renamed accordingly as well as other types that followed the former naming scheme.
Rename package network-controller-manager to controllermanager, understood as a package that manages other controllers that can be of any nature: network specific, network aware or network unaware. The interfaces that controller managers need to implement to interface with network manager have also been renamed accordingly as well as other types that followed the former naming scheme.
For reasons beyond my comprehension, it fell to me to come across, and fix, a number of unrelated issues, like flaking unit tests, missing nftables stubbing to be able to run the tests on the testing container, etc., that, given the broad impact of this commit, I decided to fix here instead of bothering with new commits.
While I was at it, fixed all go vet warnings on impacted files because it bothers me to have my IDE painted yellow. We should add that to our linter. Moving on...
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
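The split described in this commit can be pictured with a minimal Go sketch. Only the interface names Controller and NetworkManager come from the commit message; the method sets, the `defaultOnlyManager` type, the `NewDefault` constructor and the "default" network name are assumptions made for illustration and may not match the actual networkmanager package.

```go
package main

import "fmt"

// Controller is a hypothetical sketch of the start/stop interface
// mentioned in the commit message; the real method set may differ.
type Controller interface {
	Start() error
	Stop()
}

// NetworkManager is a hypothetical read-only view of the known networks.
type NetworkManager interface {
	// NetworkNames returns the names of the networks currently known.
	NetworkNames() []string
}

// defaultNetworkName is an assumed name for the default network.
const defaultNetworkName = "default"

// defaultOnlyManager assumes the default network is the only network
// that ever exists, as when multi-network support is disabled.
type defaultOnlyManager struct{}

func (defaultOnlyManager) Start() error { return nil }
func (defaultOnlyManager) Stop()        {}
func (defaultOnlyManager) NetworkNames() []string {
	return []string{defaultNetworkName}
}

// NewDefault hands out the unexported implementation through both
// interfaces, so callers never depend on the concrete struct.
func NewDefault() (Controller, NetworkManager) {
	m := defaultOnlyManager{}
	return m, m
}

func main() {
	c, nm := NewDefault()
	if err := c.Start(); err != nil {
		panic(err)
	}
	defer c.Stop()
	fmt.Println(nm.NetworkNames())
}
```

Keeping the struct unexported and returning it through two narrow interfaces means consumers that only read network information never gain the ability to start or stop the controller.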
Consolidate all getters in NetInfo becoming the single interface to obtain network information intended to be used broadly as read-only.
Consolidate all setters in new MutableNetInfo interface. A MutableNetInfo can only be built from a copy of NetInfo. Add setters for route advertisements related information. This is intended to be used by a network manager to aggregate consolidated information about the network from multiple sources that can change over time.
Add a ReconcilableNetInfo interface acting as a receiver by which changes done in a MutableNetInfo can be incorporated into an existing NetInfo. A ReconcilableNetInfo can only be built from a copy of NetInfo. It is intended to be used by top level network controllers to reconcile changes to network information when instructed by a network manager.
New free methods IsNetworkCompatible, DoesNetworkNeedReconciliation, and Reconcile facilitate the coordination of the reconciliation.
The overall idea is to have a tighter control of NetInfo changes and to make sure that controllers become aware of those changes in a way that allows the controllers to reconcile them.
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
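The getter/setter/receiver split can be sketched in Go as follows. The interface names NetInfo, MutableNetInfo and ReconcilableNetInfo are taken from the commit message; every method, the `netInfo` struct, the constructors and the `needsReconciliation` helper are hypothetical stand-ins, not the signatures used in the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// NetInfo is the read-only view: getters only (hypothetical method set).
type NetInfo interface {
	Name() string
	AdvertisedVRFs() []string
}

// MutableNetInfo adds setters on top of the read-only view.
type MutableNetInfo interface {
	NetInfo
	SetAdvertisedVRFs(vrfs []string)
}

// ReconcilableNetInfo receives changes made elsewhere.
type ReconcilableNetInfo interface {
	NetInfo
	Reconcile(from NetInfo)
}

type netInfo struct {
	name string
	vrfs []string
}

func (n *netInfo) Name() string { return n.name }

// AdvertisedVRFs returns a copy so callers cannot mutate internal state.
func (n *netInfo) AdvertisedVRFs() []string {
	return append([]string(nil), n.vrfs...)
}

func (n *netInfo) SetAdvertisedVRFs(vrfs []string) {
	n.vrfs = append([]string(nil), vrfs...)
	sort.Strings(n.vrfs) // keep a canonical order for comparisons
}

func (n *netInfo) Reconcile(from NetInfo) {
	n.SetAdvertisedVRFs(from.AdvertisedVRFs())
}

// copyOf builds an independent copy; both constructors go through it,
// so mutable and reconcilable views only exist as copies of a NetInfo.
func copyOf(src NetInfo) *netInfo {
	n := &netInfo{name: src.Name()}
	n.SetAdvertisedVRFs(src.AdvertisedVRFs())
	return n
}

func NewMutableNetInfo(src NetInfo) MutableNetInfo           { return copyOf(src) }
func NewReconcilableNetInfo(src NetInfo) ReconcilableNetInfo { return copyOf(src) }

// needsReconciliation reports whether the advertised VRFs differ.
func needsReconciliation(current, desired NetInfo) bool {
	a, b := current.AdvertisedVRFs(), desired.AdvertisedVRFs()
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	base := &netInfo{name: "default"}
	running := NewReconcilableNetInfo(base) // held by a network controller
	desired := NewMutableNetInfo(base)      // built by the network manager
	desired.SetAdvertisedVRFs([]string{"red", "blue"})
	if needsReconciliation(running, desired) {
		running.Reconcile(desired)
	}
	fmt.Println(running.Name(), running.AdvertisedVRFs())
}
```

In this sketch the manager never touches the controller's copy directly: it edits its own mutable copy and the controller pulls the change in through Reconcile, which is the tighter control over changes the commit message aims for.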
Add the ability for network controllers to reconcile some network information changes. Currently just changes of the VRFs the network is leaking/advertising to. Support for reconciling NAD changes is not included in this commit.
Currently reconciles:
- for zone network controllers to configure or not the pod IP to node IP SNAT on the GR for nodes local to the zone
- for node network controllers to configure or not br-ex flows to redirect pod IP ingress traffic to the OVN network (except ingress to the management IP address). Only done for the default network, UDNs will be handled in a later commit.
This should be enough to provide direct ingress capabilities for the default network in SGW mode.
Note that non-primary network controllers don't reconcile anything as route advertising is not supported on them. Also cluster manager network controllers don't reconcile much as they don't have the need.
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Network manager will, upon ensuring a network, fetch relevant information from RouteAdvertisements (such as VRFs being advertised on and selected nodes to advertise EIPs) applying to the network and set it on the corresponding NetInfo used to create the network controller, or trigger a reconciliation if it was already running.
This relies on an annotation that will be set by cluster manager on NADs pointing to the RouteAdvertisements that apply to the network (future commit).
This will also be done for the default network. This commit assumes cluster manager will create a dummy NAD for the default network in the ovn-k namespace where the annotation will be set. The NAD controller will process the NAD and ensure the network on the network controller, just like with any other network. The network controller gains access to the default network controller; however, it treats that controller differently: it's not started or stopped, obviously, just reconciled.
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Fix kind export logs to address the following warning message:
$ kind export logs --name ${KIND_CLUSTER_NAME} --loglevel=debug /tmp/kind/logs
WARNING: --loglevel is deprecated, please switch to -v and -q!
Adjust the command so it is equivalent and works on the latest kind release.
Signed-off-by: Flavio Fernandes <[email protected]>
This commit integrates ocp-traffic-flow-tests into the existing GitHub E2E workflows. The control tests can now be executed as a standalone lane.
Currently, the traffic flow tests use a specific commit from the repository https://github.com/wizhaoredhat/ocp-traffic-flow-tests.git. This is a temporary solution until the repository becomes part of the ovn-kubernetes GitHub organization.
Next steps to look into, after this commit / PR:
- External and secondary network tests
- Add bandwidth thresholds that would be reasonable for GitHub
- Move repo to ovn-kubernetes org
Fixes: ovn-kubernetes/ovn-kubernetes#4756
Signed-off-by: Flavio Fernandes <[email protected]>
Sometimes the echoserver in the VM can take more than 20s to start up, so increase the timeout to 60 seconds to avoid flakiness. Signed-off-by: Lei Huang <[email protected]>
Direct ingress for default network in SGW mode with route advertisements
'ubuntu-latest' now references ubuntu 24.04. Previously, it referenced 22.04. Signed-off-by: Martin Kennelly <[email protected]>
Statically set GH runner image to ubuntu 22.04 for tests
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
This reverts commit bae7288. Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Looks like this path was missed and causes egress IP with UDNs on local gateway mode to not function correctly. Reply traffic was being sent to default cluster network patch port rather than the correct UDN patch port. Signed-off-by: Tim Rozet <[email protected]>
Prior to this change, in dualstack clusters we were not attempting to match the mgmt port IP family with the EgressIP family. Single stack clusters performed as expected. Signed-off-by: Martin Kennelly <[email protected]>
Follow up work is required to run IPv6 CDN tests in dualstack env because currently only IPv4 tests are executed for dual stack clusters. Upstream CI executes IPv6 tests for single stack IPv6 clusters only. Also, skip test 'replies to egress IP packets that require fragmentation' because it is broken. Tracked by downstream bug OCPBUGS-46476. Signed-off-by: Martin Kennelly <[email protected]>
Fix for nonexistent package msbuild in latest ubuntu images
Add EgressIP E2Es to Net Seg lanes
The latest metallb repo supports dualstack in dev-env; let's pin to it. Signed-off-by: Enrique Llorente <[email protected]>
Signed-off-by: Enrique Llorente <[email protected]>
Signed-off-by: Enrique Llorente <[email protected]>
Signed-off-by: Enrique Llorente <[email protected]>
Add testing for:
- External clients to LB, NodePort
- Podify clients to LB
Signed-off-by: Enrique Llorente <[email protected]>
…-e2e udn, e2e: Add external client to nodeport and loadbalancer services tests
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
ovs-doca requires mtu_request to be set on ports metadata for fragmented packets Signed-off-by: Alin Gabriel Serdean <[email protected]>
PR #4907 used ginkgo v1 to utilize DescribeTable but we already moved away from that, as seen in PR #4749. Also, executed 'go mod tidy && go mod vendor' as the go mod file was stale. Follow up work is required to detect this scenario in CI. Signed-off-by: Martin Kennelly <[email protected]>
Since it will be part of overall network reconciliation when its advertising configuration changes. Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Used on secondary nic flows, not needed if pod network is advertised as traffic is supposed to reach the node hosting the pod directly. However, we add an SNAT rule for local IPT traffic flows. Otherwise the node masquerade address is used as it is set as the source address in the services route. Note that in this case the Linux source selection algorithm just looks at the default routing table and does not look at non-default tables, at least when the ip rule is based on packet marks set through conntrack. Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
/test e2e-metal-ipi-ovn-ipv6-techpreview
/hold
no clue where to start, but maybe understanding what's going on with the
/retest
/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview 3
@jcaamano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/718ec360-c43f-11ef-9c45-1b7a290aebcf-0
/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial 3
@jcaamano: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/89858f30-c43f-11ef-80a1-4257b0c5355a-0
/payload 4.19 ci blocking
@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7add6520-c949-11ef-8d80-ae7a55f29e16-0
trigger 13 job(s) of type blocking for the nightly release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7add6520-c949-11ef-8d80-ae7a55f29e16-1
update: there are 3 optional jobs failing:
payload blocking jobs
/payload 4.19 ci blocking
/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node 10
I think overall if the ci blocking payload jobs come back clean, as well as the e2e-aws-upgrade-ovn-single-node aggregate, we may be good to go with merging this.
@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bee03e30-ca31-11ef-8b95-fdd856e0f060-0
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bee03e30-ca31-11ef-8b95-fdd856e0f060-1
/test e2e-metal-ipi-ovn-techpreview
@jluhrsen: The following tests failed, say
all the jobs in the payload 4.19 ci blocking for gcp-ovn-upgrade failed with some docker build issue getting the kmod package. let's try again, hoping that was some random package/infra issue:
/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade 10
@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4ec2b940-cd40-11ef-8406-ffdccb236d01-0
I think we should ignore the e2e-aws-upgrade-ovn-single-node aggregate job. the latest batch of 10 had 3 pass, 7 fail, but even the periodic job tracked in sippy is passing under 50% of the time. I checked all 7 failures and none are UDN and none are consistent (different failures in each job).
the e2e-metal-ipi-ovn-ipv6-techpreview job is failing because the tests are not v6 ready and I think we are ok with that for now, knowing the fixes for those tests are coming. otherwise, if the aggregate e2e-gcp-ovn-upgrade comes back clean, I think we are good to get this in.
@trozet @tssurya @jcaamano, I think we are good here. the aggregate results were 9 pass, 1 fail. can we get this in and I'll start on the next one.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jcaamano, jluhrsen
Merged commit d8e16ec into openshift:master
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-base
[ART PR BUILD NOTIFIER] Distgit: ovn-kubernetes-microshift
[ART PR BUILD NOTIFIER] Distgit: ose-ovn-kubernetes