-
Notifications
You must be signed in to change notification settings - Fork 21
iface: add bonding support with active-backup mode #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds Ethernet bonding support end-to-end. New API types, enums, macros and iface payloads for bonds (mode, members, MAC). Control-plane implements bond lifecycle, member attach/detach, MAC/MTU/VLAN propagation and ACTIVE_BACKUP active-member logic. CLI adds commands to create/configure bonds. Datapath: bond_output node and port_rx changes to respect bond membership. Port CLI shows bond associations. DPDK TAP driver replaces ioctl with netlink, adds namespace-change handling and carrier configuration. Build files updated and a smoke test for active-backup failover added. Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (18)
🚧 Files skipped from review as they are similar to previous changes (3)
🧰 Additional context used📓 Path-based instructions (2)**/*.{c,h}⚙️ CodeRabbit configuration file
Files:
**/*.sh⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (5)📚 Learning: 2025-11-03T13:28:19.470ZApplied to files:
📚 Learning: 2025-09-05T07:06:51.554ZApplied to files:
📚 Learning: 2025-09-05T07:06:51.554ZApplied to files:
📚 Learning: 2025-11-05T13:55:26.169ZApplied to files:
📚 Learning: 2025-09-09T09:22:31.596ZApplied to files:
🧬 Code graph analysis (6)modules/infra/datapath/port_rx.c (2)
modules/infra/cli/bond.c (7)
modules/infra/control/bond.c (5)
modules/infra/control/port.c (1)
modules/infra/datapath/bond_output.c (5)
smoke/bond_active_backup_test.sh (2)
🪛 Shellcheck (0.11.0)smoke/bond_active_backup_test.sh[warning] 18-18: Quote this to prevent word splitting. (SC2046) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
As far as I can see it does not use DPDK net/bonding. If so, it is interesting to know why. |
|
The net/bonding driver had several severe bugs I fixed years ago. It would be a different story if you volunteer for maintaining it ;-). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (8)
smoke/bond_forward_test.sh (2)
5-5: Quote dirname to avoid word splitting.Use a quoted dirname when sourcing _init.sh.
-. $(dirname $0)/_init.sh +. "$(dirname "$0")/_init.sh"
7-12: Verify run_id initialization (ShellCheck SC2154).run_id is used before assignment in this script; ensure _init.sh exports it or set a default here.
If not set by _init.sh, add:
: "${run_id:?run_id not set}"orrun_id=${run_id:-gr}after sourcing.Also consider
set -euo pipefailat the top to fail fast on errors.modules/infra/cli/iface.c (1)
456-472: Add “active” to detailed show() output for consistency.The list view prints “active” but show() does not. Mirror the flag.
printf("flags: "); if (iface->flags & GR_IFACE_F_UP) printf("up"); else printf("down"); if (iface->state & GR_IFACE_S_RUNNING) printf(" running"); + if (iface->state & GR_IFACE_S_ACTIVE) + printf(" active"); if (iface->flags & GR_IFACE_F_PROMISC) printf(" promisc");modules/infra/control/gr_port.h (1)
16-21: C23 fixed underlying enum may hit toolchain compatibility.
typedef enum : uint8_t { ... }relies on C23 (gnu2x). Verify GCC/Clang in CI support this for C (not just C++).If portability is a concern, fallback to:
- Keep plain enum for constants.
- Use
typedef uint8_t mac_filter_flags_t;for the bitmask field.Example:
enum mac_filter_flag { MAC_FILTER_F_UNSUPP = GR_BIT8(0), MAC_FILTER_F_NOSPC = GR_BIT8(1), MAC_FILTER_F_ALL = GR_BIT8(2), }; typedef uint8_t mac_filter_flags_t;This preserves layout while avoiding C23 enum base types.
modules/infra/control/gr_bond.h (1)
12-22: LGTM, with a tiny doc nitSolid private payload. Consider clarifying that primary_member/active_member are indices into members[] (not iface IDs) to avoid ambiguity.
modules/infra/api/gr_infra.h (2)
126-132: Const-correctness for gr_bond_mode_nameReturn type should be const char * (aligns with gr_iface_type_name/mode_name and string literals).
Apply this diff:
-static inline char *gr_bond_mode_name(gr_bond_mode_t mode) { +static inline const char *gr_bond_mode_name(gr_bond_mode_t mode) {
145-149: Clarify PRIMARY semanticsComment reads “Primary port ID”. It is an index into member_iface_ids[], not an iface ID. Please clarify to prevent misuse.
modules/infra/control/bond.c (1)
49-63: Consider propagating errors in bond_mac_delCurrently ignores failures from bond_all_member_mac(). For consistency with add and to avoid desync between bond and members, return on error (except maybe EOPNOTSUPP).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (13)
modules/infra/api/gr_infra.h(5 hunks)modules/infra/cli/bond.c(1 hunks)modules/infra/cli/iface.c(1 hunks)modules/infra/cli/meson.build(1 hunks)modules/infra/cli/port.c(3 hunks)modules/infra/control/bond.c(1 hunks)modules/infra/control/gr_bond.h(1 hunks)modules/infra/control/gr_port.h(1 hunks)modules/infra/control/meson.build(1 hunks)modules/infra/datapath/bond_output.c(1 hunks)modules/infra/datapath/meson.build(1 hunks)modules/infra/datapath/port_rx.c(5 hunks)smoke/bond_forward_test.sh(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/infra/cli/iface.cmodules/infra/control/gr_port.hmodules/infra/control/gr_bond.hmodules/infra/datapath/bond_output.cmodules/infra/cli/bond.cmodules/infra/api/gr_infra.hmodules/infra/cli/port.cmodules/infra/datapath/port_rx.cmodules/infra/control/bond.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/bond_forward_test.sh
🧠 Learnings (2)
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
PR: DPDK/grout#294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: In this project, cli_src is a global variable that gets populated by individual modules (like modules/policy/cli/meson.build) and is ultimately consumed by an executable() call in the top-level meson.build file. This creates a modular CLI build where each module contributes its CLI sources to the main CLI binary.
Applied to files:
modules/infra/cli/meson.build
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
PR: DPDK/grout#294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: The grout project uses a modular CLI build system where individual modules contribute their CLI sources to a global cli_src variable via `cli_src += files(...)`, and the top-level meson.build file builds the final grcli executable using all collected CLI sources. The grcli executable is defined in the top-level meson.build at lines 122-127.
Applied to files:
modules/infra/cli/meson.build
🧬 Code graph analysis (6)
modules/infra/datapath/bond_output.c (6)
modules/infra/control/gr_iface.h (2)
iface(16-22)iface_get_stats(100-102)modules/infra/cli/iface.c (1)
iface_from_id(107-115)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(520-547)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/eth_output.c (1)
eth_output_register_interface_type(27-34)
modules/infra/cli/bond.c (6)
api/gr_errno.h (1)
errno_set(9-12)modules/infra/api/gr_infra.h (1)
gr_bond_mode_name(126-132)modules/infra/cli/iface.c (5)
iface_from_id(107-115)parse_iface_args(117-180)iface_from_name(92-105)complete_iface_names(70-90)register_iface_type(22-24)cli/ecoli.c (3)
arg_str(99-109)arg_eth_addr(164-178)with_help(15-24)api/gr_api.h (1)
gr_api_client_send_recv(41-52)cli/exec.c (1)
cli_context_register(16-18)
smoke/bond_forward_test.sh (1)
smoke/_init.sh (1)
netns_add(93-99)
modules/infra/cli/port.c (1)
modules/infra/cli/iface.c (1)
iface_from_id(107-115)
modules/infra/datapath/port_rx.c (3)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/worker_test.c (1)
iface_from_id(40-42)modules/infra/datapath/trace.c (1)
trace_log_packet(354-504)
modules/infra/control/bond.c (5)
modules/infra/control/iface.c (9)
iface_add_eth_addr(310-323)iface_del_eth_addr(325-338)iface_set_mtu(340-357)iface_set_promisc(379-395)iface_set_allmulti(397-413)iface_set_up_down(359-377)iface_add_vlan(415-428)iface_del_vlan(430-443)iface_get_eth_addr(263-276)modules/infra/control/gr_iface.h (1)
iface(16-22)main/gr_vec.h (1)
gr_vec_len(47-51)api/gr_errno.h (1)
errno_set(9-12)main/event.c (1)
gr_event_subscribe(14-16)
🪛 Shellcheck (0.11.0)
smoke/bond_forward_test.sh
[warning] 5-5: Quote this to prevent word splitting.
(SC2046)
[warning] 7-7: run_id is referenced but not assigned.
(SC2154)
🔇 Additional comments (15)
modules/infra/datapath/meson.build (1)
5-5: LGTM: bond_output.c added to datapath build.No issues spotted.
modules/infra/cli/meson.build (1)
7-7: LGTM: bond.c added to CLI sources.Matches the modular cli_src aggregation pattern.
modules/infra/control/meson.build (1)
5-5: LGTM: control bond.c added.Build wiring looks correct.
modules/infra/cli/iface.c (1)
239-240: Good: show “active” in FLAGS for list output.Nice UX addition.
modules/infra/control/gr_port.h (1)
23-23: Typed flags field looks good.The struct layout remains 1-byte for flags; no ABI inflation expected.
modules/infra/cli/port.c (2)
23-30: Show bond linkage for ports: LGTM.Resolves bond name and frees temporary; fallbacks to numeric ID if missing.
53-72: List info includes bond: LGTM.Handles both peer and bond, with proper cleanup on all paths.
modules/infra/datapath/port_rx.c (4)
22-25: Drop edges for bond RX corner cases: LGTM.Clear separation for inactive member vs invalid bond.
55-69: Bond-aware RX selection is correct.
- Drops when member not UP or not ACTIVE.
- Forwards via bond iface when member is active.
- Falls back to port iface when not bonded.
Matches the PR’s RX rules.
77-88: Trace/log on resolved iface: LGTM.Packet tracing and logging use the bond iface when applicable.
90-95: Single enqueue with resolved edge: LGTM.Early return on rx==0 avoids unnecessary work; enqueue path is clean.
modules/infra/api/gr_infra.h (1)
367-369: Type-to-name mappingMapping for GR_IFACE_TYPE_BOND looks correct.
modules/infra/datapath/bond_output.c (1)
63-71: Stat update placement is fineCounting TX on the logical bond before forwarding to port_output is sensible.
modules/infra/cli/bond.c (1)
25-52: Show output looks goodDisplays mode, MAC, members, active/primary flags with per-port details. Nice.
modules/infra/control/bond.c (1)
386-391: Reactive failover hook looks correctOn port UP/DOWN, recompute active member. Good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
modules/infra/datapath/bond_output.c (1)
29-37: Bounds check uses ARRAY_DIM instead of n_members (potential OOB).Line 32 checks
bond->active_member < ARRAY_DIM(bond->members), butbond->membersis only populated up tobond->n_members. Ifactive_memberis valid but >=n_members(e.g., 5 when n_members is 3), the check passes but accesses an uninitialized slot.Apply this diff:
static const struct iface *bond_select_tx_member(const struct iface_info_bond *bond) { switch (bond->mode) { case GR_BOND_MODE_ACTIVE_BACKUP: - if (bond->active_member < ARRAY_DIM(bond->members)) + if (bond->active_member < bond->n_members) return bond->members[bond->active_member]; } return NULL; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (17)
modules/infra/api/gr_infra.h(5 hunks)modules/infra/cli/bond.c(1 hunks)modules/infra/cli/gr_cli_iface.h(0 hunks)modules/infra/cli/iface.c(7 hunks)modules/infra/cli/loopback.c(0 hunks)modules/infra/cli/meson.build(1 hunks)modules/infra/cli/port.c(3 hunks)modules/infra/cli/vlan.c(0 hunks)modules/infra/control/bond.c(1 hunks)modules/infra/control/gr_bond.h(1 hunks)modules/infra/control/gr_port.h(1 hunks)modules/infra/control/meson.build(1 hunks)modules/infra/datapath/bond_output.c(1 hunks)modules/infra/datapath/meson.build(1 hunks)modules/infra/datapath/port_rx.c(5 hunks)modules/ipip/cli.c(0 hunks)smoke/bond_forward_test.sh(1 hunks)
💤 Files with no reviewable changes (4)
- modules/infra/cli/vlan.c
- modules/ipip/cli.c
- modules/infra/cli/loopback.c
- modules/infra/cli/gr_cli_iface.h
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/infra/control/gr_port.h
- modules/infra/control/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/infra/cli/bond.cmodules/infra/cli/port.cmodules/infra/datapath/port_rx.cmodules/infra/api/gr_infra.hmodules/infra/cli/iface.cmodules/infra/control/bond.cmodules/infra/datapath/bond_output.cmodules/infra/control/gr_bond.h
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/bond_forward_test.sh
🧠 Learnings (2)
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
PR: DPDK/grout#294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: In this project, cli_src is a global variable that gets populated by individual modules (like modules/policy/cli/meson.build) and is ultimately consumed by an executable() call in the top-level meson.build file. This creates a modular CLI build where each module contributes its CLI sources to the main CLI binary.
Applied to files:
modules/infra/cli/meson.build
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
PR: DPDK/grout#294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: The grout project uses a modular CLI build system where individual modules contribute their CLI sources to a global cli_src variable via `cli_src += files(...)`, and the top-level meson.build file builds the final grcli executable using all collected CLI sources. The grcli executable is defined in the top-level meson.build at lines 122-127.
Applied to files:
modules/infra/cli/meson.build
🧬 Code graph analysis (7)
modules/infra/cli/bond.c (8)
api/gr_errno.h (1)
errno_set(9-12)modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/api/gr_infra.h (1)
gr_bond_mode_name(126-132)modules/infra/cli/iface.c (6)
iface_from_id(108-116)parse_iface_args(118-181)iface_from_name(93-106)complete_iface_names(71-91)constructor(577-580)register_iface_type(22-24)cli/ecoli.c (3)
arg_str(99-109)arg_eth_addr(164-178)with_help(15-24)api/gr_api.h (1)
gr_api_client_send_recv(41-52)modules/infra/cli/port.c (1)
constructor(238-241)cli/exec.c (1)
cli_context_register(16-18)
modules/infra/cli/port.c (1)
modules/infra/cli/iface.c (1)
iface_from_id(108-116)
modules/infra/datapath/port_rx.c (3)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/worker_test.c (1)
iface_from_id(40-42)modules/infra/datapath/trace.c (1)
trace_log_packet(354-504)
modules/infra/cli/iface.c (2)
modules/infra/api/gr_infra.h (1)
gr_iface_type_name(355-373)cli/table.c (1)
scols_line_sprintf(10-23)
modules/infra/control/bond.c (4)
modules/infra/control/iface.c (9)
iface_add_eth_addr(310-323)iface_del_eth_addr(325-338)iface_set_mtu(340-357)iface_set_promisc(379-395)iface_set_allmulti(397-413)iface_set_up_down(359-377)iface_add_vlan(415-428)iface_del_vlan(430-443)iface_get_eth_addr(263-276)main/gr_vec.h (1)
gr_vec_len(47-51)api/gr_errno.h (1)
errno_set(9-12)main/event.c (1)
gr_event_subscribe(14-16)
modules/infra/datapath/bond_output.c (5)
modules/infra/control/gr_iface.h (2)
iface(16-22)iface_get_stats(100-102)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(520-547)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/eth_output.c (1)
eth_output_register_interface_type(27-34)
smoke/bond_forward_test.sh (1)
smoke/_init.sh (1)
netns_add(93-99)
🪛 Shellcheck (0.11.0)
smoke/bond_forward_test.sh
[warning] 5-5: Quote this to prevent word splitting.
(SC2046)
[warning] 7-7: run_id is referenced but not assigned.
(SC2154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: deb
- GitHub Check: rpm
🔇 Additional comments (3)
modules/infra/datapath/port_rx.c (2)
52-54: Minor optimization: early return on zero RX.The early return when
rx == 0avoids unnecessary processing when no packets are received. This is a sensible optimization.
55-69: Bond-aware interface resolution logic is correct.The logic properly handles bonded ports:
- Checks if the member is active and up before forwarding
- Resolves the bond interface for packet processing
- Falls back to port interface for non-bonded ports
The edge cases (MEMBER_INACTIVE, INVALID_BOND) provide clear drop points for debugging.
modules/infra/cli/port.c (1)
63-71: Bond pointer lifecycle is handled correctly.The bond pointer follows the same pattern as the peer pointer: allocated via
iface_from_id, used for display, and freed in the error path. The cleanup at line 71 ensures no memory leaks.
| switch (bond->mode) { | ||
| case GR_BOND_MODE_ACTIVE_BACKUP: | ||
| assert(bond->primary_member < ARRAY_DIM(bond->member_iface_ids)); | ||
| member_iface_id = bond->member_iface_ids[bond->primary_member]; | ||
| if ((i = iface_from_id(c, member_iface_id)) == NULL) | ||
| SAFE_BUF(snprintf, len, " primary=%u", member_iface_id); | ||
| else | ||
| SAFE_BUF(snprintf, len, " primary=%s", i->name); | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert checks ARRAY_DIM instead of n_members (masks potential OOB).
Line 74 asserts bond->primary_member < ARRAY_DIM(bond->member_iface_ids), but the array is only populated up to bond->n_members. If members are removed without updating primary (e.g., primary_member = 5, n_members reduced to 3), the assert passes but line 75 reads uninitialized memory.
Apply this diff:
switch (bond->mode) {
case GR_BOND_MODE_ACTIVE_BACKUP:
- assert(bond->primary_member < ARRAY_DIM(bond->member_iface_ids));
+ assert(bond->primary_member < bond->n_members);
member_iface_id = bond->member_iface_ids[bond->primary_member];📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| switch (bond->mode) { | |
| case GR_BOND_MODE_ACTIVE_BACKUP: | |
| assert(bond->primary_member < ARRAY_DIM(bond->member_iface_ids)); | |
| member_iface_id = bond->member_iface_ids[bond->primary_member]; | |
| if ((i = iface_from_id(c, member_iface_id)) == NULL) | |
| SAFE_BUF(snprintf, len, " primary=%u", member_iface_id); | |
| else | |
| SAFE_BUF(snprintf, len, " primary=%s", i->name); | |
| break; | |
| } | |
| switch (bond->mode) { | |
| case GR_BOND_MODE_ACTIVE_BACKUP: | |
| assert(bond->primary_member < bond->n_members); | |
| member_iface_id = bond->member_iface_ids[bond->primary_member]; | |
| if ((i = iface_from_id(c, member_iface_id)) == NULL) | |
| SAFE_BUF(snprintf, len, " primary=%u", member_iface_id); | |
| else | |
| SAFE_BUF(snprintf, len, " primary=%s", i->name); | |
| break; | |
| } |
🤖 Prompt for AI Agents
In modules/infra/cli/bond.c around lines 72 to 81, the assert currently checks
bond->primary_member against ARRAY_DIM(bond->member_iface_ids) which can pass
even when the array entries beyond bond->n_members are uninitialized; change the
check to ensure bond->primary_member < bond->n_members (use n_members instead of
ARRAY_DIM) and handle the invalid case gracefully (do not index member_iface_ids
when primary_member is out of range — instead format a fallback string such as
printing the numeric primary_member or "primary=unknown" before attempting
iface_from_id).
7beb771 to
82618c4
Compare
82618c4 to
9b626cb
Compare
8e57128 to
46bdcc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/infra/datapath/port_rx.c (1)
74-85: Use bond iface flags for packet tracing/logging.After resolving
iface, we still gate tracing and log naming onctx->iface. For bonded members this means enabling trace on the bond never fires, and logs keep showing the member name. Flip these checks/logs to use the resolvedifaceso bond-level tracing works as intended.Apply this diff:
- if (unlikely(ctx->iface->flags & GR_IFACE_F_PACKET_TRACE)) { + if (unlikely(iface->flags & GR_IFACE_F_PACKET_TRACE)) { struct port_queue *q; for (r = 0; r < rx; r++) { q = gr_mbuf_trace_add(mbufs[r], node, sizeof(*q)); *q = ctx->rxq; } } if (unlikely(gr_config.log_packets)) { for (r = 0; r < rx; r++) - trace_log_packet(mbufs[r], "rx", ctx->iface->name); + trace_log_packet(mbufs[r], "rx", iface->name);
♻️ Duplicate comments (1)
modules/infra/cli/bond.c (1)
65-73: Guard against stale primary index in CLI summaryIf the previous primary was dropped during a member update,
bond->primary_membercan be ≥bond->n_members. This path then readsbond->members[bond->primary_member]even though that slot is no longer valid. The CLI ends up reporting random interface IDs/names. Please bail out (or fall back) before indexing whenn_members == 0or the primary is out of range.- assert(bond->primary_member < ARRAY_DIM(bond->members)); - member_iface_id = bond->members[bond->primary_member].iface_id; + if (bond->n_members == 0 || bond->primary_member >= bond->n_members) { + SAFE_BUF(snprintf, len, " primary=%u", bond->primary_member); + break; + } + member_iface_id = bond->members[bond->primary_member].iface_id;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (18)
modules/infra/api/gr_infra.h(4 hunks)modules/infra/cli/bond.c(1 hunks)modules/infra/cli/meson.build(1 hunks)modules/infra/cli/port.c(3 hunks)modules/infra/control/bond.c(1 hunks)modules/infra/control/gr_bond.h(1 hunks)modules/infra/control/gr_port.h(1 hunks)modules/infra/control/meson.build(1 hunks)modules/infra/control/port.c(3 hunks)modules/infra/datapath/bond_output.c(1 hunks)modules/infra/datapath/meson.build(1 hunks)modules/infra/datapath/port_rx.c(3 hunks)smoke/bond_active_backup_test.sh(1 hunks)subprojects/dpdk.wrap(1 hunks)subprojects/packagefiles/dpdk/0001-net-tap-add-netlink-helpers.patch(1 hunks)subprojects/packagefiles/dpdk/0002-net-tap-replace-ioctl-with-netlink.patch(1 hunks)subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch(1 hunks)subprojects/packagefiles/dpdk/0004-net-tap-configure-link-carrier.patch(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/infra/control/meson.build
- modules/infra/datapath/bond_output.c
- modules/infra/cli/port.c
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/infra/control/gr_bond.hmodules/infra/cli/bond.cmodules/infra/datapath/port_rx.cmodules/infra/control/gr_port.hmodules/infra/control/bond.cmodules/infra/control/port.cmodules/infra/api/gr_infra.h
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/bond_active_backup_test.sh
🧠 Learnings (4)
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: In this project, cli_src is a global variable that gets populated by individual modules (like modules/policy/cli/meson.build) and is ultimately consumed by an executable() call in the top-level meson.build file. This creates a modular CLI build where each module contributes its CLI sources to the main CLI binary.
Applied to files:
modules/infra/cli/meson.buildmodules/infra/datapath/meson.build
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: The grout project uses a modular CLI build system where individual modules contribute their CLI sources to a global cli_src variable via `cli_src += files(...)`, and the top-level meson.build file builds the final grcli executable using all collected CLI sources. The grcli executable is defined in the top-level meson.build at lines 122-127.
Applied to files:
modules/infra/cli/meson.build
📚 Learning: 2025-11-03T13:28:19.470Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.470Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).
Applied to files:
modules/infra/datapath/port_rx.c
📚 Learning: 2025-11-05T13:55:26.169Z
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 372
File: smoke/cross_vrf_forward_test.sh:18-18
Timestamp: 2025-11-05T13:55:26.169Z
Learning: In the DPDK/grout codebase, VRF interfaces (named gr-vrf<id>) are automatically created when an interface is added to a non-existing VRF using port_add. The VRF creation is handled automatically by the event system in vrf_netlink.c, so no explicit VRF interface creation commands are needed in test scripts.
Applied to files:
modules/infra/control/bond.cmodules/infra/api/gr_infra.h
🧬 Code graph analysis (5)
smoke/bond_active_backup_test.sh (1)
smoke/_init.sh (2)
fail(102-105)netns_add(111-119)
modules/infra/cli/bond.c (6)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/api/gr_infra.h (1)
gr_bond_mode_name(125-131)api/gr_errno.h (1)
errno_set(9-12)modules/infra/cli/iface.c (4)
parse_iface_args(141-204)iface_from_name(93-106)complete_iface_names(71-91)register_iface_type(22-24)cli/ecoli.c (2)
arg_str(99-109)arg_eth_addr(164-178)api/gr_api.h (1)
gr_api_client_send_recv(41-52)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/worker_test.c (1)
iface_from_id(45-47)
modules/infra/control/bond.c (5)
modules/infra/control/iface.c (8)
iface_add_eth_addr(310-323)iface_del_eth_addr(325-338)iface_set_mtu(340-357)iface_set_promisc(379-395)iface_set_allmulti(397-413)iface_add_vlan(415-428)iface_del_vlan(430-443)iface_get_eth_addr(263-276)api/gr_errno.h (1)
errno_set(9-12)modules/infra/control/gr_iface.h (1)
iface(16-22)main/gr_vec.h (1)
gr_vec_len(47-51)main/event.c (1)
gr_event_subscribe(14-16)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
errno_set(9-12)
🪛 Shellcheck (0.11.0)
smoke/bond_active_backup_test.sh
[warning] 18-18: Quote this to prevent word splitting.
(SC2046)
| static int bond_mac_del(struct iface *iface, const struct rte_ether_addr *mac) { | ||
| struct iface_info_bond *bond = iface_info_bond(iface); | ||
|
|
||
| // Remove MAC address from all member ports | ||
| bond_all_member_del_mac(bond, mac); | ||
|
|
||
| for (unsigned i = 0; i < gr_vec_len(bond->extra_macs); i++) { | ||
| if (rte_is_same_ether_addr(&bond->extra_macs[i], mac)) { | ||
| gr_vec_del(bond->extra_macs, i); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propagate MAC removal failures
bond_all_member_del_mac() already reports errors when a member refuses the MAC, but bond_mac_del() drops that status and returns success. That leaves half-configured members with the address still programmed. Capture the return value and propagate it.
- // Remove MAC address from all member ports
- bond_all_member_del_mac(bond, mac);
+ int ret = bond_all_member_del_mac(bond, mac);
+ if (ret < 0)
+ return ret;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static int bond_mac_del(struct iface *iface, const struct rte_ether_addr *mac) { | |
| struct iface_info_bond *bond = iface_info_bond(iface); | |
| // Remove MAC address from all member ports | |
| bond_all_member_del_mac(bond, mac); | |
| for (unsigned i = 0; i < gr_vec_len(bond->extra_macs); i++) { | |
| if (rte_is_same_ether_addr(&bond->extra_macs[i], mac)) { | |
| gr_vec_del(bond->extra_macs, i); | |
| break; | |
| } | |
| } | |
| return 0; | |
| static int bond_mac_del(struct iface *iface, const struct rte_ether_addr *mac) { | |
| struct iface_info_bond *bond = iface_info_bond(iface); | |
| // Remove MAC address from all member ports | |
| int ret = bond_all_member_del_mac(bond, mac); | |
| if (ret < 0) | |
| return ret; | |
| for (unsigned i = 0; i < gr_vec_len(bond->extra_macs); i++) { | |
| if (rte_is_same_ether_addr(&bond->extra_macs[i], mac)) { | |
| gr_vec_del(bond->extra_macs, i); | |
| break; | |
| } | |
| } | |
| return 0; | |
| } |
🤖 Prompt for AI Agents
In modules/infra/control/bond.c around lines 53 to 66, bond_mac_del() calls
bond_all_member_del_mac(bond, mac) but ignores its return value so failures are
swallowed; capture the return value, check for an error and return that error
code immediately (before modifying extra_macs), otherwise continue to remove the
MAC from bond->extra_macs and return success; ensure the function returns the
propagated non-zero error if bond_all_member_del_mac fails.
| if (set_attrs & (GR_BOND_SET_MAC | GR_BOND_SET_MEMBERS | GR_BOND_SET_PRIMARY)) { | ||
| struct rte_ether_addr mac; | ||
| if (rte_is_zero_ether_addr(&api->mac)) { | ||
| const struct iface *primary = bond->members[bond->primary_member].iface; | ||
| if (iface_get_eth_addr(primary->id, &mac) < 0) | ||
| return errno_set(errno); | ||
| } else { | ||
| mac = api->mac; | ||
| } | ||
| if (bond_mac_set(iface, &mac) < 0) | ||
| return errno_set(errno); | ||
| bond->mac = mac; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix out-of-range primary when members change
GR_BOND_SET_MEMBERS reorders/shrinks bond->members, but the existing bond->primary_member is left untouched. When the unit that used to be primary disappears, this dereferences bond->members[bond->primary_member] out of bounds (and crashes if the new list is empty). Clamp or reset the primary before dereferencing, and skip the implicit MAC lookup when there is no valid member.
- if (rte_is_zero_ether_addr(&api->mac)) {
- const struct iface *primary = bond->members[bond->primary_member].iface;
- if (iface_get_eth_addr(primary->id, &mac) < 0)
- return errno_set(errno);
- } else {
+ if (bond->n_members == 0)
+ return errno_set(ENOLINK);
+ if (bond->primary_member >= bond->n_members)
+ bond->primary_member = 0;
+ if (rte_is_zero_ether_addr(&api->mac)) {
+ const struct iface *primary = bond->members[bond->primary_member].iface;
+ if (iface_get_eth_addr(primary->id, &mac) < 0)
+ return errno_set(errno);
+ } else {| if bridge -n n0 fdb show br br0 brport "$iface" state reachable | grep -F "$mac"; then | ||
| return 0 | ||
| fi | ||
| sleep 0.2 | ||
| attempts=$((attempts + 1)) | ||
| done | ||
| fail "bond member $iface not active" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix bridge fdb filter syntax.
bridge fdb show doesn’t accept brport; it exits with “invalid parameter brport”, so wait_member_active always times out. Use the supported dev filter to match the port.
Patch:
- if bridge -n n0 fdb show br br0 brport "$iface" state reachable | grep -F "$mac"; then
+ if bridge -n n0 fdb show br br0 dev "$iface" state reachable | grep -F "$mac"; then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if bridge -n n0 fdb show br br0 brport "$iface" state reachable | grep -F "$mac"; then | |
| return 0 | |
| fi | |
| sleep 0.2 | |
| attempts=$((attempts + 1)) | |
| done | |
| fail "bond member $iface not active" | |
| } | |
| if bridge -n n0 fdb show br br0 dev "$iface" state reachable | grep -F "$mac"; then | |
| return 0 | |
| fi | |
| sleep 0.2 | |
| attempts=$((attempts + 1)) | |
| done | |
| fail "bond member $iface not active" | |
| } |
🤖 Prompt for AI Agents
In smoke/bond_active_backup_test.sh around lines 9 to 16, the bridge fdb command
uses the unsupported filter "brport" which causes "bridge fdb show" to fail and
the wait_member_active loop to always time out; replace the "brport" filter with
the supported "dev" filter so the command becomes a valid bridge fdb show
invocation (e.g. use ... fdb show br br0 dev "$iface" state reachable | grep -F
"$mac") and keep the rest of the loop logic unchanged.
| - int ret = ioctl(internals->ioctl_sock, SIOCSIFFLAGS, | ||
| - &internals->remote_initial_flags); | ||
| + int ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, | ||
| + internals->remote_initial_flags, 1); | ||
| if (ret) | ||
| TAP_LOG(ERR, "restore remote state failed: %d", ret); | ||
| - | ||
| } | ||
|
|
||
| rte_mempool_free(internals->gso_ctx_mp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore remote flags exactly on close
tap_nl_set_flags(..., remote_initial_flags, 1) only sets the bits present in the initial mask; it never clears the bits we added (PROMISC, ALLMULTI, etc.). The old ioctl path wrote the full flag word, so the remote interface always returned to its original state. Please fetch the current flags, clear the bits that should be off, then set the ones that should stay on before closing.
- int ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index,
- internals->remote_initial_flags, 1);
+ unsigned int current, to_clear, to_set;
+ int ret = tap_nl_get_flags(internals->nlsk_fd, internals->remote_if_index, ¤t);
+ if (ret == 0) {
+ to_clear = current & ~internals->remote_initial_flags;
+ if (to_clear)
+ ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, to_clear, 0);
+ if (ret == 0) {
+ to_set = internals->remote_initial_flags & ~current;
+ if (to_set)
+ ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, to_set, 1);
+ }
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In subprojects/packagefiles/dpdk/0002-net-tap-replace-ioctl-with-netlink.patch
around lines 175-184, instead of only calling tap_nl_set_flags(...,
remote_initial_flags, 1) you must first read the current flags, compute which
bits need clearing, clear those bits, then set the bits that must remain on:
call tap_nl_get_flags(internals->nlsk_fd, internals->remote_if_index,
&curr_flags), compute bits_to_clear = curr_flags &
~internals->remote_initial_flags, if bits_to_clear != 0 call
tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, bits_to_clear,
0) to clear them, then call tap_nl_set_flags(internals->nlsk_fd,
internals->remote_if_index, internals->remote_initial_flags, 1) to restore the
remaining flags and handle errors for both calls.
| + netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS); | ||
| + if (netns_fd < 0) { | ||
| + TAP_LOG(INFO, "%s: interface deleted", pmd->name); | ||
| + return 0; | ||
| + } | ||
| + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix TUNGETDEVNETNS ioctl usage
We must pass a pointer when calling this _IOR ioctl; otherwise the kernel writes to an undefined address, we get EFAULT, and all namespace moves are misdetected as deletions.
Apply this diff:
- netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS);
- if (netns_fd < 0) {
+ if (ioctl(pmd->ka_fd, TUNGETDEVNETNS, &netns_fd) < 0) {
TAP_LOG(INFO, "%s: interface deleted", pmd->name);
return 0;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| + netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS); | |
| + if (netns_fd < 0) { | |
| + TAP_LOG(INFO, "%s: interface deleted", pmd->name); | |
| + return 0; | |
| + } | |
| + | |
| if (ioctl(pmd->ka_fd, TUNGETDEVNETNS, &netns_fd) < 0) { | |
| TAP_LOG(INFO, "%s: interface deleted", pmd->name); | |
| return 0; | |
| } | |
🤖 Prompt for AI Agents
In subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch
around lines 53-58, the ioctl call uses TUNGETDEVNETNS incorrectly by not
passing a pointer for the _IOR operation; change it to declare a local int (e.g.
netns_fd), call ioctl(pmd->ka_fd, TUNGETDEVNETNS, &netns_fd) and check the ioctl
return value (< 0) to log deletion and return 0 — do not assign the ioctl return
directly to the netns variable since the kernel writes to the provided pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is incorrect. This ioctl returns a file descriptor or a negative value on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Fetch the kernel source for TUNGETDEVNETNS to verify its behavior
curl -sL 'https://raw.githubusercontent.com/torvalds/linux/c2c2ccfd4ba72718266a56f3ecc34c989cb5b7a0/drivers/net/tun.c' | sed -n '3310,3330p'
Length of output: 358
@rjarry, you're absolutely right—I apologize for the incorrect suggestion. The kernel source confirms that TUNGETDEVNETNS returns the file descriptor directly via open_related_ns(), not through a pointer argument. The original code is correct:
netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS);
if (netns_fd < 0) {
TAP_LOG(INFO, "%s: interface deleted", pmd->name);
return 0;
}This follows the standard pattern for ioctls that return file descriptors (like dup() or open()). Thank you for the correction and the reference.
✏️ Learnings added
Learnt from: rjarry
Repo: DPDK/grout PR: 344
File: subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch:53-58
Timestamp: 2025-11-06T18:06:32.337Z
Learning: The TUNGETDEVNETNS ioctl returns a file descriptor directly as the ioctl return value (via open_related_ns()), not through a pointer argument. The correct usage is: `netns_fd = ioctl(fd, TUNGETDEVNETNS);` followed by checking `netns_fd < 0` for errors. Do not suggest passing a pointer argument for this ioctl.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 309
File: modules/srv6/datapath/srv6_local.c:97-101
Timestamp: 2025-09-09T09:22:31.596Z
Learning: In SRv6 IPv6 extension header parsing, when is_ipv6_ext[proto] is true, rte_ipv6_get_next_ext() will not return a negative value, making error checking unnecessary. An assert can be used as a defensive measure for future DPDK compatibility.
| assert(bond->primary_member < ARRAY_DIM(bond->members)); | ||
| member_iface_id = bond->members[bond->primary_member].iface_id; | ||
| if ((i = iface_from_id(c, member_iface_id)) == NULL) | ||
| SAFE_BUF(snprintf, len, " primary=%u", member_iface_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we output an error message ? (as this is a race: while the user requested the bond info, 1 interface was removed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is used across multiple CLI commands. It shouldn't happen and I think displaying a number is fine in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (6)
subprojects/packagefiles/dpdk/0002-net-tap-replace-ioctl-with-netlink.patch (1)
175-184: Restore remote flags exactly on close.
tap_nl_set_flags(..., remote_initial_flags, 1)only sets the bits in the initial mask; it never clears bits added during operation (PROMISC, ALLMULTI). The remote interface won't return to its original state.Fetch current flags, compute which bits need clearing, clear them, then set the bits that must remain:
- int ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, - internals->remote_initial_flags, 1); + unsigned int current, to_clear; + int ret = tap_nl_get_flags(internals->nlsk_fd, internals->remote_if_index, ¤t); + if (ret == 0) { + to_clear = current & ~internals->remote_initial_flags; + if (to_clear) + ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, to_clear, 0); + if (ret == 0 && internals->remote_initial_flags) + ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index, + internals->remote_initial_flags, 1); + } if (ret)subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch (1)
53-57: Pass storage to TUNGETDEVNETNS ioctlThis _IOR ioctl writes into the third argument. Without passing a pointer the call returns EFAULT, so every namespace move looks like a deletion. Reuse a local
int netns_fd;, pass its address, and keep checking the ioctl return.- netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS); - if (netns_fd < 0) { + if (ioctl(pmd->ka_fd, TUNGETDEVNETNS, &netns_fd) < 0) {smoke/bond_active_backup_test.sh (1)
9-12: Fix invalid bridge filter.
bridge fdb showrejectsbrport, so member detection never succeeds and the test times out. Use the supporteddevfilter instead.- if bridge -n n0 fdb show br br0 brport "$iface" state reachable | grep -F "$mac"; then + if bridge -n n0 fdb show br br0 dev "$iface" state reachable | grep -F "$mac"; thenmodules/infra/cli/bond.c (1)
65-73: Guard primary index when members shrink. Removing the primary member leavesbond->primary_member≥bond->n_members, so we pull a staleiface_idand feediface_from_id()garbage. Please clamp or fall back before dereferencing.- assert(bond->primary_member < ARRAY_DIM(bond->members)); - member_iface_id = bond->members[bond->primary_member].iface_id; + if (bond->primary_member >= bond->n_members) + goto no_primary; + member_iface_id = bond->members[bond->primary_member].iface_id; if ((i = iface_from_id(c, member_iface_id)) == NULL) SAFE_BUF(snprintf, len, " primary=%u", member_iface_id); else SAFE_BUF(snprintf, len, " primary=%s", i->name); break; } + +no_primary: + if (bond->mode == GR_BOND_MODE_ACTIVE_BACKUP) + SAFE_BUF(snprintf, len, " primary=unknown");modules/infra/control/bond.c (2)
56-64: Propagate MAC removal failures. We still ignorebond_all_member_del_mac()errors, so we claim success while members keep the address. Bubble the error up before touchingextra_macs.- // Remove MAC address from all member ports - bond_all_member_del_mac(bond, mac); + // Remove MAC address from all member ports + ret = bond_all_member_del_mac(bond, mac); + if (ret < 0) + return ret;
307-318: Clampprimary_memberafter reshuffle. When membership shrinks without an explicit PRIMARY update,bond->primary_membercan point pastbond->n_members, so we grab the MAC from a removed port. Validate (or reset) before dereferencing.- if (rte_is_zero_ether_addr(&api->mac)) { + if (bond->n_members == 0) + return errno_set(ENOLINK); + if (bond->primary_member >= bond->n_members) + bond->primary_member = 0; + if (rte_is_zero_ether_addr(&api->mac)) { const struct iface *primary = bond->members[bond->primary_member].iface; if (iface_get_eth_addr(primary->id, &mac) < 0) return errno_set(errno);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (18)
modules/infra/api/gr_infra.h(4 hunks)modules/infra/cli/bond.c(1 hunks)modules/infra/cli/meson.build(1 hunks)modules/infra/cli/port.c(3 hunks)modules/infra/control/bond.c(1 hunks)modules/infra/control/gr_bond.h(1 hunks)modules/infra/control/gr_port.h(1 hunks)modules/infra/control/meson.build(1 hunks)modules/infra/control/port.c(3 hunks)modules/infra/datapath/bond_output.c(1 hunks)modules/infra/datapath/meson.build(1 hunks)modules/infra/datapath/port_rx.c(3 hunks)smoke/bond_active_backup_test.sh(1 hunks)subprojects/dpdk.wrap(1 hunks)subprojects/packagefiles/dpdk/0001-net-tap-add-netlink-helpers.patch(1 hunks)subprojects/packagefiles/dpdk/0002-net-tap-replace-ioctl-with-netlink.patch(1 hunks)subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch(1 hunks)subprojects/packagefiles/dpdk/0004-net-tap-configure-link-carrier.patch(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- modules/infra/cli/meson.build
- modules/infra/cli/port.c
- subprojects/dpdk.wrap
- modules/infra/control/meson.build
- modules/infra/control/gr_port.h
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/infra/datapath/port_rx.cmodules/infra/control/port.cmodules/infra/datapath/bond_output.cmodules/infra/cli/bond.cmodules/infra/api/gr_infra.hmodules/infra/control/gr_bond.hmodules/infra/control/bond.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/bond_active_backup_test.sh
🧠 Learnings (3)
📚 Learning: 2025-11-03T13:28:19.470Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.470Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).
Applied to files:
modules/infra/datapath/port_rx.cmodules/infra/datapath/bond_output.c
📚 Learning: 2025-11-05T13:55:26.169Z
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 372
File: smoke/cross_vrf_forward_test.sh:18-18
Timestamp: 2025-11-05T13:55:26.169Z
Learning: In the DPDK/grout codebase, VRF interfaces (named gr-vrf<id>) are automatically created when an interface is added to a non-existing VRF using port_add. The VRF creation is handled automatically by the event system in vrf_netlink.c, so no explicit VRF interface creation commands are needed in test scripts.
Applied to files:
modules/infra/api/gr_infra.hmodules/infra/control/bond.c
📚 Learning: 2025-09-09T09:22:31.596Z
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 309
File: modules/srv6/datapath/srv6_local.c:97-101
Timestamp: 2025-09-09T09:22:31.596Z
Learning: In SRv6 IPv6 extension header parsing, when is_ipv6_ext[proto] is true, rte_ipv6_get_next_ext() will not return a negative value, making error checking unnecessary. An assert can be used as a defensive measure for future DPDK compatibility.
Applied to files:
subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch
🧬 Code graph analysis (6)
modules/infra/datapath/port_rx.c (2)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/worker_test.c (1)
iface_from_id(45-47)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
errno_set(9-12)
modules/infra/datapath/bond_output.c (5)
modules/infra/control/gr_iface.h (2)
iface(16-22)iface_get_stats(100-102)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(562-589)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/eth_output.c (1)
eth_output_register_interface_type(27-34)
modules/infra/cli/bond.c (7)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/api/gr_infra.h (1)
gr_bond_mode_name(125-131)api/gr_errno.h (1)
errno_set(9-12)modules/infra/cli/iface.c (4)
parse_iface_args(141-204)iface_from_name(93-106)complete_iface_names(71-91)register_iface_type(22-24)cli/ecoli.c (2)
arg_str(99-109)arg_eth_addr(164-178)api/gr_api.h (1)
gr_api_client_send_recv(41-52)cli/exec.c (1)
cli_context_register(16-18)
modules/infra/control/bond.c (5)
modules/infra/control/iface.c (8)
iface_add_eth_addr(310-323)iface_del_eth_addr(325-338)iface_set_mtu(340-357)iface_set_promisc(379-395)iface_set_allmulti(397-413)iface_add_vlan(415-428)iface_del_vlan(430-443)iface_get_eth_addr(263-276)api/gr_errno.h (1)
errno_set(9-12)modules/infra/control/gr_iface.h (1)
iface(16-22)main/gr_vec.h (1)
gr_vec_len(47-51)main/event.c (1)
gr_event_subscribe(14-16)
smoke/bond_active_backup_test.sh (1)
smoke/_init.sh (2)
fail(102-105)netns_add(111-119)
🪛 Shellcheck (0.11.0)
smoke/bond_active_backup_test.sh
[warning] 18-18: Quote this to prevent word splitting.
(SC2046)
🔇 Additional comments (7)
subprojects/packagefiles/dpdk/0001-net-tap-add-netlink-helpers.patch (1)
35-323: Netlink helper implementations look solid.The netlink message construction, attribute parsing, and flag manipulation logic are all correct. Error handling is appropriate with ENODEV returned when interfaces aren't found.
subprojects/packagefiles/dpdk/0002-net-tap-replace-ioctl-with-netlink.patch (6)
134-159: Link up/down conversions look correct.The link state management properly handles both local and remote interfaces with appropriate error checking.
199-230: Link update gracefully handles netlink failures.Falls back appropriately if flag queries fail, which is reasonable for status polling.
233-358: Promisc and allmulti conversions maintain proper rollback.Both enable and disable paths correctly handle local and remote interfaces with appropriate error recovery.
361-434: MAC and MTU conversions are clean.The refactoring correctly uses netlink helpers with appropriate short-circuit optimizations.
466-549: Netlink initialization moved earlier and made mandatory.This is the right approach. The fallback to
remote_initial_flags = 0when flag reading fails is acceptable—failing device creation would be worse.
569-585: Header changes align with netlink refactoring.The type changes and field consolidation are correct.
| if (arg_eth_addr(p, "MAC", &bond->mac) == 0) | ||
| set_attrs |= GR_BOND_SET_MAC; | ||
| else | ||
| memset(&bond->mac, 0, sizeof(bond->mac)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep configured MAC during updates. On iface set you zero the cached MAC when the user omits MAC, so any later reconfig wipes their programmed address. Only clear it for fresh bonds.
- else
- memset(&bond->mac, 0, sizeof(bond->mac));
+ else if (!update)
+ memset(&bond->mac, 0, sizeof(bond->mac));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (arg_eth_addr(p, "MAC", &bond->mac) == 0) | |
| set_attrs |= GR_BOND_SET_MAC; | |
| else | |
| memset(&bond->mac, 0, sizeof(bond->mac)); | |
| if (arg_eth_addr(p, "MAC", &bond->mac) == 0) | |
| set_attrs |= GR_BOND_SET_MAC; | |
| else if (!update) | |
| memset(&bond->mac, 0, sizeof(bond->mac)); |
🤖 Prompt for AI Agents
In modules/infra/cli/bond.c around lines 154-157, the code zeroes bond->mac when
the user omits MAC on an `iface set`, which wipes a previously configured
hardware address; change the logic so the MAC is only cleared for newly created
bonds, not on updates. Specifically, detect whether this operation is a fresh
bond creation (use the existing creation flag/field used elsewhere for new
bonds, e.g., a "new" or "created" marker or the initial set_attrs state) and
only call memset(&bond->mac, 0, sizeof(bond->mac)) when that marker indicates a
new bond; otherwise leave bond->mac untouched when arg_eth_addr returns non-zero
(i.e., MAC omitted).
| for (uint8_t i = 0; i < new->n_members; i++) { | ||
| struct iface *member = iface_from_id(new->members[i].iface_id); | ||
| if (member == NULL) | ||
| return errno_set(errno); | ||
|
|
||
| if (member->type != GR_IFACE_TYPE_PORT) | ||
| return errno_set(EMEDIUMTYPE); | ||
|
|
||
| for (uint8_t j = 0; j < bond->n_members; j++) { | ||
| if (bond->members[j].iface->id == member->id) | ||
| goto skip; | ||
| } | ||
|
|
||
| LOG(DEBUG, "adding %s to bond %s", member->name, iface->name); | ||
| port = iface_info_port(member); | ||
| port->bond_iface_id = iface->id; | ||
| skip:; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Block double-enlisting ports. We let a port that already belongs to another bond be reassigned, corrupting the old bond’s state. Refuse members whose bond_iface_id is set to a different bond.
- port = iface_info_port(member);
+ port = iface_info_port(member);
+ if (port->bond_iface_id != GR_IFACE_ID_UNDEF && port->bond_iface_id != iface->id)
+ return errno_set(EBUSY);
port->bond_iface_id = iface->id;🤖 Prompt for AI Agents
In modules/infra/control/bond.c around lines 179 to 195, the loop adding members
doesn't refuse a port already assigned to a different bond, leading to
double-enlistment and corruption; after resolving port with port =
iface_info_port(member) check port->bond_iface_id and if it's non-zero and not
equal to iface->id return errno_set(EBUSY) (or an appropriate error constant)
instead of reassigning, otherwise proceed to set port->bond_iface_id =
iface->id; keep the existing skip/duplicate check intact.
| + if (setns(orig_netns_fd, CLONE_NEWNET) < 0) | ||
| + TAP_LOG(ERR, "%s: failed to return to original netns: %s", | ||
| + pmd->name, strerror(errno)); | ||
| + | ||
| + close(orig_netns_fd); | ||
| + close(netns_fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treat failure to restore original netns as fatal
If we cannot switch back to the original namespace we keep executing inside the member netns, breaking every subsequent TAP operation. Bail out immediately, closing any new sockets to avoid leaks.
- if (setns(orig_netns_fd, CLONE_NEWNET) < 0)
- TAP_LOG(ERR, "%s: failed to return to original netns: %s",
- pmd->name, strerror(errno));
-
- close(orig_netns_fd);
- close(netns_fd);
+ if (setns(orig_netns_fd, CLONE_NEWNET) < 0) {
+ TAP_LOG(ERR, "%s: failed to return to original netns: %s",
+ pmd->name, strerror(errno));
+ close(orig_netns_fd);
+ close(netns_fd);
+ if (new_nlsk_fd >= 0)
+ close(new_nlsk_fd);
+ return -1;
+ }
+
+ close(orig_netns_fd);
+ close(netns_fd);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| + if (setns(orig_netns_fd, CLONE_NEWNET) < 0) | |
| + TAP_LOG(ERR, "%s: failed to return to original netns: %s", | |
| + pmd->name, strerror(errno)); | |
| + | |
| + close(orig_netns_fd); | |
| + close(netns_fd); | |
| if (setns(orig_netns_fd, CLONE_NEWNET) < 0) { | |
| TAP_LOG(ERR, "%s: failed to return to original netns: %s", | |
| pmd->name, strerror(errno)); | |
| close(orig_netns_fd); | |
| close(netns_fd); | |
| if (new_nlsk_fd >= 0) | |
| close(new_nlsk_fd); | |
| return -1; | |
| } | |
| close(orig_netns_fd); | |
| close(netns_fd); |
🤖 Prompt for AI Agents
In subprojects/packagefiles/dpdk/0003-net-tap-detect-namespace-change.patch
around lines 96 to 101, treat failure to restore the original network namespace
as fatal: if setns(orig_netns_fd, CLONE_NEWNET) fails you must log the error and
immediately return/exit the function with a failure code, ensuring you close any
new sockets/fds (e.g., netns_fd and any sockets opened in the member netns)
before returning to avoid descriptor leaks and prevent continuing execution in
the member netns.
| + /* set carrier after creating at least one rxq */ | ||
| + ret = tap_carrier_set(internals, 1); | ||
| + if (ret < 0) | ||
| + goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't fail queue setup when the kernel lacks TUNSETCARRIER support.
Line 54 currently aborts queue setup if ioctl(TUNSETCARRIER) returns -1. That’s a runtime regression: builds using headers that define TUNSETCARRIER can still run on kernels predating 5.0 (or configurations where the tap device refuses that ioctl), and those kernels return ENOTTY/EINVAL. Prior to this patch we’d succeed; after it we bail out and the port never comes up. Please treat those failures as non-fatal (log a warning, keep init going) so older kernels keep working while newer ones benefit from the carrier setting.
🤖 Prompt for AI Agents
In subprojects/packagefiles/dpdk/0004-net-tap-configure-link-carrier.patch
around lines 53 to 56, don't treat a -1 return from tap_carrier_set as fatal
unconditionally; instead check errno and only abort on unexpected errors. Change
the logic so that if tap_carrier_set returns -1 and errno is ENOTTY or EINVAL
you log a warning and continue initialization, otherwise on other errors goto
error as before.
In order to support bond ports, we need the net_tap driver to handle netns changes and keep reporting the link status without issues. Patches have been submitted upstream and will be included in the next 25.11 release. Until this release is available, include these patches in grout. Link: http://patches.dpdk.org/project/dpdk/list/?series=36528&state=* Signed-off-by: Robin Jarry <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Remove the workaround that assumed tap interfaces are always running. This was needed when tap interfaces were moved to network namespaces, which caused the kernel to signal them as down even though they could still transmit and receive packets. Since the DPDK tap drive now supports netns change, we no longer need this workaround. Signed-off-by: Robin Jarry <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Allow aggregating multiple physical ports into a single logical bond interface. Only active-backup mode is implemented where traffic flows through a single active member port at any time. Active member selection prioritizes the configured primary port when it is UP and RUNNING. Otherwise, the first available UP and RUNNING member becomes active. The bond subscribes to port link state events and automatically fails over when necessary. In the RX path, port_rx checks if a port belongs to a bond by looking at the bond_iface_id field. Packets from inactive members are dropped immediately. Packets from the active member get their iface pointer rewritten to point to the bond interface before being forwarded to normal processing. In the TX path, bond_output examines the current active member and rewrites the mbuf iface pointer and port fields before forwarding to port_output. Interface configuration (MAC address, MTU, promisc, allmulti, VLANs) is propagated from the bond to all member ports. MAC addresses are tracked separately to handle members being added or removed dynamically. This ensures that bond reconfiguration does not leave stale addresses on removed members or forget to apply addresses to newly added ones. New CLI commands allow creating and managing bond interfaces: grcli interface add bond NAME mode active-backup member PORT ... grcli interface set bond NAME member PORT ... primary PORT The smoke test creates two bond interfaces with multiple member ports connected via the Linux bridge to network namespaces. It verifies bidirectional connectivity and tests failover by bringing down the active member port and confirming traffic continues through the backup. Signed-off-by: Robin Jarry <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Add support for bonding multiple physical ports into a single logical interface. This initial implementation supports active-backup mode where only one member port is active at a time for both RX and TX traffic.
The bond interface type can aggregate multiple physical ports. Member ports forward their received packets to the bond interface when active. The bond output node selects the active member based on the configured mode and forwards packets accordingly.
Active member selection in active-backup mode works as follows:
The bond interface propagates configuration to all members:
A new ACTIVE state flag indicates which member is currently handling traffic. Port interfaces track their bond membership via bond_iface_id.
The RX datapath checks if incoming ports are bond members. If the member is inactive or not UP, packets are dropped. Active member packets are forwarded with the iface pointer updated to the bond interface.
The TX datapath uses bond_output node to select the active member and update the mbuf iface pointer before forwarding to port_output.
Add CLI commands to manage bond interfaces.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores