-
Notifications
You must be signed in to change notification settings - Fork 21
Add IPv4 fragmentation #348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughAdds IPv4 fragmentation datapath and a DF-set error node. Introduces Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
modules/ip/datapath/ip_error.c (1)
109-113: Include Next‑Hop MTU in ICMP “Fragmentation needed” (RFC 1191).For code UNREACH_FRAG, set the 32‑bit “unused/MTU” field to the egress MTU so PMTUD works.
Apply:
icmp->icmp_type = icmp_type; icmp->icmp_code = icmp_code; icmp->icmp_cksum = 0; - icmp->icmp_ident = 0; - icmp->icmp_seq_nb = 0; + icmp->icmp_ident = 0; + icmp->icmp_seq_nb = 0; + if (icmp_type == RTE_ICMP_TYPE_DEST_UNREACHABLE && + icmp_code == RTE_ICMP_CODE_UNREACH_FRAG) { + // RFC1191: return Next-Hop MTU in the 32-bit field + *(uint32_t *)&icmp->icmp_ident = rte_cpu_to_be_32(iface->mtu); + }node->ctx usage with single‑byte storage for type/code is correct. As per coding guidelines.
Also applies to: 80-85
smoke/ip_fragment_test.sh (2)
50-50: Correct the fragmentation split in the comment.With MTU 1280 and 20‑byte IP header, first fragment payload aligns to 1256 (multiple of 8), not 1276; remainder 32.
Apply:
-# Expected: Packet is fragmented into 2 fragments (1276 + 32 bytes) and ping succeeds +# Expected: Packet is fragmented into 2 fragments (1256 + 32 bytes) and ping succeeds
1-7: Optional: make the test more robust.Enable strict mode for early failure and clearer diagnostics.
Apply:
-#!/bin/bash +#!/bin/bash +set -euo pipefail
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (5)
modules/ip/datapath/ip_error.c(3 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_output.cmodules/ip/datapath/ip_error.cmodules/ip/datapath/ip_fragment.c
🧬 Code graph analysis (2)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (2)
gr_mbuf_trace_add(520-547)trace_ip_format(120-129)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 44-44: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: deb
- GitHub Check: rpm
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-cross-aarch64
🔇 Additional comments (3)
modules/ip/datapath/meson.build (1)
16-16: Build integration LGTM.ip_fragment.c correctly added to the module sources.
modules/ip/datapath/ip_output.c (1)
92-101: MTU/DF routing and edge wiring look right.iface set before MTU check; DF→ip_error_frag_needed, else→ip_fragment. next_nodes map matches.
Please confirm iface->mtu is the L3 MTU (excludes L2 header) to keep the length comparison consistent with IP total_length.
Also applies to: 151-163
modules/ip/datapath/ip_fragment.c (1)
96-101: The existing code already propagates iface—no change needed.Line 96's struct copy (
*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf)) copies the entire struct, which includes theifacefield as its first member (per theGR_MBUF_PRIV_DATA_TYPEmacro). The suggested addition is redundant.Likely an incorrect or invalid review comment.
9f7e71e to
4822d15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
modules/ip/datapath/ip_fragment.c (3)
42-50: Don’t drop already-fragmented packets; forward unchanged.Current code contradicts the comment and sends fragments to ERROR. Forward them to IP_OUTPUT.
- // Check if packet is already a fragment - if so, just pass it through - if ((ip->fragment_offset - & rte_cpu_to_be_16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) - != 0) { - // This is already a fragment, drop it - edge = ERROR; - rte_node_enqueue_x1(graph, node, edge, mbuf); - continue; - } + // Already a fragment? forward as-is + if (ip->fragment_offset & + rte_cpu_to_be_16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) { + rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); + continue; + }
63-67: Fix 8-byte alignment and division-by-zero hazard.When (mtu - ip_hdr_len) < 8, frag_size becomes 0 and num_frags divides by zero. Align explicitly and guard.
- // Calculate fragment payload size (must be multiple of 8) - frag_size = (mtu - ip_hdr_len) & (UINT16_MAX << 3); + // Calculate fragment payload size (multiple of 8, >= 8) + uint16_t max_payload = (uint16_t)(mtu - ip_hdr_len); + frag_size = RTE_ALIGN_FLOOR(max_payload, 8); + if (unlikely(frag_size < 8)) { + edge = ERROR; // MTU too small to carry even one 8B unit + goto error; + }Also ensure RTE_ALIGN_FLOOR and RTE_MIN are available:
#include <gr_trace.h> +#include <rte_common.h> #include <rte_byteorder.h>
70-112: Avoid emitting partial fragment sets; enqueue error edge on failure.If allocation/append fails mid-loop, earlier fragments are already enqueued and the original mbuf is freed, producing inconsistent sets. Build all fragments first, then enqueue; on failure, free built frags and send the original to NO_MBUF/ERROR edge.
- // Create each fragment - for (i = 0; i < num_frags; i++) { + // Build all fragments first; enqueue only if all succeed + struct rte_mbuf *frags[num_frags]; + uint16_t built = 0; + for (i = 0; i < num_frags; i++) { // Create new fragment, copying the original IPv4 header. frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len); if (unlikely(frag_mbuf == NULL)) { edge = NO_MBUF; - goto error; + goto build_error; } frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *); offset = i * frag_size; - frag_data_len = RTE_MIN(frag_size, data_len - offset); + frag_data_len = RTE_MIN(frag_size, (uint16_t)(data_len - offset)); payload = rte_pktmbuf_append(frag_mbuf, frag_data_len); if (unlikely(payload == NULL)) { rte_pktmbuf_free(frag_mbuf); - goto error; + edge = NO_MBUF; + goto build_error; } rte_memcpy( payload, rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset), frag_data_len ); frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len); frag_ip->packet_id = frag_id; frag_ip->fragment_offset = rte_cpu_to_be_16( (offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0) ); frag_ip->hdr_checksum = 0; frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip); *ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf); frag_mbuf->packet_type = mbuf->packet_type; if (gr_mbuf_is_traced(mbuf)) { gr_mbuf_trace_add(frag_mbuf, node, 0); } - rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf); + frags[built++] = frag_mbuf; } - rte_pktmbuf_free(mbuf); - continue; + for (uint16_t k = 0; k < built; k++) + rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]); + rte_pktmbuf_free(muf); + continue; -error: - rte_pktmbuf_free(mbuf); +build_error: + // Free any fragments built so far and send original to error edge + for (uint16_t k = 0; k < built; k++) + rte_pktmbuf_free(frags[k]); +error: + rte_node_enqueue_x1(graph, node, edge, mbuf);
🧹 Nitpick comments (4)
smoke/ip_fragment_test.sh (3)
1-7: Make the test fail-fast and more reproducible.Add strict modes to catch setup errors early.
#!/bin/bash +# Exit on error, unset var, and failed pipelines +set -euo pipefail
36-45: Assert error path explicitly for DF case.Ping may still succeed unexpectedly; guard with return code check for clarity.
-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 10.1.0.2 && fail "ping with DF flag should have failed" +if ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -w 2 -n 10.1.0.2; then + fail "ping with DF flag should have failed" +fi
47-53: Minor: add timeout to avoid slow environments flaking.Cap per-ping wait to keep CI time bounded.
-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -n 10.1.0.2 +ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -w 3 -n 10.1.0.2modules/ip/datapath/ip_fragment.c (1)
127-131: Map NO_MBUF to a dedicated drop if available.If an “error_no_mbuf” drop node exists, prefer it over “error_no_headroom” for clearer telemetry; otherwise keep as-is.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (5)
modules/ip/datapath/ip_error.c(3 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/ip/datapath/ip_output.c
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_error.cmodules/ip/datapath/ip_fragment.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (2)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (2)
gr_mbuf_trace_add(520-547)trace_ip_format(120-129)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 44-44: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
🔇 Additional comments (1)
modules/ip/datapath/meson.build (1)
16-16: Build integration looks good.ip_fragment.c correctly added to the datapath sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @aharivel, thanks for the patch! I have a few remarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a git trailer in commit e00839b
ip: implement output packet fragmentation
...
Closes: https://github.com/DPDK/grout/issues/336
Signed-off-by: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (6)
smoke/ip_fragment_test.sh (2)
14-15: Align IPs with the IPv4 smoke suite.Reuse the same address blocks as other IPv4 tests for consistency and less cognitive load. Please align to the suite’s convention.
12-13: Good: MTU set at port creation.Setting MTU in the grcli interface add matches prior feedback.
modules/ip/datapath/ip_error.c (1)
82-87: Populate Next‑Hop MTU in ICMP Frag Needed (RFC 1191).Set MTU in the low 16 bits when code == UNREACH_FRAG to make PMTUD work.
icmp->icmp_cksum = 0; icmp->icmp_ident = 0; - icmp->icmp_seq_nb = 0; + icmp->icmp_seq_nb = (ctx->icmp_code == RTE_ICMP_CODE_UNREACH_FRAG) + ? rte_cpu_to_be_16(ip_output_mbuf_data(mbuf)->iface->mtu) + : 0;modules/ip/datapath/ip_fragment.c (3)
51-57: Don’t drop already-fragmented packets; forward unchanged.Fragmenter should pass existing fragments through.
- // Check if packet is already a fragment - if so, just pass it through - if ((ip->fragment_offset - & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK))) { - // This is already a fragment, drop it - edge = ALREADY_FRAGMENTED; - goto drop; - } + // Already a fragment? forward as-is + if (ip->fragment_offset & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) { + edge = IP_OUTPUT; + goto drop; + }
59-61: Read iface via ip_output_mbuf_data(...) to match producers/consumers.Keeps metadata API coherent across nodes.
- iface = mbuf_data(mbuf)->iface; + iface = ip_output_mbuf_data(mbuf)->iface;
76-121: Avoid emitting partial fragment sets; route errors via edges.Currently enqueues fragments as they’re built; on later failure, some frags are already sent, original mbuf is freed, and edge may be unset. Build all frags first, then enqueue; on failure, free built frags and send original to NO_MBUF/ERROR.
- // Create the next fragments - for (i = 1; i < num_frags; i++) { + // Build the next fragments first + struct rte_mbuf *frags[num_frags > 1 ? (num_frags - 1) : 1]; + uint16_t built = 0; + for (i = 1; i < num_frags; i++) { // Create new fragment, copying the original IPv4 header. frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len); if (unlikely(frag_mbuf == NULL)) { - edge = NO_MBUF; - goto error; + edge = NO_MBUF; + goto build_error; } frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *); offset = i * frag_size; frag_data_len = RTE_MIN(frag_size, data_len - offset); payload = rte_pktmbuf_append(frag_mbuf, frag_data_len); if (unlikely(payload == NULL)) { rte_pktmbuf_free(frag_mbuf); - goto error; + edge = NO_MBUF; + goto build_error; } memcpy( payload, rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset), frag_data_len ); frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len); frag_ip->fragment_offset = rte_cpu_to_be_16( (offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0) ); frag_ip->hdr_checksum = 0; frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip); *ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf); frag_mbuf->packet_type = mbuf->packet_type; if (gr_mbuf_is_traced(mbuf)) { struct ip_fragment_trace_data *t; t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t)); t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id); t->frag_num = i; t->offset = offset; t->more_frags = (i < num_frags - 1) ? 1 : 0; } - rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf); + frags[built++] = frag_mbuf; } // Use original mbuf for first fragment ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size); ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG); ip->hdr_checksum = 0; ip->hdr_checksum = rte_ipv4_cksum(ip); // Trim excess payload if (unlikely(rte_pktmbuf_trim(mbuf, data_len - frag_size) < 0)) { - edge = ERROR; - goto error; + edge = ERROR; + goto build_error; } if (gr_mbuf_is_traced(mbuf)) { struct ip_fragment_trace_data *t; t = gr_mbuf_trace_add(mbuf, node, sizeof(*t)); t->packet_id = rte_be_to_cpu_16(ip->packet_id); t->frag_num = 0; t->offset = 0; t->more_frags = 1; } - edge = IP_OUTPUT; - -drop: - rte_node_enqueue_x1(graph, node, edge, mbuf); - continue; - -error: - rte_pktmbuf_free(mbuf); + // Enqueue original first, then the rest +drop: + if (edge == IP_OUTPUT) + rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); + else + rte_node_enqueue_x1(graph, node, edge, mbuf); + for (uint16_t k = 0; edge == IP_OUTPUT && k < built; k++) + rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]); + continue; + +build_error: + // Free any fragments built so far and send original to error edge + for (uint16_t k = 0; k < built; k++) + rte_pktmbuf_free(frags[k]); + rte_node_enqueue_x1(graph, node, edge, mbuf); continue;Also applies to: 145-151
🧹 Nitpick comments (2)
smoke/ip_fragment_test.sh (1)
6-6: Minor shellcheck cleanups (optional).Purely to quiet shellcheck; behavior unchanged.
-. $(dirname $0)/_init.sh +. "$(dirname "$0")/_init.sh" @@ -ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 && fail "ping with DF flag should have failed" +ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M 'do' -n 172.16.1.2 && fail "ping with DF flag should have failed" @@ -ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -n 172.16.1.2 +ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M 'dont' -n 172.16.1.2Also applies to: 39-39, 47-47
modules/ip/datapath/ip_fragment.c (1)
12-15: Include rte_common.h for RTE_ALIGN_FLOOR.Avoid relying on transitive includes.
#include <gr_trace.h> -#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_byteorder.h>
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (5)
modules/ip/datapath/ip_error.c(6 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_error.cmodules/ip/datapath/ip_output.cmodules/ip/datapath/ip_fragment.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (3)
modules/ip/datapath/ip_error.c (1)
modules/ip6/datapath/ip6_error.c (1)
no_route_init(119-122)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(520-547)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: rpm
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
🔇 Additional comments (1)
modules/ip/datapath/ip_output.c (1)
151-163: FRAG_NEEDED edge wiring looks right.DF→FRAG_NEEDED to ip_error_frag_needed; non‑DF→TOO_BIG to ip_fragment is coherent with the flow.
|
Don't mind the deb/rpm failures. This is related to a pending FRR pull request that has changed yesterday. I have sent a fix for it in #355 |
63ca189 to
55f7522
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
smoke/ip_fragment_test.sh (3)
6-6: Quote the subshell output to prevent word splitting.Line 6 should quote the
$(dirname $0)output.-. $(dirname $0)/_init.sh +. "$(dirname $0)/_init.sh"
31-39: Trim verbose comments to focus on test intent.The explanatory comments on lines 36–38 repeat packet size calculations and MTU values that are already evident from the code. Simplify to focus on test intent and expected result.
# Test 2: Large packet with DF flag set (should get ICMP fragmentation needed error) -# Send 1260-byte packet with DF=1 (Don't Fragment) -# Packet size: 1260 + 8 (ICMP) + 20 (IP) = 1288 bytes -# Fits in p0 MTU (1500) but exceeds p1 MTU (1280) -# Expected: ICMP Type 3 Code 4 (Fragmentation Needed and DF Set) ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 && fail "ping with DF flag should have failed"
41-45: Trim verbose comments to match Test 2 simplification.Apply the same simplification as Test 2 to reduce repetition.
# Test 3: Large packet without DF flag (should fragment and succeed) -# Send 1260-byte packet with DF=0 (fragmentation allowed) -# Packet size: 1260 + 8 (ICMP) + 20 (IP) = 1288 bytes -# Fits in p0 MTU (1500) but needs fragmentation for p1 MTU (1280) -# Expected: Packet is fragmented into 2 fragments (1276 + 32 bytes) and ping succeeds ip netns exec $p0 ip route flush cache
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (4)
modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/ip/datapath/meson.build
- modules/ip/datapath/ip_output.c
- modules/ip/datapath/ip_fragment.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (1)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: deb
- GitHub Check: rpm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
modules/ip/datapath/ip_output.c (1)
92-92: Past review comment still applies.Previous review suggested using ip_output_mbuf_data(...) for consistency with downstream readers. The concern remains valid.
🧹 Nitpick comments (1)
smoke/ip_fragment_test.sh (1)
6-6: Quote the command substitution for robustness.Shellcheck flags unquoted command substitution, though it's unlikely to cause issues in practice.
-. $(dirname $0)/_init.sh +. "$(dirname "$0")/_init.sh"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (4)
modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/ip/datapath/ip_fragment.c
- modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_output.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (1)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: rpm
- GitHub Check: deb
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
🔇 Additional comments (8)
smoke/ip_fragment_test.sh (5)
11-15: LGTM!Interface setup correctly configures p1 with reduced MTU to trigger fragmentation, and MTU is set at port creation as suggested in previous reviews.
17-29: LGTM!Namespace setup is thorough and correctly configures routing and MTU. The PMTU cache flush ensures deterministic test behavior.
31-32: LGTM!Basic connectivity test establishes baseline functionality before testing fragmentation scenarios.
34-39: LGTM!Test correctly validates ICMP fragmentation-needed response when DF flag prevents fragmentation of oversized packets. Detailed comments clearly explain the expected behavior.
41-47: LGTM!Test validates successful fragmentation when DF flag is not set. The PMTU cache flush ensures clean state, and the packet size correctly triggers fragmentation through the new ip_fragment path.
modules/ip/datapath/ip_output.c (3)
30-30: LGTM!FRAG_NEEDED edge added correctly for routing DF-set oversized packets to the ICMP error handler.
94-101: LGTM!MTU overflow handling correctly differentiates between DF-set packets (route to ICMP error) and fragmentable packets (route to ip_fragment). The DF flag check is properly byte-swapped.
160-161: LGTM!Edge mappings correctly route oversized packets: TOO_BIG to the new fragmentation node, FRAG_NEEDED to the ICMP error handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valuable to add a config option (globally, per interface ? ) to enable/disable ip_frag ?
|
Hi @christophefontaine , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/ip6/datapath/ip6_error.c (1)
71-76: Fix wrong sizeof in TTL-exceeded path.Using sizeof(*du) allocates the wrong size for icmp6_err_ttl_exceeded.
Apply:
- te = (struct icmp6_err_ttl_exceeded *) - rte_pktmbuf_prepend(mbuf, sizeof(*du)); + te = (struct icmp6_err_ttl_exceeded *) + rte_pktmbuf_prepend(mbuf, sizeof(*te));
♻️ Duplicate comments (5)
smoke/ip_fragment_test.sh (1)
14-27: Reuse the same address scheme as other IPv4 smoke tests.Align IPs with existing tests for consistency and easier cross-test reuse.
modules/ip/datapath/ip_fragment.c (2)
51-57: Forward existing fragments; do not drop.Already-fragmented packets must be forwarded unchanged to ip_output.
Apply:
- // Check if packet is already a fragment - if so, just pass it through - if (ip->fragment_offset - & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) { - // This is already a fragment, drop it - edge = ALREADY_FRAGMENTED; - goto drop; - } + // Already a fragment? forward as-is + if (ip->fragment_offset & + RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) { + rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); + continue; + }
76-95: Avoid partial fragment sets; stage builds and enqueue only on success.Current flow enqueues the first fragment, then breaks on failure, yielding orphaned sets and no error edge.
Apply staged-build refactor (minimal change):
- // Prepare and enqueue first fragment (using original mbuf) - ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size); - ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG); - ip->hdr_checksum = 0; - ip->hdr_checksum = rte_ipv4_cksum(ip); - - if (gr_mbuf_is_traced(mbuf)) { - struct ip_fragment_trace_data *t; - t = gr_mbuf_trace_add(mbuf, node, sizeof(*t)); - t->packet_id = rte_be_to_cpu_16(ip->packet_id); - t->frag_num = 0; - t->offset = 0; - t->more_frags = 1; - } - - // Enqueue first fragment - rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); - - // Create and enqueue remaining fragments - for (i = 1; i < num_frags; i++) { + // Build all fragments first + struct rte_mbuf *frags[num_frags]; + uint16_t built = 0; + + // Prepare first fragment (original mbuf) + ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size); + ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG); + ip->hdr_checksum = 0; + ip->hdr_checksum = rte_ipv4_cksum(ip); + frags[built++] = mbuf; + if (gr_mbuf_is_traced(mbuf)) { + struct ip_fragment_trace_data *t; + t = gr_mbuf_trace_add(mbuf, node, sizeof(*t)); + t->packet_id = rte_be_to_cpu_16(ip->packet_id); + t->frag_num = 0; + t->offset = 0; + t->more_frags = 1; + } + + // Create remaining fragments + for (i = 1; i < num_frags; i++) { // Create new fragment, copying the original IPv4 header. frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len); if (unlikely(frag_mbuf == NULL)) { - break; + edge = NO_MBUF; + goto build_error; } frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *); offset = i * frag_size; frag_data_len = RTE_MIN(frag_size, data_len - offset); payload = rte_pktmbuf_append(frag_mbuf, frag_data_len); if (unlikely(payload == NULL)) { rte_pktmbuf_free(frag_mbuf); - break; + edge = NO_MBUF; + goto build_error; } @@ *ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf); frag_mbuf->packet_type = mbuf->packet_type; if (gr_mbuf_is_traced(mbuf)) { struct ip_fragment_trace_data *t; t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t)); t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id); t->frag_num = i; t->offset = offset; t->more_frags = (i < num_frags - 1) ? 1 : 0; } - rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf); + frags[built++] = frag_mbuf; } - // Trim first fragment to the right size - rte_pktmbuf_trim(mbuf, data_len - frag_size); - - continue; + // Finalize first fragment and enqueue all in order + rte_pktmbuf_trim(mbuf, data_len - frag_size); + for (uint16_t k = 0; k < built; k++) + rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]); + continue; +build_error: + // Free any built extra fragments (keep original mbuf for error edge) + for (uint16_t k = 1; k < built; k++) + rte_pktmbuf_free(frags[k]); + goto drop;This preserves order, avoids partial emission, and routes failures to NO_MBUF/ERROR edges.
Also applies to: 98-110, 134-141
modules/ip/datapath/ip_error.c (1)
86-86: Verify MTU is populated for FRAG_NEEDED per RFC 1191.A previous review noted that
icmp_seq_nbshould carry the next-hop MTU (viarte_cpu_to_be_16(iface->mtu)) whenicmp_code == RTE_ICMP_CODE_UNREACH_FRAG, but line 86 still sets it unconditionally to zero. The past comment was marked addressed in commits c9debf0–4d0a400; confirm whether the fix is present or was reverted.#!/bin/bash # Check if MTU is set anywhere in the ICMP processing path rg -n 'icmp_seq_nb.*mtu|mtu.*icmp_seq_nb' modules/ip/datapath/modules/ip/datapath/ip_output.c (1)
92-92: Use ip_output_mbuf_data for consistency.Downstream code expects
ip_output_mbuf_data(mbuf)->iface. Use the same accessor here.Apply this diff:
- mbuf_data(mbuf)->iface = iface; + ip_output_mbuf_data(mbuf)->iface = iface;
🧹 Nitpick comments (4)
smoke/ip_fragment_test.sh (1)
31-47: Assert the expected ICMP error explicitly.Consider grepping ping stderr for “Frag needed” to make the failure reason explicit, not just exit-code based.
Example:
ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 2>err && fail "expected PMTU error" grep -q -E "Frag(mentation)? needed" err || fail "missing ICMP frag-needed indication"modules/ip6/datapath/ip6_error.c (1)
86-89: Optional: drop redundant ip NULL check.ip is set before prepend; it won’t be NULL.
modules/ip/datapath/ip_fragment.c (2)
59-61: Use ip_output_mbuf_data for consistency.Stay consistent with the ip_output path metadata.
Apply:
- iface = mbuf_data(mbuf)->iface; + iface = ip_output_mbuf_data(mbuf)->iface;(This aligns with the copy at Line 123.)
Also applies to: 123-125
167-172: Remove unused ALREADY_FRAGMENTED drop if forwarding.If you forward existing frags, the ALREADY_FRAGMENTED edge and drop node can be dropped.
Also applies to: 181-182
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (7)
modules/infra/control/gr_graph.h(1 hunks)modules/ip/datapath/ip_error.c(6 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)modules/ip6/datapath/ip6_error.c(6 hunks)smoke/ip_fragment_test.sh(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip6/datapath/ip6_error.cmodules/ip/datapath/ip_fragment.cmodules/infra/control/gr_graph.hmodules/ip/datapath/ip_error.cmodules/ip/datapath/ip_output.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (5)
modules/ip6/datapath/ip6_error.c (1)
modules/ip/datapath/ip_error.c (1)
no_route_init(106-111)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(520-547)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)
modules/ip/datapath/ip_error.c (1)
modules/ip6/datapath/ip6_error.c (1)
no_route_init(124-129)
modules/ip/datapath/ip_output.c (1)
modules/infra/control/gr_iface.h (1)
iface(16-22)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: rpm
- GitHub Check: deb
🔇 Additional comments (6)
modules/ip/datapath/meson.build (1)
16-16: Build wiring LGTM.ip_fragment.c is correctly added to the datapath build.
smoke/ip_fragment_test.sh (1)
39-39: SC1010 warning is a false positive here.“-M do” is a valid ping argument; no action needed.
If desired, silence ShellCheck by quoting "do" (cosmetic only).
modules/infra/control/gr_graph.h (1)
30-40: Macro looks good; clear and safe.The size check and typed accessor are helpful. No issues.
modules/ip/datapath/ip_output.c (1)
94-100: LGTM: DF flag handling is correct.The router correctly honors the DF flag: when set, it routes to FRAG_NEEDED (ICMP); otherwise it fragments via TOO_BIG.
modules/ip/datapath/ip_error.c (2)
16-19: LGTM: Clean context type definition.The
GR_NODE_CTX_TYPEmacro provides type-safe accessor with compile-time size checking.
113-118: LGTM: Consistent error node pattern.The
frag_needed_nodecorrectly mirrors the existing error node structure with appropriate ICMP type/code for "fragmentation needed."Also applies to: 144-154
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/ip/datapath/ip_error.c (1)
84-86: Set Next-Hop MTU in icmp_seq_nb for ICMP Fragmentation Needed.Per RFC 1191, when
icmp_codeisRTE_ICMP_CODE_UNREACH_FRAG, populateicmp_seq_nbwith the outbound interface MTU in network byte order. The iface is available at line 53.Apply this diff:
icmp->icmp_type = ctx->icmp_type; icmp->icmp_code = ctx->icmp_code; icmp->icmp_cksum = 0; icmp->icmp_ident = 0; - icmp->icmp_seq_nb = 0; + if (ctx->icmp_code == RTE_ICMP_CODE_UNREACH_FRAG) + icmp->icmp_seq_nb = rte_cpu_to_be_16(iface->mtu); + else + icmp->icmp_seq_nb = 0;
🧹 Nitpick comments (2)
modules/ip/datapath/ip_output.c (1)
92-92: Use ip_output_mbuf_data() for consistency.Line 74 reads from
ip_output_mbuf_data(mbuf)->nh. For consistency within this datapath context, use the same accessor when setting iface.Apply this diff:
- mbuf_data(mbuf)->iface = iface; + ip_output_mbuf_data(mbuf)->iface = iface;modules/ip/datapath/ip_fragment.c (1)
59-59: Use ip_output_mbuf_data() for consistency.Line 123 uses
ip_output_mbuf_data(frag_mbuf)for bulk copy. Use the same accessor here to match the ip_output datapath context.Apply this diff:
- iface = mbuf_data(mbuf)->iface; + iface = ip_output_mbuf_data(mbuf)->iface;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (5)
modules/ip/datapath/ip_error.c(3 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_fragment.cmodules/ip/datapath/ip_output.cmodules/ip/datapath/ip_error.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (2)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(522-549)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-cross-aarch64
- GitHub Check: rpm
- GitHub Check: deb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
modules/ip/datapath/ip_output.c (1)
92-92: Useip_output_mbuf_data(mbuf)->ifacefor consistency.This line uses
mbuf_data(mbuf)but the rest of the file and related code (e.g., ip_fragment.c line 123) useip_output_mbuf_data(mbuf)in the ip_output datapath context.- mbuf_data(mbuf)->iface = iface; + ip_output_mbuf_data(mbuf)->iface = iface;modules/ip/datapath/ip_fragment.c (3)
59-59: Useip_output_mbuf_data(mbuf)->ifacefor consistency.Line 123 uses
ip_output_mbuf_datato access mbuf metadata. Use the same accessor here.- iface = mbuf_data(mbuf)->iface; + iface = ip_output_mbuf_data(mbuf)->iface;
76-138: Critical: Trim mbuf before enqueueing it.Line 92 enqueues
mbuf, then line 138 trims it. Modifying an mbuf after enqueueing is a use-after-enqueue bug—the graph may have already processed or freed it.Move the trim before line 92:
// Prepare and enqueue first fragment (using original mbuf) ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size); ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG); ip->hdr_checksum = 0; ip->hdr_checksum = rte_ipv4_cksum(ip); + + // Trim first fragment to the right size + rte_pktmbuf_trim(mbuf, data_len - frag_size); if (gr_mbuf_is_traced(mbuf)) { struct ip_fragment_trace_data *t; t = gr_mbuf_trace_add(mbuf, node, sizeof(*t)); t->packet_id = rte_be_to_cpu_16(ip->packet_id); t->frag_num = 0; t->offset = 0; t->more_frags = 1; } // Enqueue first fragment rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); // Create and enqueue remaining fragments for (i = 1; i < num_frags; i++) { ... } - - // Trim first fragment to the right size - rte_pktmbuf_trim(mbuf, data_len - frag_size); continue;
94-135: Critical: Avoid partial fragment sets on allocation failure.If
rte_pktmbuf_copy(line 97) orrte_pktmbuf_append(line 106) fails, the first fragment (line 92) and any previously created fragments (line 134) are already enqueued. This produces incomplete fragment sets on the network.Stage all fragments before enqueueing:
// Enqueue first fragment rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf); - // Create and enqueue remaining fragments + // Build remaining fragments + struct rte_mbuf *frags[num_frags - 1]; + uint16_t built = 0; + for (i = 1; i < num_frags; i++) { // Create new fragment, copying the original IPv4 header. frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len); if (unlikely(frag_mbuf == NULL)) { + // Free partial fragments; first already enqueued + for (uint16_t k = 0; k < built; k++) + rte_pktmbuf_free(frags[k]); - break; + goto next_packet; } frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *); offset = i * frag_size; frag_data_len = RTE_MIN(frag_size, data_len - offset); payload = rte_pktmbuf_append(frag_mbuf, frag_data_len); if (unlikely(payload == NULL)) { rte_pktmbuf_free(frag_mbuf); + for (uint16_t k = 0; k < built; k++) + rte_pktmbuf_free(frags[k]); - break; + goto next_packet; } memcpy(payload, rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset), frag_data_len); frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len); frag_ip->fragment_offset = rte_cpu_to_be_16( (offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0) ); frag_ip->hdr_checksum = 0; frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip); *ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf); frag_mbuf->packet_type = mbuf->packet_type; if (gr_mbuf_is_traced(mbuf)) { struct ip_fragment_trace_data *t; t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t)); t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id); t->frag_num = i; t->offset = offset; t->more_frags = (i < num_frags - 1) ? 1 : 0; } - rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf); + frags[built++] = frag_mbuf; } + // Enqueue all remaining fragments + for (uint16_t k = 0; k < built; k++) + rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]); + +next_packet: continue;Note: The first fragment (enqueued at line 92) cannot be recovered on failure. Consider refactoring to stage the first fragment as well for true atomicity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (5)
modules/ip/datapath/ip_error.c(3 hunks)modules/ip/datapath/ip_fragment.c(1 hunks)modules/ip/datapath/ip_output.c(3 hunks)modules/ip/datapath/meson.build(1 hunks)smoke/ip_fragment_test.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/ip/datapath/ip_output.cmodules/ip/datapath/ip_error.cmodules/ip/datapath/ip_fragment.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/ip_fragment_test.sh
🧬 Code graph analysis (3)
modules/ip/datapath/ip_output.c (1)
modules/infra/control/gr_iface.h (1)
iface(16-22)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(522-549)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
netns_add(93-99)fail(88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh
[warning] 6-6: Quote this to prevent word splitting.
(SC2046)
[warning] 8-8: run_id is referenced but not assigned.
(SC2154)
[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).
(SC1010)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it weird to have the node ip_error_frag_needed without any link to icmp_output ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea.. I guess it's like "ip_error_ttl_exceeded" that doesn't have any link with "icmp_output" ?? Maybe a bug with the graph generation ?
When a packet exceeds the output interface MTU and has the DF (Don't Fragment) flag set, the router must drop the packet and send an ICMP "Destination Unreachable" error (Type 3, Code 4: Fragmentation Needed and DF Set) back to the source. Previously, grout would silently drop packets that exceeded the MTU regardless of the DF flag setting. This prevented proper Path MTU Discovery (RFC 1191) from functioning. Add a new error node ip_error_frag_needed that generates ICMP Type 3 Code 4 errors using the existing ip_error_process() infrastructure. Signed-off-by: Anthony Harivel <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Add an ip_fragment node that fragments IPv4 packets exceeding the outgoing interface MTU. The node is invoked from ip_output when a packet is too large and does not have the DF (Don't Fragment) flag set. Fragmentation follows RFC 791 requirements: each fragment payload size is rounded down to a multiple of 8 bytes, fragment offset field is expressed in 8-byte units, and the MF (More Fragments) flag is set on all fragments except the last one. All fragments share the same packet ID from the original packet. The ip_output node now stores the output interface in mbuf metadata before the size check, ensuring ip_fragment can access the MTU for proper fragment sizing. The previous ip_output_too_big drop node is removed as oversized packets are now handled by ip_fragment. Fragments inherit trace state from the original packet to maintain debugging capability across fragmentation boundaries. Closes: DPDK#336 Signed-off-by: Anthony Harivel <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Add a smoke test that verifies IPv4 packet fragmentation behavior by configuring asymmetric MTUs between two TAP interfaces (1500 and 1280 bytes) and sending packets through grout. The test validates three scenarios: normal forwarding without fragmentation, ICMP error generation when a large packet has the DF flag set, and successful fragmentation when the DF flag is not set. This ensures both the fragmentation logic and the DF flag handling work correctly. Signed-off-by: Anthony Harivel <[email protected]> Reviewed-by: Christophe Fontaine <[email protected]>
Fix issue #336
Summary by CodeRabbit
New Features
Tests