Skip to content

Conversation

@aharivel
Copy link
Collaborator

@aharivel aharivel commented Oct 20, 2025

Fix issue #336

Summary by CodeRabbit

  • New Features

    • IPv4 fragmentation: large packets are automatically split to fit outbound MTU and forwarded.
    • ICMP "fragmentation needed" handling: packets with DF set now generate proper ICMP responses and are not fragmented.
    • Output behavior improved: MTU checks now consistently route packets to fragmentation or ICMP-as-needed paths.
  • Tests

    • Added end-to-end smoke test validating MTU-triggered fragmentation, DF behavior, and expected ICMP responses.

@aharivel aharivel requested a review from rjarry October 20, 2025 10:29
@coderabbitai
Copy link

coderabbitai bot commented Oct 20, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds IPv4 fragmentation datapath and a DF-set error node. Introduces ip_fragment node that splits IPv4 packets exceeding MTU into 8-byte-aligned fragments, updates headers and checksums, emits per-fragment trace data, and routes fragments to IP output. Adds ip_error_frag_needed node for DF-set "fragmentation needed" cases and new output edges FRAGMENT and FRAG_NEEDED in ip_output. Registers the fragment source in meson.build and adds smoke/ip_fragment_test.sh to validate fragmentation and DF behavior.

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Add IPv4 fragmentation" directly and accurately summarizes the main change in the pull request. The changeset introduces a new IPv4 fragmentation feature, which is the primary focus across all modifications: a new fragmentation module (ip_fragment.c), complementary error handling for fragmentation-needed scenarios (ip_error.c updates), updated output routing logic (ip_output.c), build configuration updates, and a test script. The title is concise, specific, and clearly conveys the core functionality being added without vague or misleading language.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9115e7 and 813af55.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (3 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
modules/ip/datapath/ip_error.c (1)

109-113: Include Next‑Hop MTU in ICMP “Fragmentation needed” (RFC 1191).

For code UNREACH_FRAG, set the 32‑bit “unused/MTU” field to the egress MTU so PMTUD works.

Apply:

 	icmp->icmp_type = icmp_type;
 	icmp->icmp_code = icmp_code;
 	icmp->icmp_cksum = 0;
-	icmp->icmp_ident = 0;
-	icmp->icmp_seq_nb = 0;
+	icmp->icmp_ident = 0;
+	icmp->icmp_seq_nb = 0;
+	if (icmp_type == RTE_ICMP_TYPE_DEST_UNREACHABLE &&
+	    icmp_code == RTE_ICMP_CODE_UNREACH_FRAG) {
+		// RFC1191: return Next-Hop MTU in the 32-bit field
+		*(uint32_t *)&icmp->icmp_ident = rte_cpu_to_be_32(iface->mtu);
+	}

node->ctx usage with single‑byte storage for type/code is correct. As per coding guidelines.

Also applies to: 80-85

smoke/ip_fragment_test.sh (2)

50-50: Correct the fragmentation split in the comment.

With MTU 1280 and 20‑byte IP header, first fragment payload aligns to 1256 (multiple of 8), not 1276; remainder 32.

Apply:

-# Expected: Packet is fragmented into 2 fragments (1276 + 32 bytes) and ping succeeds
+# Expected: Packet is fragmented into 2 fragments (1256 + 32 bytes) and ping succeeds

1-7: Optional: make the test more robust.

Enable strict mode for early failure and clearer diagnostics.

Apply:

-#!/bin/bash
+#!/bin/bash
+set -euo pipefail
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23666dc and 1a3ef3d.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (3 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_output.c
  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_fragment.c
🧬 Code graph analysis (2)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_ip_format (120-129)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 44-44: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: deb
  • GitHub Check: rpm
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-cross-aarch64
🔇 Additional comments (3)
modules/ip/datapath/meson.build (1)

16-16: Build integration LGTM.

ip_fragment.c correctly added to the module sources.

modules/ip/datapath/ip_output.c (1)

92-101: MTU/DF routing and edge wiring look right.

iface set before MTU check; DF→ip_error_frag_needed, else→ip_fragment. next_nodes map matches.

Please confirm iface->mtu is the L3 MTU (excludes L2 header) to keep the length comparison consistent with IP total_length.

Also applies to: 151-163

modules/ip/datapath/ip_fragment.c (1)

96-101: The existing code already propagates iface—no change needed.

Line 96's struct copy (*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf)) copies the entire struct, which includes the iface field as its first member (per the GR_MBUF_PRIV_DATA_TYPE macro). The suggested addition is redundant.

Likely an incorrect or invalid review comment.

@aharivel aharivel force-pushed the ip_frag branch 2 times, most recently from 9f7e71e to 4822d15 Compare October 20, 2025 11:05
@aharivel aharivel marked this pull request as draft October 20, 2025 12:16
@aharivel aharivel marked this pull request as ready for review October 20, 2025 14:35
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
modules/ip/datapath/ip_fragment.c (3)

42-50: Don’t drop already-fragmented packets; forward unchanged.

Current code contradicts the comment and sends fragments to ERROR. Forward them to IP_OUTPUT.

-		// Check if packet is already a fragment - if so, just pass it through
-		if ((ip->fragment_offset
-		     & rte_cpu_to_be_16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK))
-		    != 0) {
-			// This is already a fragment, drop it
-			edge = ERROR;
-			rte_node_enqueue_x1(graph, node, edge, mbuf);
-			continue;
-		}
+		// Already a fragment? forward as-is
+		if (ip->fragment_offset &
+		    rte_cpu_to_be_16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) {
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
+			continue;
+		}

63-67: Fix 8-byte alignment and division-by-zero hazard.

When (mtu - ip_hdr_len) < 8, frag_size becomes 0 and num_frags divides by zero. Align explicitly and guard.

-		// Calculate fragment payload size (must be multiple of 8)
-		frag_size = (mtu - ip_hdr_len) & (UINT16_MAX << 3);
+		// Calculate fragment payload size (multiple of 8, >= 8)
+		uint16_t max_payload = (uint16_t)(mtu - ip_hdr_len);
+		frag_size = RTE_ALIGN_FLOOR(max_payload, 8);
+		if (unlikely(frag_size < 8)) {
+			edge = ERROR; // MTU too small to carry even one 8B unit
+			goto error;
+		}

Also ensure RTE_ALIGN_FLOOR and RTE_MIN are available:

 #include <gr_trace.h>
 
+#include <rte_common.h>
 #include <rte_byteorder.h>

70-112: Avoid emitting partial fragment sets; enqueue error edge on failure.

If allocation/append fails mid-loop, earlier fragments are already enqueued and the original mbuf is freed, producing inconsistent sets. Build all fragments first, then enqueue; on failure, free built frags and send the original to NO_MBUF/ERROR edge.

-		// Create each fragment
-		for (i = 0; i < num_frags; i++) {
+		// Build all fragments first; enqueue only if all succeed
+		struct rte_mbuf *frags[num_frags];
+		uint16_t built = 0;
+		for (i = 0; i < num_frags; i++) {
 			// Create new fragment, copying the original IPv4 header.
 			frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len);
 			if (unlikely(frag_mbuf == NULL)) {
 				edge = NO_MBUF;
-				goto error;
+				goto build_error;
 			}
 
 			frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *);
 			offset = i * frag_size;
-			frag_data_len = RTE_MIN(frag_size, data_len - offset);
+			frag_data_len = RTE_MIN(frag_size, (uint16_t)(data_len - offset));
 
 			payload = rte_pktmbuf_append(frag_mbuf, frag_data_len);
 			if (unlikely(payload == NULL)) {
 				rte_pktmbuf_free(frag_mbuf);
-				goto error;
+				edge = NO_MBUF;
+				goto build_error;
 			}
 
 			rte_memcpy(
 				payload,
 				rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset),
 				frag_data_len
 			);
 
 			frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len);
 			frag_ip->packet_id = frag_id;
 			frag_ip->fragment_offset = rte_cpu_to_be_16(
 				(offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0)
 			);
 			frag_ip->hdr_checksum = 0;
 			frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip);
 
 			*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf);
 			frag_mbuf->packet_type = mbuf->packet_type;
 			if (gr_mbuf_is_traced(mbuf)) {
 				gr_mbuf_trace_add(frag_mbuf, node, 0);
 			}
 
-			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf);
+			frags[built++] = frag_mbuf;
 		}
 
-		rte_pktmbuf_free(mbuf);
-		continue;
+		for (uint16_t k = 0; k < built; k++)
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]);
+		rte_pktmbuf_free(muf);
+		continue;
 
-error:
-		rte_pktmbuf_free(mbuf);
+build_error:
+		// Free any fragments built so far and send original to error edge
+		for (uint16_t k = 0; k < built; k++)
+			rte_pktmbuf_free(frags[k]);
+error:
+		rte_node_enqueue_x1(graph, node, edge, mbuf);
🧹 Nitpick comments (4)
smoke/ip_fragment_test.sh (3)

1-7: Make the test fail-fast and more reproducible.

Add strict modes to catch setup errors early.

 #!/bin/bash
+# Exit on error, unset var, and failed pipelines
+set -euo pipefail

36-45: Assert error path explicitly for DF case.

Ping may still succeed unexpectedly; guard with return code check for clarity.

-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 10.1.0.2 && fail "ping with DF flag should have failed"
+if ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -w 2 -n 10.1.0.2; then
+  fail "ping with DF flag should have failed"
+fi

47-53: Minor: add timeout to avoid slow environments flaking.

Cap per-ping wait to keep CI time bounded.

-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -n 10.1.0.2
+ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -w 3 -n 10.1.0.2
modules/ip/datapath/ip_fragment.c (1)

127-131: Map NO_MBUF to a dedicated drop if available.

If an “error_no_mbuf” drop node exists, prefer it over “error_no_headroom” for clearer telemetry; otherwise keep as-is.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4822d15 and 96fc7ad.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (3 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/ip/datapath/ip_output.c
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_fragment.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (2)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (2)
  • gr_mbuf_trace_add (520-547)
  • trace_ip_format (120-129)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 44-44: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

🔇 Additional comments (1)
modules/ip/datapath/meson.build (1)

16-16: Build integration looks good.

ip_fragment.c correctly added to the datapath sources.

Copy link
Collaborator

@rjarry rjarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @aharivel, thanks for the patch! I have a few remarks.

Copy link
Collaborator

@rjarry rjarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a git trailer in commit e00839b

ip: implement output packet fragmentation

...

Closes: https://github.com/DPDK/grout/issues/336
Signed-off-by: ...

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
smoke/ip_fragment_test.sh (2)

14-15: Align IPs with the IPv4 smoke suite.

Reuse the same address blocks as other IPv4 tests for consistency and less cognitive load. Please align to the suite’s convention.


12-13: Good: MTU set at port creation.

Setting MTU in the grcli interface add matches prior feedback.

modules/ip/datapath/ip_error.c (1)

82-87: Populate Next‑Hop MTU in ICMP Frag Needed (RFC 1191).

Set MTU in the low 16 bits when code == UNREACH_FRAG to make PMTUD work.

 		icmp->icmp_cksum = 0;
 		icmp->icmp_ident = 0;
-		icmp->icmp_seq_nb = 0;
+		icmp->icmp_seq_nb = (ctx->icmp_code == RTE_ICMP_CODE_UNREACH_FRAG)
+			? rte_cpu_to_be_16(ip_output_mbuf_data(mbuf)->iface->mtu)
+			: 0;
modules/ip/datapath/ip_fragment.c (3)

51-57: Don’t drop already-fragmented packets; forward unchanged.

Fragmenter should pass existing fragments through.

-		// Check if packet is already a fragment - if so, just pass it through
-		if ((ip->fragment_offset
-		     & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK))) {
-			// This is already a fragment, drop it
-			edge = ALREADY_FRAGMENTED;
-			goto drop;
-		}
+		// Already a fragment? forward as-is
+		if (ip->fragment_offset & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) {
+			edge = IP_OUTPUT;
+			goto drop;
+		}

59-61: Read iface via ip_output_mbuf_data(...) to match producers/consumers.

Keeps metadata API coherent across nodes.

-		iface = mbuf_data(mbuf)->iface;
+		iface = ip_output_mbuf_data(mbuf)->iface;

76-121: Avoid emitting partial fragment sets; route errors via edges.

Currently enqueues fragments as they’re built; on later failure, some frags are already sent, original mbuf is freed, and edge may be unset. Build all frags first, then enqueue; on failure, free built frags and send original to NO_MBUF/ERROR.

-		// Create the next fragments
-		for (i = 1; i < num_frags; i++) {
+		// Build the next fragments first
+		struct rte_mbuf *frags[num_frags > 1 ? (num_frags - 1) : 1];
+		uint16_t built = 0;
+		for (i = 1; i < num_frags; i++) {
 			// Create new fragment, copying the original IPv4 header.
 			frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len);
 			if (unlikely(frag_mbuf == NULL)) {
-				edge = NO_MBUF;
-				goto error;
+				edge = NO_MBUF;
+				goto build_error;
 			}
 
 			frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *);
 			offset = i * frag_size;
 			frag_data_len = RTE_MIN(frag_size, data_len - offset);
 
 			payload = rte_pktmbuf_append(frag_mbuf, frag_data_len);
 			if (unlikely(payload == NULL)) {
 				rte_pktmbuf_free(frag_mbuf);
-				goto error;
+				edge = NO_MBUF;
+				goto build_error;
 			}
 
 			memcpy(
 				payload,
 				rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset),
 				frag_data_len
 			);
 
 			frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len);
 			frag_ip->fragment_offset = rte_cpu_to_be_16(
 				(offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0)
 			);
 			frag_ip->hdr_checksum = 0;
 			frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip);
 
 			*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf);
 			frag_mbuf->packet_type = mbuf->packet_type;
 			if (gr_mbuf_is_traced(mbuf)) {
 				struct ip_fragment_trace_data *t;
 				t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t));
 				t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id);
 				t->frag_num = i;
 				t->offset = offset;
 				t->more_frags = (i < num_frags - 1) ? 1 : 0;
 			}
 
-			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf);
+			frags[built++] = frag_mbuf;
 		}
 
 		// Use original mbuf for first fragment
 		ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size);
 		ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG);
 		ip->hdr_checksum = 0;
 		ip->hdr_checksum = rte_ipv4_cksum(ip);
 
 		// Trim excess payload
 		if (unlikely(rte_pktmbuf_trim(mbuf, data_len - frag_size) < 0)) {
-			edge = ERROR;
-			goto error;
+			edge = ERROR;
+			goto build_error;
 		}
 
 		if (gr_mbuf_is_traced(mbuf)) {
 			struct ip_fragment_trace_data *t;
 			t = gr_mbuf_trace_add(mbuf, node, sizeof(*t));
 			t->packet_id = rte_be_to_cpu_16(ip->packet_id);
 			t->frag_num = 0;
 			t->offset = 0;
 			t->more_frags = 1;
 		}
 
-		edge = IP_OUTPUT;
-
-drop:
-		rte_node_enqueue_x1(graph, node, edge, mbuf);
-		continue;
-
-error:
-		rte_pktmbuf_free(mbuf);
+		// Enqueue original first, then the rest
+drop:
+		if (edge == IP_OUTPUT)
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
+		else
+			rte_node_enqueue_x1(graph, node, edge, mbuf);
+		for (uint16_t k = 0; edge == IP_OUTPUT && k < built; k++)
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]);
+		continue;
+
+build_error:
+		// Free any fragments built so far and send original to error edge
+		for (uint16_t k = 0; k < built; k++)
+			rte_pktmbuf_free(frags[k]);
+		rte_node_enqueue_x1(graph, node, edge, mbuf);
 		continue;

Also applies to: 145-151

🧹 Nitpick comments (2)
smoke/ip_fragment_test.sh (1)

6-6: Minor shellcheck cleanups (optional).

Purely to quiet shellcheck; behavior unchanged.

-. $(dirname $0)/_init.sh
+. "$(dirname "$0")/_init.sh"
@@
-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 && fail "ping with DF flag should have failed"
+ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M 'do' -n 172.16.1.2 && fail "ping with DF flag should have failed"
@@
-ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M dont -n 172.16.1.2
+ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M 'dont' -n 172.16.1.2

Also applies to: 39-39, 47-47

modules/ip/datapath/ip_fragment.c (1)

12-15: Include rte_common.h for RTE_ALIGN_FLOOR.

Avoid relying on transitive includes.

 #include <gr_trace.h>
 
-#include <rte_byteorder.h>
+#include <rte_common.h>
+#include <rte_byteorder.h>
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96fc7ad and 4d0a400.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (6 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_output.c
  • modules/ip/datapath/ip_fragment.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (3)
modules/ip/datapath/ip_error.c (1)
modules/ip6/datapath/ip6_error.c (1)
  • no_route_init (119-122)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: rpm
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
🔇 Additional comments (1)
modules/ip/datapath/ip_output.c (1)

151-163: FRAG_NEEDED edge wiring looks right.

DF→FRAG_NEEDED to ip_error_frag_needed; non‑DF→TOO_BIG to ip_fragment is coherent with the flow.

@rjarry
Copy link
Collaborator

rjarry commented Oct 22, 2025

Don't mind the deb/rpm failures. This is related to a pending FRR pull request that has changed yesterday. I have sent a fix for it in #355

@aharivel aharivel force-pushed the ip_frag branch 2 times, most recently from 63ca189 to 55f7522 Compare October 22, 2025 12:55
@coderabbitai coderabbitai bot requested a review from rjarry October 22, 2025 12:56
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
smoke/ip_fragment_test.sh (3)

6-6: Quote the subshell output to prevent word splitting.

Line 6 should quote the $(dirname $0) output.

-. $(dirname $0)/_init.sh
+. "$(dirname $0)/_init.sh"

31-39: Trim verbose comments to focus on test intent.

The explanatory comments on lines 36–38 repeat packet size calculations and MTU values that are already evident from the code. Simplify to focus on test intent and expected result.

 # Test 2: Large packet with DF flag set (should get ICMP fragmentation needed error)
-# Send 1260-byte packet with DF=1 (Don't Fragment)
-# Packet size: 1260 + 8 (ICMP) + 20 (IP) = 1288 bytes
-# Fits in p0 MTU (1500) but exceeds p1 MTU (1280)
-# Expected: ICMP Type 3 Code 4 (Fragmentation Needed and DF Set)
 ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 && fail "ping with DF flag should have failed"

41-45: Trim verbose comments to match Test 2 simplification.

Apply the same simplification as Test 2 to reduce repetition.

 # Test 3: Large packet without DF flag (should fragment and succeed)
-# Send 1260-byte packet with DF=0 (fragmentation allowed)
-# Packet size: 1260 + 8 (ICMP) + 20 (IP) = 1288 bytes
-# Fits in p0 MTU (1500) but needs fragmentation for p1 MTU (1280)
-# Expected: Packet is fragmented into 2 fragments (1276 + 32 bytes) and ping succeeds
 ip netns exec $p0 ip route flush cache
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63ca189 and 55f7522.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (4)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • modules/ip/datapath/meson.build
  • modules/ip/datapath/ip_output.c
  • modules/ip/datapath/ip_fragment.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (1)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: deb
  • GitHub Check: rpm

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
modules/ip/datapath/ip_output.c (1)

92-92: Past review comment still applies.

Previous review suggested using ip_output_mbuf_data(...) for consistency with downstream readers. The concern remains valid.

🧹 Nitpick comments (1)
smoke/ip_fragment_test.sh (1)

6-6: Quote the command substitution for robustness.

Shellcheck flags unquoted command substitution, though it's unlikely to cause issues in practice.

-. $(dirname $0)/_init.sh
+. "$(dirname "$0")/_init.sh"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55f7522 and 37e7009.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (4)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/ip/datapath/ip_fragment.c
  • modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_output.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (1)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: rpm
  • GitHub Check: deb
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
🔇 Additional comments (8)
smoke/ip_fragment_test.sh (5)

11-15: LGTM!

Interface setup correctly configures p1 with reduced MTU to trigger fragmentation, and MTU is set at port creation as suggested in previous reviews.


17-29: LGTM!

Namespace setup is thorough and correctly configures routing and MTU. The PMTU cache flush ensures deterministic test behavior.


31-32: LGTM!

Basic connectivity test establishes baseline functionality before testing fragmentation scenarios.


34-39: LGTM!

Test correctly validates ICMP fragmentation-needed response when DF flag prevents fragmentation of oversized packets. Detailed comments clearly explain the expected behavior.


41-47: LGTM!

Test validates successful fragmentation when DF flag is not set. The PMTU cache flush ensures clean state, and the packet size correctly triggers fragmentation through the new ip_fragment path.

modules/ip/datapath/ip_output.c (3)

30-30: LGTM!

FRAG_NEEDED edge added correctly for routing DF-set oversized packets to the ICMP error handler.


94-101: LGTM!

MTU overflow handling correctly differentiates between DF-set packets (route to ICMP error) and fragmentable packets (route to ip_fragment). The DF flag check is properly byte-swapped.


160-161: LGTM!

Edge mappings correctly route oversized packets: TOO_BIG to the new fragmentation node, FRAG_NEEDED to the ICMP error handler.

Copy link
Collaborator

@christophefontaine christophefontaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valuable to add a config option (globally, per interface ? ) to enable/disable ip_frag ?

@aharivel
Copy link
Collaborator Author

aharivel commented Oct 23, 2025

Hi @christophefontaine ,
IMO a router must respect the DF flag of the incoming packet. This should not be a config flag. Grout is not a end-host and must forward every packets. This implementation is actually respecting the RFC1191 for PMTU discovery and the sender should use it to send packet to the lowest MTU of the path and avoid in the end the fragmentation.
Thanks for your feedback - I'm currently implementing the static check and the ipv6 of this.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/ip6/datapath/ip6_error.c (1)

71-76: Fix wrong sizeof in TTL-exceeded path.

Using sizeof(*du) allocates the wrong size for icmp6_err_ttl_exceeded.

Apply:

-		te = (struct icmp6_err_ttl_exceeded *)
-			rte_pktmbuf_prepend(mbuf, sizeof(*du));
+		te = (struct icmp6_err_ttl_exceeded *)
+			rte_pktmbuf_prepend(mbuf, sizeof(*te));
♻️ Duplicate comments (5)
smoke/ip_fragment_test.sh (1)

14-27: Reuse the same address scheme as other IPv4 smoke tests.

Align IPs with existing tests for consistency and easier cross-test reuse.

modules/ip/datapath/ip_fragment.c (2)

51-57: Forward existing fragments; do not drop.

Already-fragmented packets must be forwarded unchanged to ip_output.

Apply:

-		// Check if packet is already a fragment - if so, just pass it through
-		if (ip->fragment_offset
-		    & RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) {
-			// This is already a fragment, drop it
-			edge = ALREADY_FRAGMENTED;
-			goto drop;
-		}
+		// Already a fragment? forward as-is
+		if (ip->fragment_offset &
+		    RTE_BE16(RTE_IPV4_HDR_MF_FLAG | RTE_IPV4_HDR_OFFSET_MASK)) {
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
+			continue;
+		}

76-95: Avoid partial fragment sets; stage builds and enqueue only on success.

Current flow enqueues the first fragment, then breaks on failure, yielding orphaned sets and no error edge.

Apply staged-build refactor (minimal change):

-		// Prepare and enqueue first fragment (using original mbuf)
-		ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size);
-		ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG);
-		ip->hdr_checksum = 0;
-		ip->hdr_checksum = rte_ipv4_cksum(ip);
-
-		if (gr_mbuf_is_traced(mbuf)) {
-			struct ip_fragment_trace_data *t;
-			t = gr_mbuf_trace_add(mbuf, node, sizeof(*t));
-			t->packet_id = rte_be_to_cpu_16(ip->packet_id);
-			t->frag_num = 0;
-			t->offset = 0;
-			t->more_frags = 1;
-		}
-
-		// Enqueue first fragment
-		rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
-
-		// Create and enqueue remaining fragments
-		for (i = 1; i < num_frags; i++) {
+		// Build all fragments first
+		struct rte_mbuf *frags[num_frags];
+		uint16_t built = 0;
+
+		// Prepare first fragment (original mbuf)
+		ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size);
+		ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG);
+		ip->hdr_checksum = 0;
+		ip->hdr_checksum = rte_ipv4_cksum(ip);
+		frags[built++] = mbuf;
+		if (gr_mbuf_is_traced(mbuf)) {
+			struct ip_fragment_trace_data *t;
+			t = gr_mbuf_trace_add(mbuf, node, sizeof(*t));
+			t->packet_id = rte_be_to_cpu_16(ip->packet_id);
+			t->frag_num = 0;
+			t->offset = 0;
+			t->more_frags = 1;
+		}
+
+		// Create remaining fragments
+		for (i = 1; i < num_frags; i++) {
 			// Create new fragment, copying the original IPv4 header.
 			frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len);
 			if (unlikely(frag_mbuf == NULL)) {
-				break;
+				edge = NO_MBUF;
+				goto build_error;
 			}
 
 			frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *);
 			offset = i * frag_size;
 			frag_data_len = RTE_MIN(frag_size, data_len - offset);
 
 			payload = rte_pktmbuf_append(frag_mbuf, frag_data_len);
 			if (unlikely(payload == NULL)) {
 				rte_pktmbuf_free(frag_mbuf);
-				break;
+				edge = NO_MBUF;
+				goto build_error;
 			}
@@
 			*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf);
 			frag_mbuf->packet_type = mbuf->packet_type;
 			if (gr_mbuf_is_traced(mbuf)) {
 				struct ip_fragment_trace_data *t;
 				t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t));
 				t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id);
 				t->frag_num = i;
 				t->offset = offset;
 				t->more_frags = (i < num_frags - 1) ? 1 : 0;
 			}
 
-			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf);
+			frags[built++] = frag_mbuf;
 		}
 
-		// Trim first fragment to the right size
-		rte_pktmbuf_trim(mbuf, data_len - frag_size);
-
-		continue;
+		// Finalize first fragment and enqueue all in order
+		rte_pktmbuf_trim(mbuf, data_len - frag_size);
+		for (uint16_t k = 0; k < built; k++)
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]);
+		continue;
 
+build_error:
+		// Free any built extra fragments (keep original mbuf for error edge)
+		for (uint16_t k = 1; k < built; k++)
+			rte_pktmbuf_free(frags[k]);
+		goto drop;

This preserves order, avoids partial emission, and routes failures to NO_MBUF/ERROR edges.

Also applies to: 98-110, 134-141

modules/ip/datapath/ip_error.c (1)

86-86: Verify MTU is populated for FRAG_NEEDED per RFC 1191.

A previous review noted that icmp_seq_nb should carry the next-hop MTU (via rte_cpu_to_be_16(iface->mtu)) when icmp_code == RTE_ICMP_CODE_UNREACH_FRAG, but line 86 still sets it unconditionally to zero. The past comment was marked addressed in commits c9debf0–4d0a400; confirm whether the fix is present or was reverted.

#!/bin/bash
# Check if MTU is set anywhere in the ICMP processing path
rg -n 'icmp_seq_nb.*mtu|mtu.*icmp_seq_nb' modules/ip/datapath/
modules/ip/datapath/ip_output.c (1)

92-92: Use ip_output_mbuf_data for consistency.

Downstream code expects ip_output_mbuf_data(mbuf)->iface. Use the same accessor here.

Apply this diff:

-		mbuf_data(mbuf)->iface = iface;
+		ip_output_mbuf_data(mbuf)->iface = iface;
🧹 Nitpick comments (4)
smoke/ip_fragment_test.sh (1)

31-47: Assert the expected ICMP error explicitly.

Consider grepping ping stderr for “Frag needed” to make the failure reason explicit, not just exit-code based.

Example:

ip netns exec $p0 ping -i0.01 -c3 -s 1260 -M do -n 172.16.1.2 2>err && fail "expected PMTU error"
grep -q -E "Frag(mentation)? needed" err || fail "missing ICMP frag-needed indication"
modules/ip6/datapath/ip6_error.c (1)

86-89: Optional: drop redundant ip NULL check.

ip is set before prepend; it won’t be NULL.

modules/ip/datapath/ip_fragment.c (2)

59-61: Use ip_output_mbuf_data for consistency.

Stay consistent with the ip_output path metadata.

Apply:

-		iface = mbuf_data(mbuf)->iface;
+		iface = ip_output_mbuf_data(mbuf)->iface;

(This aligns with the copy at Line 123.)

Also applies to: 123-125


167-172: Remove unused ALREADY_FRAGMENTED drop if forwarding.

If you forward existing frags, the ALREADY_FRAGMENTED edge and drop node can be dropped.

Also applies to: 181-182

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37e7009 and 530fbdb.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (7)
  • modules/infra/control/gr_graph.h (1 hunks)
  • modules/ip/datapath/ip_error.c (6 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • modules/ip6/datapath/ip6_error.c (6 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip6/datapath/ip6_error.c
  • modules/ip/datapath/ip_fragment.c
  • modules/infra/control/gr_graph.h
  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_output.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (5)
modules/ip6/datapath/ip6_error.c (1)
modules/ip/datapath/ip_error.c (1)
  • no_route_init (106-111)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (520-547)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/ip/datapath/ip_error.c (1)
modules/ip6/datapath/ip6_error.c (1)
  • no_route_init (124-129)
modules/ip/datapath/ip_output.c (1)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (6)
modules/ip/datapath/meson.build (1)

16-16: Build wiring LGTM.

ip_fragment.c is correctly added to the datapath build.

smoke/ip_fragment_test.sh (1)

39-39: SC1010 warning is a false positive here.

“-M do” is a valid ping argument; no action needed.

If desired, silence ShellCheck by quoting "do" (cosmetic only).

modules/infra/control/gr_graph.h (1)

30-40: Macro looks good; clear and safe.

The size check and typed accessor are helpful. No issues.

modules/ip/datapath/ip_output.c (1)

94-100: LGTM: DF flag handling is correct.

The router correctly honors the DF flag: when set, it routes to FRAG_NEEDED (ICMP); otherwise it fragments via TOO_BIG.

modules/ip/datapath/ip_error.c (2)

16-19: LGTM: Clean context type definition.

The GR_NODE_CTX_TYPE macro provides type-safe accessor with compile-time size checking.


113-118: LGTM: Consistent error node pattern.

The frag_needed_node correctly mirrors the existing error node structure with appropriate ICMP type/code for "fragmentation needed."

Also applies to: 144-154

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/ip/datapath/ip_error.c (1)

84-86: Set Next-Hop MTU in icmp_seq_nb for ICMP Fragmentation Needed.

Per RFC 1191, when icmp_code is RTE_ICMP_CODE_UNREACH_FRAG, populate icmp_seq_nb with the outbound interface MTU in network byte order. The iface is available at line 53.

Apply this diff:

 		icmp->icmp_type = ctx->icmp_type;
 		icmp->icmp_code = ctx->icmp_code;
 		icmp->icmp_cksum = 0;
 		icmp->icmp_ident = 0;
-		icmp->icmp_seq_nb = 0;
+		if (ctx->icmp_code == RTE_ICMP_CODE_UNREACH_FRAG)
+			icmp->icmp_seq_nb = rte_cpu_to_be_16(iface->mtu);
+		else
+			icmp->icmp_seq_nb = 0;
🧹 Nitpick comments (2)
modules/ip/datapath/ip_output.c (1)

92-92: Use ip_output_mbuf_data() for consistency.

Line 74 reads from ip_output_mbuf_data(mbuf)->nh. For consistency within this datapath context, use the same accessor when setting iface.

Apply this diff:

-		mbuf_data(mbuf)->iface = iface;
+		ip_output_mbuf_data(mbuf)->iface = iface;
modules/ip/datapath/ip_fragment.c (1)

59-59: Use ip_output_mbuf_data() for consistency.

Line 123 uses ip_output_mbuf_data(frag_mbuf) for bulk copy. Use the same accessor here to match the ip_output datapath context.

Apply this diff:

-		iface = mbuf_data(mbuf)->iface;
+		iface = ip_output_mbuf_data(mbuf)->iface;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 530fbdb and fbdc7f3.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (3 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_fragment.c
  • modules/ip/datapath/ip_output.c
  • modules/ip/datapath/ip_error.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (2)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (522-549)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: rpm
  • GitHub Check: deb

Copy link
Collaborator

@rjarry rjarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
modules/ip/datapath/ip_output.c (1)

92-92: Use ip_output_mbuf_data(mbuf)->iface for consistency.

This line uses mbuf_data(mbuf) but the rest of the file and related code (e.g., ip_fragment.c line 123) use ip_output_mbuf_data(mbuf) in the ip_output datapath context.

-		mbuf_data(mbuf)->iface = iface;
+		ip_output_mbuf_data(mbuf)->iface = iface;
modules/ip/datapath/ip_fragment.c (3)

59-59: Use ip_output_mbuf_data(mbuf)->iface for consistency.

Line 123 uses ip_output_mbuf_data to access mbuf metadata. Use the same accessor here.

-		iface = mbuf_data(mbuf)->iface;
+		iface = ip_output_mbuf_data(mbuf)->iface;

76-138: Critical: Trim mbuf before enqueueing it.

Line 92 enqueues mbuf, then line 138 trims it. Modifying an mbuf after enqueueing is a use-after-enqueue bug—the graph may have already processed or freed it.

Move the trim before line 92:

 		// Prepare and enqueue first fragment (using original mbuf)
 		ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_size);
 		ip->fragment_offset = RTE_BE16(RTE_IPV4_HDR_MF_FLAG);
 		ip->hdr_checksum = 0;
 		ip->hdr_checksum = rte_ipv4_cksum(ip);
+
+		// Trim first fragment to the right size
+		rte_pktmbuf_trim(mbuf, data_len - frag_size);
 
 		if (gr_mbuf_is_traced(mbuf)) {
 			struct ip_fragment_trace_data *t;
 			t = gr_mbuf_trace_add(mbuf, node, sizeof(*t));
 			t->packet_id = rte_be_to_cpu_16(ip->packet_id);
 			t->frag_num = 0;
 			t->offset = 0;
 			t->more_frags = 1;
 		}
 
 		// Enqueue first fragment
 		rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
 
 		// Create and enqueue remaining fragments
 		for (i = 1; i < num_frags; i++) {
 			...
 		}
-
-		// Trim first fragment to the right size
-		rte_pktmbuf_trim(mbuf, data_len - frag_size);
 
 		continue;

94-135: Critical: Avoid partial fragment sets on allocation failure.

If rte_pktmbuf_copy (line 97) or rte_pktmbuf_append (line 106) fails, the first fragment (line 92) and any previously created fragments (line 134) are already enqueued. This produces incomplete fragment sets on the network.

Stage all fragments before enqueueing:

 		// Enqueue first fragment
 		rte_node_enqueue_x1(graph, node, IP_OUTPUT, mbuf);
 
-		// Create and enqueue remaining fragments
+		// Build remaining fragments
+		struct rte_mbuf *frags[num_frags - 1];
+		uint16_t built = 0;
+
 		for (i = 1; i < num_frags; i++) {
 			// Create new fragment, copying the original IPv4 header.
 			frag_mbuf = rte_pktmbuf_copy(mbuf, mbuf->pool, 0, ip_hdr_len);
 			if (unlikely(frag_mbuf == NULL)) {
+				// Free partial fragments; first already enqueued
+				for (uint16_t k = 0; k < built; k++)
+					rte_pktmbuf_free(frags[k]);
-				break;
+				goto next_packet;
 			}
 
 			frag_ip = rte_pktmbuf_mtod(frag_mbuf, struct rte_ipv4_hdr *);
 			offset = i * frag_size;
 			frag_data_len = RTE_MIN(frag_size, data_len - offset);
 
 			payload = rte_pktmbuf_append(frag_mbuf, frag_data_len);
 			if (unlikely(payload == NULL)) {
 				rte_pktmbuf_free(frag_mbuf);
+				for (uint16_t k = 0; k < built; k++)
+					rte_pktmbuf_free(frags[k]);
-				break;
+				goto next_packet;
 			}
 
 			memcpy(payload,
 			       rte_pktmbuf_mtod_offset(mbuf, const void *, ip_hdr_len + offset),
 			       frag_data_len);
 
 			frag_ip->total_length = rte_cpu_to_be_16(ip_hdr_len + frag_data_len);
 			frag_ip->fragment_offset = rte_cpu_to_be_16(
 				(offset / 8) | ((i < num_frags - 1) ? RTE_IPV4_HDR_MF_FLAG : 0)
 			);
 			frag_ip->hdr_checksum = 0;
 			frag_ip->hdr_checksum = rte_ipv4_cksum(frag_ip);
 
 			*ip_output_mbuf_data(frag_mbuf) = *ip_output_mbuf_data(mbuf);
 			frag_mbuf->packet_type = mbuf->packet_type;
 			if (gr_mbuf_is_traced(mbuf)) {
 				struct ip_fragment_trace_data *t;
 				t = gr_mbuf_trace_add(frag_mbuf, node, sizeof(*t));
 				t->packet_id = rte_be_to_cpu_16(frag_ip->packet_id);
 				t->frag_num = i;
 				t->offset = offset;
 				t->more_frags = (i < num_frags - 1) ? 1 : 0;
 			}
 
-			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frag_mbuf);
+			frags[built++] = frag_mbuf;
 		}
 
+		// Enqueue all remaining fragments
+		for (uint16_t k = 0; k < built; k++)
+			rte_node_enqueue_x1(graph, node, IP_OUTPUT, frags[k]);
+
+next_packet:
 		continue;

Note: The first fragment (enqueued at line 92) cannot be recovered on failure. Consider refactoring to stage the first fragment as well for true atomicity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbdc7f3 and 7e2da03.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (5)
  • modules/ip/datapath/ip_error.c (3 hunks)
  • modules/ip/datapath/ip_fragment.c (1 hunks)
  • modules/ip/datapath/ip_output.c (3 hunks)
  • modules/ip/datapath/meson.build (1 hunks)
  • smoke/ip_fragment_test.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/ip/datapath/meson.build
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/ip/datapath/ip_output.c
  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_fragment.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/ip_fragment_test.sh
🧬 Code graph analysis (3)
modules/ip/datapath/ip_output.c (1)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/ip/datapath/ip_fragment.c (4)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (522-549)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
smoke/ip_fragment_test.sh (1)
smoke/_init.sh (2)
  • netns_add (93-99)
  • fail (88-91)
🪛 Shellcheck (0.11.0)
smoke/ip_fragment_test.sh

[warning] 6-6: Quote this to prevent word splitting.

(SC2046)


[warning] 8-8: run_id is referenced but not assigned.

(SC2154)


[warning] 39-39: Use semicolon or linefeed before 'do' (or quote to make it literal).

(SC1010)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it weird to have the node ip_error_frag_needed without any link to icmp_output ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea.. I guess it's like "ip_error_ttl_exceeded" that doesn't have any link with "icmp_output" ?? Maybe a bug with the graph generation ?

When a packet exceeds the output interface MTU and has the DF (Don't
Fragment) flag set, the router must drop the packet and send an ICMP
"Destination Unreachable" error (Type 3, Code 4: Fragmentation Needed
and DF Set) back to the source.

Previously, grout would silently drop packets that exceeded the MTU
regardless of the DF flag setting. This prevented proper Path MTU
Discovery (RFC 1191) from functioning.

Add a new error node ip_error_frag_needed that generates ICMP Type 3
Code 4 errors using the existing ip_error_process() infrastructure.

Signed-off-by: Anthony Harivel <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
Add an ip_fragment node that fragments IPv4 packets exceeding the
outgoing interface MTU. The node is invoked from ip_output when
a packet is too large and does not have the DF (Don't Fragment) flag
set.

Fragmentation follows RFC 791 requirements: each fragment payload size
is rounded down to a multiple of 8 bytes, fragment offset field is
expressed in 8-byte units, and the MF (More Fragments) flag is set on
all fragments except the last one. All fragments share the same packet
ID from the original packet.

The ip_output node now stores the output interface in mbuf metadata
before the size check, ensuring ip_fragment can access the MTU for
proper fragment sizing. The previous ip_output_too_big drop node is
removed as oversized packets are now handled by ip_fragment.

Fragments inherit trace state from the original packet to maintain
debugging capability across fragmentation boundaries.

Closes: DPDK#336
Signed-off-by: Anthony Harivel <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
Add a smoke test that verifies IPv4 packet fragmentation behavior by
configuring asymmetric MTUs between two TAP interfaces (1500 and 1280
bytes) and sending packets through grout.

The test validates three scenarios: normal forwarding without
fragmentation, ICMP error generation when a large packet has the DF
flag set, and successful fragmentation when the DF flag is not set.
This ensures both the fragmentation logic and the DF flag handling
work correctly.

Signed-off-by: Anthony Harivel <[email protected]>
Reviewed-by: Christophe Fontaine <[email protected]>
@rjarry rjarry merged commit dc88ff2 into DPDK:main Oct 24, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants