libbpf-tools: add tcpdrop to trace TCP packet drops #5329
Added tcpdrop tool, consisting of tcpdrop.bpf.c and tcpdrop.c, to trace TCP packets dropped by the kernel using eBPF. Supports IPv4/IPv6 filtering and network namespace filtering, with output including timestamp, PID, IP addresses, ports, TCP state, and drop reason. Based on tcpdrop(8) from BCC.

Signed-off-by: Lance Yang <[email protected]>
Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>
Hi @chenhengqi, we’ve got a C version of tcpdrop in this PR (#5329), sticking close to the Python version’s features and options. Could you take a peek when you’ve got a sec? Would love your thoughts :)
Hi @ekyooo and @chenhengqi, thanks for the great feedback! I've made the following updates based on your suggestions:
Please take a look and let me know if there's anything else I can tweak!
Hi @chenhengqi, regarding your suggestion to copy reason enums from the kernel for tcpdrop, we previously used this approach in tcpdrop.py. However, recent experience shows these enums vary across kernel versions and distros, and they're easy to verify. So, I think dynamic loading via parse_reason_enum is more robust. It might be good to update tcpdrop.py to match this approach for consistency. What do you think, or is there another way to handle this?
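For readers following along, here is a rough userspace sketch of what dynamic loading of drop reasons could look like. It is not the PR's parse_reason_enum, which is not shown in this conversation. It assumes the reason names are available as `{ value, "NAME" }` pairs, as they typically appear in the `__print_symbolic(...)` part of the kfree_skb tracepoint format. The names `reason_table`, `parse_reason_pairs`, and `reason_name` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REASONS 256
#define MAX_NAME 64

/* Hypothetical lookup table: index is the numeric reason, value is its name. */
struct reason_table {
	char names[MAX_REASONS][MAX_NAME];
};

/*
 * Scan a text buffer for `{ <number>, "<NAME>" }` pairs and record them.
 * Returns the number of pairs stored. The buffer is assumed to hold the
 * print fmt line of a tracepoint format file, already read by the caller.
 */
static int parse_reason_pairs(const char *fmt, struct reason_table *tbl)
{
	const char *p = fmt;
	int count = 0;

	assert(fmt && tbl);
	memset(tbl, 0, sizeof(*tbl));

	while ((p = strchr(p, '{')) != NULL) {
		unsigned int val;
		char name[MAX_NAME];

		/* %63[^\"] reads up to the closing quote, bounded by MAX_NAME - 1. */
		if (sscanf(p, "{ %u , \"%63[^\"]\"", &val, name) == 2 &&
		    val < MAX_REASONS) {
			snprintf(tbl->names[val], MAX_NAME, "%s", name);
			count++;
		}
		p++; /* continue after this brace */
	}
	return count;
}

/* Map a numeric reason to its name, falling back to "UNKNOWN". */
static const char *reason_name(const struct reason_table *tbl,
			       unsigned int reason)
{
	assert(tbl);
	if (reason < MAX_REASONS && tbl->names[reason][0])
		return tbl->names[reason];
	return "UNKNOWN";
}
```

Because the table is built from whatever the running kernel reports, a reason that moves to a different index in another kernel version still resolves to the right name, which is the robustness argument made above.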
- Use ksyms__load and ksyms__map_addr for kernel symbol resolution.
- Follow Linux kernel coding style in tcpdrop.bpf.c and tcpdrop.c.
- Optimize IPv6 address handling with __u32 arrays and in6_u.u6_addr32.
- Remove bpf_printk debug statements from tcpdrop.bpf.c.
- Add /tcpdrop to .gitignore to exclude the binary.
- Define event struct in tcpdrop.h to prevent duplicate definitions.
- Check drop reason with bpf_core_field_exists in tcpdrop.bpf.c.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>
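To illustrate the IPv6 item in the list above, the sketch below shows one way userspace could copy an address into four 32-bit words and print it. This is an assumption about the general technique, not the code in tcpdrop.c. The helper names `copy_in6_words` and `format_addr_words` are hypothetical, and a portable memcpy stands in for direct access to in6_u.u6_addr32.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Copy a 128-bit IPv6 address into four 32-bit words (network byte order). */
static void copy_in6_words(uint32_t dst[4], const struct in6_addr *src)
{
	assert(dst && src);
	memcpy(dst, src, sizeof(uint32_t) * 4);
}

/*
 * Format an address stored as 32-bit words. For AF_INET only words[0] is
 * read; for AF_INET6 all four words are read. Returns buf on success or
 * NULL on failure, exactly as inet_ntop does.
 */
static const char *format_addr_words(int family, const uint32_t words[4],
				     char *buf, size_t len)
{
	assert(family == AF_INET || family == AF_INET6);
	return inet_ntop(family, words, buf, (socklen_t)len);
}
```

Storing both families in the same four-word array keeps the event layout fixed, with the address family deciding how many words are meaningful.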
Do you have an example of
Take This is reflected in the tracepoint format for Now, fast forward to kernel v6.15.4, and things shift in The tracepoint format in v6.15.4 confirms this, with Considering the skb_drop_reason index changes across kernel versions, parse_reason_enum for dynamic loading feels more adaptable than hardcoding the enums.
Sounds reasonable. I am OK with this approach.
Remove print_drop_reasons function and replace its call with a warning message in main when parse_reason_enum fails.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>
Hi @chenhengqi ,
Some comments are not resolved, please check.
…cpdrop

Move ipv4_only, ipv6_only, and netns_id to rodata section for better memory management. Optimize tcpdrop.bpf.c by declaring variables upfront and reordering operations for clarity. Update event struct to place stack_id correctly. Fix missing newlines at file ends.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>

Hi @chenhengqi ,

Defer the BPF ring buffer event allocation in tcpdrop.bpf.c until all preliminary checks are passed, reducing unnecessary discards and improving performance. This ensures the event is only reserved when the skb meets all processing conditions, minimizing resource waste.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>

Hi @chenhengqi,
Merge protocol validation and event population in tcpdrop.bpf.c for better readability and efficiency. Remove braces from single-line if statements to streamline code while preserving functionality.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>
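As a rough illustration of merging protocol validation with event population, the following userspace sketch checks the L3 protocol and fills an event in a single pass over a packet buffer. It is a model only: the real program runs in BPF and reads skb memory through kernel read helpers. The struct `drop_event_sketch`, the function `fill_event`, and the offset parameters are hypothetical. The constants 0x0800, 0x86dd, and 6 match the values visible in the verifier log for ETH_P_IP, ETH_P_IPV6, and IPPROTO_TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified event; not the layout defined in tcpdrop.h. */
struct drop_event_sketch {
	uint8_t ip_version;
	uint32_t saddr[4]; /* network byte order */
	uint32_t daddr[4]; /* network byte order */
	uint16_t sport;    /* host byte order */
	uint16_t dport;    /* host byte order */
};

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Validate the packet and populate the event in one pass.
 * Returns 0 on success, -1 if the packet is not TCP over IPv4/IPv6 or the
 * buffer is too short. IPv6 extension headers are not handled here.
 */
static int fill_event(const uint8_t *head, size_t len, uint16_t l3_off,
		      uint16_t l4_off, uint16_t protocol,
		      struct drop_event_sketch *ev)
{
	const uint8_t *ip;

	assert(head && ev);
	memset(ev, 0, sizeof(*ev));

	/* The TCP source and destination ports need 4 bytes. */
	if (l4_off > len || len - l4_off < 4)
		return -1;

	if (protocol == 0x0800) {           /* ETH_P_IP */
		if (l3_off > len || len - l3_off < 20)
			return -1;
		ip = head + l3_off;
		if (ip[9] != 6)             /* IPPROTO_TCP */
			return -1;
		ev->ip_version = 4;
		memcpy(&ev->saddr[0], ip + 12, 4);
		memcpy(&ev->daddr[0], ip + 16, 4);
	} else if (protocol == 0x86dd) {    /* ETH_P_IPV6 */
		if (l3_off > len || len - l3_off < 40)
			return -1;
		ip = head + l3_off;
		if (ip[6] != 6)             /* next header must be TCP */
			return -1;
		ev->ip_version = 6;
		memcpy(ev->saddr, ip + 8, 16);
		memcpy(ev->daddr, ip + 24, 16);
	} else {
		return -1;
	}

	ev->sport = read_be16(head + l4_off);
	ev->dport = read_be16(head + l4_off + 2);
	return 0;
}
```

In the BPF program a failed check after the ring buffer reservation needs a matching discard, which is the trade-off between this merged form and the earlier deferred-reservation version.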
Hi @chenhengqi, appreciate the input! I split the code before to cut down on event discards, but it was probably overdone. I’ve merged the protocol checks and trimmed the single-line ifs for a cleaner approach. What do you think of this version?
@chenhengqi Hi, hope all’s good! I replied to your last comments. Any chance you could take a look or let me know if there’s more to tweak? Appreciate your time!
I got this locally:

Verifier logs
libbpf: prog 'tp__skb_free_skb': BPF program load failed: -EACCES
libbpf: prog 'tp__skb_free_skb': -- BEGIN PROG LOAD LOG --
Unrecognized arg#0 type PTR
; skb = args->skbaddr;
0: (79) r7 = *(u64 *)(r1 +8)
; if (!skb)
1: (15) if r7 == 0x0 goto pc+110
R1=ctx(id=0,off=0,imm=0) R7_w=inv(id=0) R10=fp0
2: (b7) r2 = 1
; if (bpf_core_field_exists(args->reason))
3: (15) if r2 == 0x0 goto pc+3
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 1
; if (args->reason <= SKB_DROP_REASON_NOT_SPECIFIED)
4: (61) r2 = *(u32 *)(r1 +28)
5: (b7) r3 = 3
; if (args->reason <= SKB_DROP_REASON_NOT_SPECIFIED)
6: (2d) if r3 > r2 goto pc+105
R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umin_value=3,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3_w=inv3 R7_w=inv(id=0) R10=fp0
; protocol = args->protocol;
7: (69) r8 = *(u16 *)(r1 +24)
; if (protocol != ETH_P_IP && protocol != ETH_P_IPV6)
8: (15) if r8 == 0x86dd goto pc+1
R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=3,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3=inv3 R7=inv(id=0) R8_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0
9: (55) if r8 != 0x800 goto pc+102
R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=3,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3=inv3 R7=inv(id=0) R8_w=inv2048 R10=fp0
; if (ipv4_only && protocol != ETH_P_IP)
10: (18) r2 = 0xffffc900003c5000
12: (71) r2 = *(u8 *)(r2 +0)
R1=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R3=inv3 R7=inv(id=0) R8_w=inv2048 R10=fp0
; if (ipv4_only && protocol != ETH_P_IP)
13: (15) if r8 == 0x800 goto pc+1
last_idx 13 first_idx 7
regs=100 stack=0 before 12: (71) r2 = *(u8 *)(r2 +0)
regs=100 stack=0 before 10: (18) r2 = 0xffffc900003c5000
regs=100 stack=0 before 9: (55) if r8 != 0x800 goto pc+102
regs=100 stack=0 before 8: (15) if r8 == 0x86dd goto pc+1
regs=100 stack=0 before 7: (69) r8 = *(u16 *)(r1 +24)
; if (ipv6_only && protocol != ETH_P_IPV6)
15: (18) r2 = 0xffffc900003c5001
17: (71) r2 = *(u8 *)(r2 +0)
R1=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=1,ks=4,vs=8,imm=0) R3=inv3 R7=inv(id=0) R8_w=invP2048 R10=fp0
; if (ipv6_only && protocol != ETH_P_IPV6)
18: (15) if r8 == 0x86dd goto pc+1
19: (55) if r2 != 0x0 goto pc+92
last_idx 19 first_idx 18
regs=4 stack=0 before 18: (15) if r8 == 0x86dd goto pc+1
R1=ctx(id=0,off=0,imm=0) R2_rw=invP0 R3=inv3 R7=inv(id=0) R8_rw=invP2048 R10=fp0
parent didn't have regs=4 stack=0 marks
last_idx 17 first_idx 7
regs=4 stack=0 before 17: (71) r2 = *(u8 *)(r2 +0)
20: (b7) r2 = 32
21: (bf) r3 = r7
22: (0f) r3 += r2
23: (bf) r2 = r10
; bpf_core_read(&sk, sizeof(sk), &skb->sk);
24: (07) r2 += -8
25: (bf) r6 = r1
26: (bf) r1 = r2
27: (b7) r2 = 8
28: (85) call bpf_probe_read_kernel#113
last_idx 28 first_idx 18
regs=4 stack=0 before 27: (b7) r2 = 8
29: (79) r3 = *(u64 *)(r10 -8)
; if (netns_id && sk) {
30: (18) r1 = 0xffffc900003c5004
32: (61) r1 = *(u32 *)(r1 +0)
R0=inv(id=0) R1_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R3_w=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=invP2048 R10=fp0 fp-8=mmmmmmmm
; if (netns_id && sk) {
33: (15) if r1 == 0x0 goto pc+22
last_idx 33 first_idx 29
regs=2 stack=0 before 32: (61) r1 = *(u32 *)(r1 +0)
; if (inum != netns_id)
56: (b7) r1 = 208
57: (bf) r3 = r7
58: (0f) r3 += r1
59: (bf) r1 = r10
; if (bpf_core_read(&head, sizeof(head), &skb->head) ||
60: (07) r1 += -16
61: (b7) r2 = 8
62: (85) call bpf_probe_read_kernel#113
last_idx 62 first_idx 29
regs=4 stack=0 before 61: (b7) r2 = 8
; if (bpf_core_read(&head, sizeof(head), &skb->head) ||
63: (55) if r0 != 0x0 goto pc+48
R0=inv0 R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=invP2048 R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm
64: (b7) r1 = 196
65: (bf) r3 = r7
66: (0f) r3 += r1
67: (bf) r1 = r10
; bpf_core_read(&network_header, sizeof(network_header),
68: (07) r1 += -18
69: (b7) r2 = 2
70: (85) call bpf_probe_read_kernel#113
last_idx 70 first_idx 63
regs=4 stack=0 before 69: (b7) r2 = 2
; &skb->network_header) ||
71: (55) if r0 != 0x0 goto pc+40
R0=inv0 R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=invP2048 R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mm??????
72: (b7) r1 = 194
73: (0f) r7 += r1
74: (bf) r1 = r10
; bpf_core_read(&transport_header, sizeof(transport_header),
75: (07) r1 += -20
76: (b7) r2 = 2
77: (bf) r3 = r7
78: (85) call bpf_probe_read_kernel#113
last_idx 78 first_idx 71
regs=4 stack=0 before 77: (bf) r3 = r7
regs=4 stack=0 before 76: (b7) r2 = 2
; if (bpf_core_read(&head, sizeof(head), &skb->head) ||
79: (55) if r0 != 0x0 goto pc+32
R0=inv0 R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=invP2048 R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm????
; event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
80: (18) r1 = 0xffff888a8d1f5400
82: (b7) r2 = 80
83: (b7) r3 = 0
84: (85) call bpf_ringbuf_reserve#131
; if (!event)
85: (15) if r0 == 0x0 goto pc+26
R0_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R6=ctx(id=0,off=0,imm=0) R7=inv(id=0) R8=invP2048 R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? refs=2
86: (bf) r9 = r0
;
87: (bf) r7 = r0
;
88: (69) r1 = *(u16 *)(r10 -18)
89: (79) r3 = *(u64 *)(r10 -16)
90: (0f) r3 += r1
; if (protocol == ETH_P_IP) {
91: (55) if r8 != 0x800 goto pc+22
92: (bf) r1 = r10
; if (bpf_core_read(&ip, sizeof(ip), head + network_header) ||
93: (07) r1 += -80
94: (b7) r2 = 20
95: (85) call bpf_probe_read_kernel#113
last_idx 95 first_idx 91
regs=4 stack=0 before 94: (b7) r2 = 20
; if (bpf_core_read(&ip, sizeof(ip), head + network_header) ||
96: (55) if r0 != 0x0 goto pc+12
R0_w=inv0 R6=ctx(id=0,off=0,imm=0) R7=mem(id=0,ref_obj_id=2,off=0,imm=0) R8=invP2048 R9=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-64=????mmmm fp-72=mmmmmmmm fp-80=mmmmmmmm refs=2
97: (bf) r1 = r10
; ip.protocol != IPPROTO_TCP ||
98: (07) r1 += -80
99: (71) r1 = *(u8 *)(r1 +9)
; ip.protocol != IPPROTO_TCP ||
100: (55) if r1 != 0x6 goto pc+8
R0=inv0 R1=inv6 R6=ctx(id=0,off=0,imm=0) R7=mem(id=0,ref_obj_id=2,off=0,imm=0) R8=invP2048 R9=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-64=????mmmm fp-72=mmmmmmmm fp-80=mmmmmmmm refs=2
; bpf_core_read(&tcp, sizeof(tcp), head + transport_header)) {
101: (69) r1 = *(u16 *)(r10 -20)
102: (79) r3 = *(u64 *)(r10 -16)
103: (0f) r3 += r1
104: (bf) r1 = r10
105: (07) r1 += -100
106: (b7) r2 = 20
107: (85) call bpf_probe_read_kernel#113
last_idx 107 first_idx 100
regs=4 stack=0 before 106: (b7) r2 = 20
; if (bpf_core_read(&ip, sizeof(ip), head + network_header) ||
108: (15) if r0 == 0x0 goto pc+23
R0=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=mem(id=0,ref_obj_id=2,off=0,imm=0) R8=invP2048 R9=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=mmmmmmmm fp-24=mmmm???? fp-64=????mmmm fp-72=mmmmmmmm fp-80=mmmmmmmm fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=mmmm???? refs=2
;
109: (bf) r1 = r7
110: (b7) r2 = 0
111: (85) call bpf_ringbuf_discard#133
; }
112: (b7) r0 = 0
113: (95) exit
libbpf-tools/tcpdrop.bpf.c
Outdated
	if (ipv6_only && protocol != ETH_P_IPV6)
		return 0;

	bpf_core_read(&sk, sizeof(sk), &skb->sk);
Typically, we use BPF_CORE_READ and BPF_CORE_READ_INTO instead.
const volatile __u32 netns_id = 0;

SEC("tracepoint/skb/kfree_skb")
int tp__skb_free_skb(struct trace_event_raw_kfree_skb *args)
Use a raw tracepoint instead?
Replace bpf_core_read with BPF_CORE_READ for CO-RE compatibility and use __builtin_memcpy for IPv6 addresses to fix verifier type mismatch (R1 type=inv expected=fp). Simplify sk and state reads for clarity and efficiency.

Signed-off-by: Zi Li <[email protected]>
Signed-off-by: Amaindex <[email protected]>
Hi @chenhengqi, thank you for your guidance and feedback! I’ve replaced