Commit 41ab1cb

content: Information on kexec and bpftrace
Overviews on how to use kexec and bpftrace for kernel development. These are
very lightweight posts with room for more detail.

Signed-off-by: Rahul Rameshbabu <[email protected]>
1 parent d1c09a0 commit 41ab1cb

2 files changed, +286 -0 lines changed


content/posts/bpftrace.org

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
---
title: "bpftrace"
date: 2023-12-10T21:32:59-08:00
draft: false
---

* What is bpftrace?

~bpftrace~ is both a CLI tool and a tracing language that compiles down to Linux
enhanced Berkeley Packet Filter (eBPF) instructions. The BPF VM subsystem in the
Linux kernel is immensely powerful, and covering it properly would require a
separate article. For now, think of eBPF as a mechanism that (almost magically)
enables sandboxed logic to be loaded from userspace into privileged kernel
contexts. ~bpftrace~ uses eBPF to inject dynamic tracing instrumentation, and
the ~bpftrace~ language can work with traditional static tracing probes as well.

* Different types of probes available

The different types of probes are demonstrated in the ~bpftrace~ [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#probes][reference guide]]
in the source tree. As I learn more about ~bpftrace~, I will expand this post
with details about each probe type. For now, I will cover which kernel build
configuration options are required to support these probes.

** kprobe / kretprobe

Minimal configuration required.

#+BEGIN_SRC
CONFIG_BPF_EVENTS=y
CONFIG_KPROBES=y
#+END_SRC

Additional helpful configuration options.

#+BEGIN_SRC
CONFIG_BPF_KPROBE_OVERRIDE=y # Lets BPF programs attached to kprobes override the return value of (and skip) functions marked with ALLOW_ERROR_INJECTION
CONFIG_KPROBES_ON_FTRACE=y # Optimizes kprobes placed at function entry by building them on top of the ftrace function hook
CONFIG_KPROBE_EVENTS=y # Support dynamically inserting tracing events using kprobes
#+END_SRC
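
With ~CONFIG_KPROBES~ and ~CONFIG_BPF_EVENTS~ enabled, a kprobe/kretprobe pair is a
quick way to confirm that dynamic tracing works. The sketch below times
~vfs_read~, which is an assumption about your kernel (it is a common example
target, but functions can be inlined or renamed between releases). The
~interval~ probe ends tracing after five seconds, at which point bpftrace prints
the latency histogram, and the ~END~ probe clears the leftover per-thread
timestamps, a common pattern in bpftrace tools.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: histogram of vfs_read() latency using a kprobe/kretprobe pair.
# Assumes vfs_read is probeable on the running kernel. Run as root.
PROG='
kprobe:vfs_read { @start[tid] = nsecs; }

kretprobe:vfs_read /@start[tid]/ {
  @latency_ns = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}

interval:s:5 { exit(); }

END { clear(@start); }
'

# Attach only if bpftrace is installed; maps are printed when tracing exits.
if command -v bpftrace >/dev/null 2>&1; then
  bpftrace -e "$PROG"
fi
#+END_SRC

Swapping ~vfs_read~ for another kernel function is the usual way to reuse this
pattern.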

** kfunc / kretfunc

Minimal configuration required.

#+BEGIN_SRC
CONFIG_DEBUG_INFO_BTF=y
#+END_SRC

Additional helpful configuration options.

#+BEGIN_SRC
CONFIG_MODULE_ALLOW_BTF_MISMATCH=y # Allows loading modules whose BTF information does not match the running kernel
#+END_SRC
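
kfunc/kretfunc probes attach through BPF trampolines and take argument and
return types from BTF, which is why ~CONFIG_DEBUG_INFO_BTF~ is the key option. A
minimal sketch, assuming a bpftrace release that still accepts the
~kfunc~/~kretfunc~ names (newer releases call these probes ~fentry~/~fexit~),
first lists the BTF-derived signature of ~vfs_read~ and then sums the bytes it
returns per process name for five seconds.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: total bytes returned by vfs_read() per process name, via kretfunc.
# Needs CONFIG_DEBUG_INFO_BTF=y on the running kernel. Assumes a bpftrace
# release that accepts kfunc/kretfunc (newer ones use fentry/fexit). Run as root.
PROG='
kretfunc:vfs_read /retval > 0/ { @bytes_read[comm] = sum(retval); }

interval:s:5 { exit(); }
'

if command -v bpftrace >/dev/null 2>&1; then
  # Print vfs_read's typed arguments and return value as described by BTF.
  bpftrace -lv 'kfunc:vfs_read'
  bpftrace -e "$PROG"
fi
#+END_SRC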

** uprobe / uretprobe

Minimal configuration required.

#+BEGIN_SRC
CONFIG_UPROBES=y
#+END_SRC

Additional helpful configuration options.

#+BEGIN_SRC
CONFIG_UPROBE_EVENTS=y # Support dynamically setting uprobes/uretprobes using memory offsets of userspace programs
# Link: https://docs.kernel.org/trace/uprobetracer.html
#+END_SRC
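
A uretprobe on bash's ~readline~ function is the idea behind the
~bashreadline.bt~ tool in the bpftrace repository and makes a compact uprobe
example. This sketch assumes ~readline~ is exported by the bash binary itself
(some distributions link a shared libreadline instead) and that bash lives at
~/bin/bash~ unless ~BASH_BIN~ says otherwise; on NixOS, for instance, the path
differs.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: print every line interactive bash sessions read, using a uretprobe
# on bash's readline(). Assumes the bash binary exports a readline symbol.
# Override BASH_BIN for non-FHS layouts such as NixOS, e.g.
#   BASH_BIN="$(readlink -f "$(command -v bash)")" sh this-script.sh
# Run as root.
BASH_BIN=${BASH_BIN:-/bin/bash}

PROG='uretprobe:'"$BASH_BIN"':readline { printf("%d %s\n", pid, str(retval)); }
interval:s:10 { exit(); }'

if command -v bpftrace >/dev/null 2>&1; then
  bpftrace -e "$PROG"
fi
#+END_SRC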

** tracepoint

Minimal configuration required.

#+BEGIN_SRC
CONFIG_TRACEPOINTS=y
#+END_SRC

Additional helpful configuration options.

#+BEGIN_SRC
CONFIG_TRACEPOINT_BENCHMARK=y # Adds a kernel tracepoint used to benchmark the tracepoint infrastructure itself
#+END_SRC
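
A sketch for static tracepoints, assuming the ~syscalls~ tracepoints are present
on your kernel (they additionally need ~CONFIG_FTRACE_SYSCALLS=y~): ~bpftrace
-lv~ prints the tracepoint along with its fields, and the program counts
~openat(2)~ calls per process name for five seconds.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: count openat(2) calls per process name through a static tracepoint.
# The syscalls:* tracepoints exist only with CONFIG_FTRACE_SYSCALLS=y.
# Run as root.
PROG='
tracepoint:syscalls:sys_enter_openat { @openat[comm] = count(); }

interval:s:5 { exit(); }
'

if command -v bpftrace >/dev/null 2>&1; then
  # List the tracepoint together with its fields.
  bpftrace -lv 'tracepoint:syscalls:sys_enter_openat'
  bpftrace -e "$PROG"
fi
#+END_SRC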

Here is [[https://docs.kernel.org/trace/tracepoints.html][documentation on how to implement tracepoints in kernel code]]. Even
without ~bpftrace~, mechanisms such as [[https://docs.kernel.org/trace/events.html][event tracing]] can be used to
handle tracepoint activity.
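
As a sketch of that event tracing path, the tracefs interface can enable a single
tracepoint and stream its formatted records from ~trace_pipe~. This assumes
tracefs is mounted at ~/sys/kernel/tracing~ (older setups use
~/sys/kernel/debug/tracing~) and picks ~sched/sched_process_exec~ as an
arbitrary example event; ~trace_event~ is a hypothetical helper name.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: stream one tracepoint through tracefs event tracing, no bpftrace.
# Assumes tracefs is mounted at /sys/kernel/tracing (override TRACEFS for
# /sys/kernel/debug/tracing). Run as root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}
EVENT=sched/sched_process_exec

trace_event() {
  # Enable the event, stream formatted records for 5 seconds, then disable it.
  echo 1 > "$TRACEFS/events/$EVENT/enable" || return 1
  timeout 5 cat "$TRACEFS/trace_pipe"
  echo 0 > "$TRACEFS/events/$EVENT/enable"
}

# Only try when the event's enable file is writable (root, tracefs mounted).
if [ -w "$TRACEFS/events/$EVENT/enable" ]; then
  trace_event
fi
#+END_SRC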

* Knowing what's supported on your kernel

Running ~bpftrace --info~ reports what is and is not supported.

#+BEGIN_EXAMPLE
bpftrace --info
System
  OS: Linux 6.1.31 #1-NixOS SMP PREEMPT_DYNAMIC Tue May 30 13:03:33 UTC 2023
  Arch: x86_64

Build
  version: v0.18.0
  LLVM: 14.0.6
  unsafe probe: no
  bfd: yes
  libdw (DWARF support): yes

Kernel helpers
  probe_read: yes
  probe_read_str: yes
  probe_read_user: yes
  probe_read_user_str: yes
  probe_read_kernel: yes
  probe_read_kernel_str: yes
  get_current_cgroup_id: yes
  send_signal: yes
  override_return: no
  get_boot_ns: yes
  dpath: yes
  skboutput: yes

Kernel features
  Instruction limit: 1000000
  Loop support: yes
  btf: yes
  module btf: yes
  map batch: yes
  uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes

Map types
  hash: yes
  percpu hash: yes
  array: yes
  percpu array: yes
  stack_trace: yes
  perf_event_array: yes

Probe types
  kprobe: yes
  tracepoint: yes
  perf_event: yes
  kfunc: yes
  iter:task: yes
  iter:task_file: yes
  iter:task_vma: yes
  kprobe_multi: no
  raw_tp_special: yes
#+END_EXAMPLE

Many of the kernel-dependent features require specific configuration options to
be enabled. The output above comes from the default build configuration NixOS
uses for the ~linuxPackages_latest~ kernel, which is convenient for
demonstrations. However, I need to compile the kernel myself for development
purposes, which is why the sections above list the configuration options needed
for each type of probe.
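
To check a kernel you built (or the one currently running) without reading
through ~bpftrace --info~, a small helper can look up the relevant options in the
kernel config. This is a sketch: ~check_kconfig~ is a hypothetical helper, and it
assumes the config is available as ~/proc/config.gz~ (~CONFIG_IKCONFIG_PROC=y~)
or ~/boot/config-$(uname -r)~. Note that the Kconfig symbol for uprobes is
~CONFIG_UPROBES~.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: report how each requested Kconfig option is set in a kernel config
# read from stdin. Kconfig writes enabled options as CONFIG_X=y or CONFIG_X=m
# and disabled ones as "# CONFIG_X is not set".
check_kconfig() {
  config=$(cat)
  for opt in "$@"; do
    value=$(printf '%s\n' "$config" | sed -n "s/^${opt}=//p")
    printf '%s=%s\n' "$opt" "${value:-not set}"
  done
}

# Minimal options named in the probe sections of this post.
PROBE_OPTS="CONFIG_BPF_EVENTS CONFIG_KPROBES CONFIG_DEBUG_INFO_BTF CONFIG_UPROBES CONFIG_TRACEPOINTS"

# $PROBE_OPTS is left unquoted on purpose so it splits into one argument per option.
if [ -r /proc/config.gz ]; then
  zcat /proc/config.gz | check_kconfig $PROBE_OPTS
elif [ -r "/boot/config-$(uname -r)" ]; then
  check_kconfig $PROBE_OPTS < "/boot/config-$(uname -r)"
fi
#+END_SRC

For a freshly configured source tree, ~check_kconfig CONFIG_KPROBES
CONFIG_DEBUG_INFO_BTF < .config~ checks the build's own configuration instead.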

* Useful resources for learning more about bpftrace

Honestly, I am pretty new to both ~bpftrace~ and eBPF myself, and I plan on
updating this page as I continue to learn. One of my goals is learning how to
use the [[https://github.com/brendangregg/FlameGraph/blob/master/stackcollapse-bpftrace.pl][stackcollapse-bpftrace.pl]] script for generating flamegraphs; right now,
I use ~perf~ for that. I am also collecting useful ~bpftrace~ snippets that I
build along my journey as a kernel developer and systems enthusiast. These
snippets can be found in my GitHub repository,
[[https://github.com/Binary-Eater/bpftrace-scripts][Binary-Eater/bpftrace-scripts]].
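
The ~stackcollapse-bpftrace.pl~ script folds the stack map printed by an
aggregation such as ~@[kstack] = count()~, so a first attempt at a kernel flame
graph could look like the sketch below. It assumes the FlameGraph repository is
cloned into ~FLAMEGRAPH_DIR~ (the ~./FlameGraph~ default is a placeholder), and it
has not been checked against every bpftrace output format.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: kernel flame graph from bpftrace stack samples.
# Assumes https://github.com/brendangregg/FlameGraph is cloned into
# FLAMEGRAPH_DIR (the ./FlameGraph default is a placeholder). Run as root.
FLAMEGRAPH_DIR=${FLAMEGRAPH_DIR:-./FlameGraph}

# Sample on-CPU kernel stacks at 99 Hz and stop after 10 seconds.
PROG='
profile:hz:99 { @[kstack] = count(); }

interval:s:10 { exit(); }
'

if command -v bpftrace >/dev/null 2>&1; then
  bpftrace -e "$PROG" > out.bpftrace
  # Fold each sampled stack into one line, then render the folded stacks as SVG.
  perl "$FLAMEGRAPH_DIR/stackcollapse-bpftrace.pl" out.bpftrace > out.folded
  perl "$FLAMEGRAPH_DIR/flamegraph.pl" out.folded > kernel-flamegraph.svg
fi
#+END_SRC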

In general, the [[https://github.com/iovisor/bpftrace][iovisor/bpftrace]] GitHub repository has a nice [[https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md][reference]] and
[[https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md][one-liner tutorial]] for new users to follow along. The [[https://github.com/iovisor/bpftrace/blob/master/man/adoc/bpftrace.adoc][manpage]] is an even more
thorough resource.

The ~tools/~ directory of the ~bpftrace~ repository also serves as a great
reference.

[[https://www.brendangregg.com/index.html][Brendan Gregg's blog]] has a number of additional examples as well.

content/posts/kexec.org

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
---
title: "kexec"
date: 2023-12-10T21:32:56-08:00
draft: false
---

* What is kexec?

~kexec~ is short for kernel execute. At a high level, it is analogous to the
~exec~ family of syscalls: ~kexec~ replaces the currently running kernel image in
memory and begins executing the newly loaded kernel image. tl;dr it enables
users to run new kernels without needing a full power cycle of the system.

* What is the value of kexec?

+ Enables a "boot once" flow for testing a potentially problematic kernel.
+ No bootloader configuration required.
+ Does not require a complete system reboot through firmware (BIOS/UEFI) to
  swap between kernel images for testing.
+ Makes it easier to set up ad-hoc kernel development environments using the
  latest kernel source tree.

* What is required to use kexec?

** Kconfig

The kernel whose image will be replaced (the currently booted kernel) needs the
following kernel configuration options.

#+BEGIN_SRC
CONFIG_KEXEC=y # Enables support for the general kexec_load syscall.
CONFIG_KEXEC_FILE=y # Enables the kexec_file_load syscall. See the manual for kexec_load(2) for more details.

# Optional security options (left disabled here)
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=n # x86: verify the signature of a bzImage loaded through kexec_file_load
CONFIG_KEXEC_SIG=n # If the kernel image has a signature, make sure it is valid when using the kexec_file_load syscall
CONFIG_KEXEC_SIG_FORCE=n # Enforce that the kernel image used in the kexec_file_load syscall has a valid signature
#+END_SRC
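
A quick sketch for checking which of these options the running kernel was built
with, assuming its config is available at ~/proc/config.gz~
(~CONFIG_IKCONFIG_PROC=y~) or ~/boot/config-$(uname -r)~; ~kexec_kconfig~ is a
hypothetical helper name.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: show kexec-related Kconfig lines from a kernel config on stdin.
# Disabled options appear as "# CONFIG_X is not set", so match both forms.
kexec_kconfig() {
  grep -E '^(# )?CONFIG_KEXEC'
}

# /proc/config.gz needs CONFIG_IKCONFIG_PROC=y; otherwise try /boot.
if [ -r /proc/config.gz ]; then
  zcat /proc/config.gz | kexec_kconfig
elif [ -r "/boot/config-$(uname -r)" ]; then
  kexec_kconfig < "/boot/config-$(uname -r)"
fi
#+END_SRC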

** Userspace

You will need ~kexec-tools~ in userspace to load a kernel image for ~kexec~.

* Using kexec

Here is an example of loading a new kernel image to execute in place of the
currently running image while reusing the kernel command line of the current
image.

#+BEGIN_SRC sh
# Stage the new kernel and initramfs, reusing the current command line (as root)
kexec -l /path/to/vmlinuz --initrd=/path/to/initramfs.img --reuse-cmdline
# Jump into the staged kernel immediately, without stopping services first
kexec -e
#+END_SRC

The ~--initrd~ flag is optional. It is used when an initramfs image is needed to
properly bring up a system. ~kexec -e~ then executes the loaded kernel
immediately, without gracefully stopping system services that might be
impacted.

~systemd~ can provide a more graceful ~kexec~ flow that stops services cleanly
and even supports ad-hoc clean-up services using ~WantedBy=kexec.target~ in the
~[Install]~ section of a systemd unit file. In this flow, ~kexec -e~ is replaced
with ~systemctl kexec~.
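
A minimal sketch of such a clean-up service, printed by a hypothetical
~pre_kexec_unit~ function. The unit name ~pre-kexec-cleanup.service~ and the
~ExecStart~ script are placeholders. ~DefaultDependencies=no~ with
~Before=shutdown.target~ follows the usual pattern for units that must run during
shutdown, and since ~systemd-kexec.service~ is ordered after ~shutdown.target~,
the clean-up should finish before the jump; double-check the ordering against
~systemd.special(7)~ for your setup.

#+BEGIN_SRC sh
#!/bin/sh
# Sketch: print a oneshot unit that systemd pulls in when kexec.target is
# started by `systemctl kexec`. Unit name and ExecStart path are placeholders.
pre_kexec_unit() {
  cat <<'EOF'
[Unit]
Description=Clean-up before rebooting via kexec
DefaultDependencies=no
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/pre-kexec-cleanup.sh

[Install]
WantedBy=kexec.target
EOF
}
#+END_SRC

One way to install it, as root: ~pre_kexec_unit >
/etc/systemd/system/pre-kexec-cleanup.service~, then ~systemctl daemon-reload~
and ~systemctl enable pre-kexec-cleanup.service~, which links the unit into
~kexec.target.wants~.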

If you want to unload the target kernel that was previously loaded by ~kexec~,
~kexec -u~ will unload it.
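
Putting the load, verify, and hand-off steps together, a boot-once helper might
look like the sketch below. It stages the kernel, confirms through
~/sys/kernel/kexec_loaded~ that a kernel is loaded, and then uses ~systemctl
kexec~ for the graceful switch. ~kexec_boot_once~ is a hypothetical name, and
this is an untested sketch that must run as root with ~kexec-tools~ and systemd
installed.

#+BEGIN_SRC sh
#!/bin/sh
# Hypothetical helper: stage a kernel with kexec, confirm it is loaded, and
# hand off to systemd for a graceful kexec reboot. Must run as root.
# Usage: kexec_boot_once VMLINUZ [INITRD]
kexec_boot_once() {
  kernel=$1
  initrd=$2
  if [ ! -r "$kernel" ]; then
    echo "cannot read kernel image: $kernel" >&2
    return 1
  fi
  if [ -n "$initrd" ]; then
    kexec -l "$kernel" --initrd="$initrd" --reuse-cmdline || return 1
  else
    kexec -l "$kernel" --reuse-cmdline || return 1
  fi
  # /sys/kernel/kexec_loaded reads 1 once a kernel is staged.
  if [ "$(cat /sys/kernel/kexec_loaded)" != 1 ]; then
    echo "no kexec kernel staged" >&2
    return 1
  fi
  # Stop services cleanly, then jump. Before this point, "kexec -u" would
  # unload the staged kernel instead.
  systemctl kexec
}
#+END_SRC

For example, from a root shell that has sourced the script: ~kexec_boot_once
/path/to/vmlinuz /path/to/initramfs.img~.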

There are a number of options for the ~kexec~ command-line tool that are
documented in the manual for ~kexec(8)~.

Usage details for ~kexec~ are also well documented on the Arch Linux wiki and
the Gentoo wiki.

SUSE also has a more thorough [[https://documentation.suse.com/de-de/sles/15-GA/html/SLES-all/cha-tuning-kexec.html][write-up]] on ~kexec~ and ~kdump~.

NOTE: It seems Gentoo uses a patched version of ~reboot~ that offers a graceful
~-k~ flag. Gentoo probably does this because it offers OpenRC as an alternative
to ~systemd~.

* Why is kexec not enabled everywhere?

As touched on earlier in this article, ~kexec~ is not an infallible process.
Unlike a userspace program whose instructions are replaced by an ~exec~
syscall, the kernel image being replaced is in charge of managing devices at a
very low level. Device teardown in the old kernel and initialization in the new
one may not leave an "already-running" system in a stable state.

The idea of safely changing the running kernel's code in place, without
rebooting into a different kernel image and without compromising the system, is
called kernel livepatching. Kernel livepatching is a hot area of research, with
multiple entities taking their own approaches to the matter.

+ [[https://ubuntu.com/blog/an-overview-of-live-kernel-patching][Canonical's solution without kexec by using ftrace hooking]]
+ [[https://documentation.suse.com/sles/12-SP4/html/SLES-kgraft/index.html][kGraft by SUSE that similarly uses ftrace hooking]]
+ [[https://www.redhat.com/en/topics/linux/what-is-linux-kernel-live-patching#the-two-spaces-of-linux-system-operations][Red Hat's ftrace hooking kpatch livepatch solution]]
+ [[https://en.wikipedia.org/wiki/Ksplice#Design][Ksplice using its own injector and hooking mechanism]]

*NOTE:* Ksplice was originally developed by MIT students, which illustrates
that kernel livepatching is a topic worth academic exploration.

*DISCLAIMER:* I have not read about any of the above approaches in detail. I
just felt I should draw attention to them for curious readers interested in
ways to potentially make ~kexec~ "more robust".

* Alternatives to kexec

For "boot once" testing, using a bootloader's own "boot once" configuration is
an option. However, many bootloaders have an involved process for achieving
this.

I have not found decent documentation on how to do this for GRUB 2 (not GRUB
Legacy). In reality, I may work with a number of systems using different
bootloaders such as ~systemd-boot~, and learning how to do this with every
bootloader implementation out there seems like an adventure.
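
For reference, GRUB 2 ships a ~grub-reboot~ command (~grub2-reboot~ on
Fedora/RHEL-family systems) that selects an entry for the next boot only,
provided ~GRUB_DEFAULT=saved~ is set in ~/etc/default/grub~ and the GRUB
configuration has been regenerated; systemd-boot offers ~bootctl set-oneshot~.
The dispatcher below is an untested sketch built on those tools. ~boot_once~ is
a hypothetical name, and its argument must be a GRUB menu entry title or index,
or a systemd-boot entry ID from ~bootctl list~.

#+BEGIN_SRC sh
#!/bin/sh
# Hypothetical dispatcher: pick a boot entry for the next boot only, then
# reboot. ENTRY is a GRUB menu entry title or index, or a systemd-boot entry
# ID from `bootctl list`. Run as root.
boot_once() {
  entry=$1
  if command -v bootctl >/dev/null 2>&1 && bootctl is-installed >/dev/null 2>&1; then
    # systemd-boot: store a one-shot default entry in an EFI variable.
    bootctl set-oneshot "$entry" || return 1
  elif command -v grub-reboot >/dev/null 2>&1; then
    # GRUB 2: requires GRUB_DEFAULT=saved in /etc/default/grub.
    grub-reboot "$entry" || return 1
  elif command -v grub2-reboot >/dev/null 2>&1; then
    # Same tool under the name used by Fedora/RHEL-family distributions.
    grub2-reboot "$entry" || return 1
  else
    echo "no supported boot-once tool found" >&2
    return 1
  fi
  systemctl reboot
}
#+END_SRC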
