4.9 Further Reading and Exploration Paths
Over the course of this chapter, we've taken the kernel probe mechanism apart down to its core: from the lowest-level instruction replacement, to reading registers out of pt_regs, to dynamic probe insertion through ftrace, and finally to eBPF, which stands on the shoulders of all three.
But as we've repeatedly emphasized throughout this book: In the kernel world, knowing "how" is just the admission ticket; knowing "why" gets you a seat.
The reading list that follows is not a hands-on walkthrough. Instead, it's a map. When you find a concept from this chapter starting to blur in your mind, or when you encounter edge cases we didn't cover, you can follow these paths forward.
📚 Mechanisms and Principles—Into the Black Box
If you want to figure out exactly "how Kprobes replaces an instruction with a breakpoint," the official documentation is the starting point, but often not the finish line.
- Kernel Probes (Kprobes) - Official Kernel Documentation: This is the "bible" for querying APIs and behavioral guidelines. Whenever your code behaves strangely, your first reaction should be to check here to see if you've triggered some unstated constraint.
- How Linux kprobes works (Dec 2016): If you find the official documentation too dry, this blog post is a great antidote. It visually breaks down the details of the underlying implementation. If you're interested in the micro-mechanics of int3 breakpoints and instruction jumps, you'll find the answers here.
- [Kernel] Kprobe, Brian Pan (Nov 2020): A relatively modern overview article, suitable for reviewing and connecting the dots between concepts.
- Traps, Handlers (x86 specific): Don't let the title intimidate you. A prerequisite for understanding kprobes is understanding interrupt and trap gates. Although this material leans toward the x86 architecture, the concepts are universal: when you hear the words "exception handling," you need to know what the CPU is actually doing.
🛠️ The Art of Dynamic Tracing—Ftrace and Perf
If static kprobes are "brute force," then ftrace-based kprobe events are "tai chi." The resources below will show you how to apply force more elegantly.
- Taming Tracepoints in the Linux Kernel, Keenan (Mar 2020): Tracepoints are the "backdoors" the kernel leaves for us. This article teaches you how to find and leverage these backdoors.
- Fun with Dynamic Kernel Tracing Events, Steven Rostedt (Oct 2018): Note that the author is Steven Rostedt, the principal maintainer of ftrace. This presentation doesn't just show you how to use it; it shows you "things you never thought possible." If you want to witness the power of dynamic tracing, look no further.
- Dynamic tracing in Linux user and kernel space, Pratyush Anand (July 2017): We focused mainly on kernel space in this chapter, but this article broadens the horizon. It covers uprobe (user-space probes), making you realize that this mechanism actually spans the entire system.
🦋 eBPF—The Future of Observability
If you reach the end of this chapter feeling like "this isn't enough," then eBPF is your next stop. It's not just a tracing tool; it's redefining kernel programming.
- BCC Python Reference & BCC Installation Guide: This is the practical entry point. Installing BCC and successfully running execsnoop is the crucial step from "understanding" to "using."
- Linux Extended BPF (eBPF) Tracing Tools, Brendan Gregg: Brendan Gregg's homepage. It doesn't just offer tools; it provides a wealth of performance analysis case studies with visualizations. When you don't know which tool to use, come here for inspiration.
- How eBPF Turns Linux into a Programmable Kernel, Jackson (Oct 2020): This advanced article explains why eBPF is called "revolutionary." It answers a core question: why do we need to run scripts inside the kernel?
- A Gentle Introduction to eBPF, InfoQ (May 2021): If you need an introduction to show your boss or non-embedded peers, this is the best choice.
- A thorough introduction to eBPF (Kernel-level), Matt Fleming, LWN (Dec 2017): LWN articles never disappoint. This long-form piece dives deep into the internal kernel data structures and implementation details. Maximum hardcore level.
- How io_uring and eBPF Will Revolutionize Programming in Linux, Glauber Costa (Apr 2020): Look at eBPF and asynchronous I/O (io_uring) together, and you'll see that Linux's performance model is undergoing a massive reconstruction.
⚙️ ABI and Assembly—Talking to the Machine
We spent a lot of time in Section 4.6 discussing pt_regs and registers. If that still wasn't enough to satisfy you, or if you need to handle a different architecture like ARM64, the resources below are your compass in the register jungle.
- APPLICATION BINARY INTERFACE (ABI) DOCS AND THEIR MEANING: Read this first. It explains why the ABI is as important to system programmers as traffic rules are to drivers.
- x64 Cheat Sheet: A single page covering x86-64 registers and calling conventions. Print it out and pin it to your wall.
- X86 64 Register and Instruction Quick Start: If you suddenly forget whether RDI stores the first or second argument, look here.
- Overview of ARM64 ABI conventions, Microsoft (Mar 2019): Even though it's Microsoft documentation, the description of ARM64 standards is universal. When your dev board switches from x86 to ARM, you'll need this.
- ARMv8-A 64-bit Android on ARM - Architecture Overview, Campus London (Sept 2015): Skip to page 32. There's a reference table for ARMv8 terminology that is well worth bookmarking.
- ARM Cortex-A Series Programmer's Guide for ARMv8-A: The official ARM bible. If you want to understand AArch64 stack frame structures and exception handling, this is the final word.
🛠️ Toolbox—Brendan Gregg's Treasure Trove
We mentioned Brendan Gregg multiple times earlier. His tool library is the arsenal of every system programmer.
- Brendan Gregg's perf-tools page: The homepage.
- kprobes-perf examples: The examples here directly show how to use the perf interface to manipulate kprobes.
- kprobes-perf and related tooling code: If you're curious about how these tools are wrapped under the hood, just read the source code.
🧩 Miscellanea—History and Monitoring Perspectives
Finally, here are some resources on historical evolution and different monitoring perspectives.
- Locating System Problems Using Dynamic Instrumentation, Prasad, Cohen, et al (2005): This 2005 paper focuses primarily on SystemTap. Although we mostly use eBPF now, from a historical perspective, it shows how dynamic instrumentation technology was originally designed and used. Reading it helps you understand how people solved problems in the "pre-eBPF era."
- Different Approaches to Linux Host Monitoring, Kelly Shortridge: Step out of the code and look at monitoring from a higher architectural level. This article compares different monitoring approaches, helping you build a holistic view.
Out of the Maze
Alright, the resources are listed.
But remember one thing: no collection of bookmarks, however large, is as useful as actually running dmesg.
Further reading is a ladder to use when you hit a wall, not a TV show for you to watch while lying on the couch. When you actually encounter a tricky crash, or when you need to capture a fleeting bug in a production environment, you'll naturally recall these links and know exactly where to find the answers.
Now, close your browser and go see what your kernel is telling you in dmesg.
Exercises
Exercise 1: Understanding
Question: On the x86-64 architecture, the Linux kernel follows the System V AMD64 ABI, where the first 6 integer/pointer arguments of a function are passed via the RDI, RSI, RDX, RCX, R8, and R9 registers respectively. Suppose you need to use a kprobe's pre-handler to intercept the kernel function do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) and extract the filename parameter. Which member of the struct pt_regs structure should you access in the pre-handler callback to obtain the filename pointer?
Answer and Explanation
Answer: regs->si
Explanation: According to the x86-64 ABI specification, the first 6 arguments of a function are stored in the RDI, RSI, RDX, RCX, R8, and R9 registers in order.
- The second argument of do_sys_open is filename.
- Therefore, it corresponds to the second register, RSI.
- In the Linux kernel's struct pt_regs structure, the member si (or rsi) corresponds to the value of the RSI register.
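To make the position-to-register mapping concrete, here is a minimal user-space sketch. It is not kernel code: mock_pt_regs and mock_get_argument are hypothetical names, and the struct models only the six argument registers of the real x86-64 struct pt_regs. Inside a real handler on a reasonably recent kernel, the in-tree helper regs_get_kernel_argument() plays a similar role, assuming your kernel provides it.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's x86-64 struct pt_regs.
 * Only the six registers that carry integer/pointer arguments are modeled;
 * the real structure has many more members. */
struct mock_pt_regs {
    unsigned long di, si, dx, cx, r8, r9;
};

/* Return the n-th (0-based) integer/pointer argument as it would appear at
 * function entry under the System V AMD64 calling convention.
 * Returns 0 when n is outside the register-passed range. */
unsigned long mock_get_argument(const struct mock_pt_regs *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;  /* 1st argument: RDI */
    case 1: return regs->si;  /* 2nd argument: RSI */
    case 2: return regs->dx;  /* 3rd argument: RDX */
    case 3: return regs->cx;  /* 4th argument: RCX */
    case 4: return regs->r8;  /* 5th argument: R8  */
    case 5: return regs->r9;  /* 6th argument: R9  */
    default: return 0;        /* 7th and later arguments live on the stack */
    }
}
```

With this model, mock_get_argument(&regs, 1) returns whatever sits in si, which is exactly where filename lives when do_sys_open is entered.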
Exercise 2: Understanding
Question: You are writing a kernel module and trying to probe the kprobe_exceptions_notify function using register_kprobe(), but you find that the probe always fails. What is the most likely reason? Please explain in the context of the blacklist mechanism mentioned in this chapter.
Answer and Explanation
Answer: Because kprobe_exceptions_notify is on the kprobe blacklist. It is part of the kprobe implementation itself, so probing it could trigger recursive faults.
Explanation: Kprobes cannot probe functions used internally by its own implementation; doing so would cause recursion or deadlocks. The kernel maintains a blacklist (exposed through debugfs, usually at /sys/kernel/debug/kprobes/blacklist) that lists all functions prohibited from being probed. kprobe_exceptions_notify belongs to the core kprobe processing logic and is therefore blacklisted, so register_kprobe() refuses the request and returns an error (typically -EINVAL) instead of installing the probe.
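Before calling register_kprobe(), you can check the blacklist from user space. The sketch below assumes that each line of the blacklist file has the form "<start>-<end><TAB><symbol>", and that debugfs is mounted and readable (usually only by root). The function names are illustrative, not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `line` (one line of the blacklist file) names `symbol`.
 * Assumed layout: "<start>-<end>\t<symbol>", for example
 * "0xffffffff81063770-0xffffffff810637e0\tkprobe_exceptions_notify". */
int blacklist_line_matches(const char *line, const char *symbol)
{
    const char *name = strchr(line, '\t');
    size_t len;

    if (name == NULL)
        return 0;                    /* not in the assumed format */
    name++;                          /* skip the tab */
    len = strcspn(name, " \t\r\n");  /* the symbol ends at whitespace */
    return len == strlen(symbol) && strncmp(name, symbol, len) == 0;
}

/* Scan a blacklist file line by line.
 * Returns 1 if the symbol is listed, 0 if not, -1 if the file is unreadable. */
int symbol_is_blacklisted(const char *path, const char *symbol)
{
    char line[512];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), fp) != NULL)
        found = blacklist_line_matches(line, symbol);
    fclose(fp);
    return found;
}
```

A call such as symbol_is_blacklisted("/sys/kernel/debug/kprobes/blacklist", "kprobe_exceptions_notify") lets a loader script skip a target it already knows will be refused. The comparison is on the whole symbol name, so a prefix of a blacklisted name does not count as a match.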
Exercise 3: Application
Question: You want to dynamically trace the execution of the kernel function do_sys_open without recompiling the kernel module, and record the process PID and return value for each call. Describe how to achieve this using /sys/kernel/debug/tracing/kprobe_events (ftrace) (assuming the function name is known).
Answer and Explanation
Answer: 1. Define the entry probe: echo 'p:myprobe do_sys_open dfd=%di filename=%si flags=%dx mode=%cx' > /sys/kernel/debug/tracing/kprobe_events (the argument list is optional).
2. Define the return probe: echo 'r:myretprobe do_sys_open ret=%ax' >> /sys/kernel/debug/tracing/kprobe_events. Note the >>: a plain > would wipe the probe defined in step 1.
3. Enable tracing: echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable.
4. View the results: cat /sys/kernel/debug/tracing/trace. Each record begins with the task name and PID, so the PID is recorded without any extra configuration.
Explanation: This is the standard workflow for applying dynamic probes. The p: prefix defines a probe at function entry that captures the arguments (which register holds which argument is determined by the ABI), while the r: prefix defines a return probe (a kretprobe) that captures the return value, here via %ax on x86-64; the architecture-independent $retval can be used instead. This method requires no C code and no recompilation; it relies on the ftrace infrastructure to do the tracing.
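If you drive these steps from a program instead of a shell, the difference between > and >> becomes a difference in open() flags. The sketch below is one way to write it, under the assumption that the control file already exists (tracefs files always do); control_file_write is a hypothetical helper name, not an existing API.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to an existing control file such as kprobe_events.
 * append == 0 mirrors the shell's '>'  (O_TRUNC: clears existing content,
 *              which for kprobe_events means existing probe definitions).
 * append != 0 mirrors the shell's '>>' (O_APPEND: keeps them).
 * Returns 0 on success, -1 on any error. */
int control_file_write(const char *path, const char *text, int append)
{
    size_t len = strlen(text);
    ssize_t written;
    int fd = open(path, O_WRONLY | (append ? O_APPEND : O_TRUNC));

    if (fd < 0)
        return -1;                 /* missing file or insufficient rights */
    written = write(fd, text, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

Step 1 would then be a call with append set to 0 and step 2 a call with append set to 1. The helper deliberately omits O_CREAT, so a typo in the path fails loudly instead of creating a stray file.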
Exercise 4: Thinking
Question: In a high-load production environment, you need to locate all instances where the execve system call fails, and you want to capture the error code causing the failure. Although you could use traditional kprobes to write a kernel module, what is the more modern, safer, and lower-overhead method recommended in this chapter? (Please provide the tool name or technology category, and briefly explain why.)
Answer and Explanation
Answer: Use eBPF (extended Berkeley Packet Filter) tools, such as execsnoop in BCC (BPF Compiler Collection), whose -x option includes failed exec() calls, or a custom BCC script.
Explanation: A kprobe written as a kernel module runs with full kernel privileges: a coding error can crash the system, and loading and unloading modules on a heavily loaded production machine is risky. eBPF instead runs sandboxed bytecode inside the kernel. Before a program is loaded, the verifier rejects anything it cannot prove safe (for example, out-of-bounds memory accesses or unbounded loops), which greatly reduces the chance of taking the kernel down, and no kernel or module build is required. Through the BCC frontend, you can quickly write Python scripts that capture the return value (error code) of execve. This makes development fast and suits dynamic observability in production environments.
Key Takeaways
Kprobes gives us a "god's-eye view" of the kernel: we can dynamically insert hooks at almost any kernel instruction, most commonly a function's entry, without recompiling the kernel. A probe can carry up to three handlers. The Pre-handler runs just before the probed instruction executes; for a probe placed at function entry, it is commonly used to grab parameters. The Post-handler runs right after the probed instruction has executed, not after the whole function has returned, so it can inspect the side effects of that instruction; timing a complete function call requires a kretprobe instead. The Fault-handler, which newer kernels have dropped, acts as a safety net for exceptions caused by the probe handlers themselves. Understanding these mechanisms is the foundation for building dynamic tracing systems, as it allows developers to monitor system behavior with minimal invasiveness.
Since the kernel lacks an independent runtime environment, debugging by adding printk calls directly to kernel code is risky. The static form of Kprobes instead relies on writing a kernel module and populating a struct kprobe. Developers can register a probe on almost any kernel function (blacklisted functions are the exception) by specifying symbol_name, and implement the corresponding handler functions to intercept control flow. However, this approach lacks flexibility: every change to the probe target or log format requires recompiling and reloading the module. unregister_kprobe must also be called when the module is unloaded; otherwise the kernel may crash or leak resources.
To extract valuable data from a probe, triggering a callback is not enough. You also need a solid understanding of the processor architecture's ABI (Application Binary Interface). Function arguments are not stored in easily accessible variables; they are passed in CPU registers or on the stack according to architecture-specific rules (for example, RDI and RSI hold the first two arguments on x86-64, X0 and X1 on ARM64). In a Pre-handler, developers must read the argument values out of the corresponding registers in struct pt_regs and then use a safe interface such as strncpy_from_user to copy user-space data into kernel space before they can obtain context information like file paths. Because kprobe handlers run in atomic context and must not sleep, the non-faulting variant strncpy_from_user_nofault is the safer choice on kernels that provide it.
As a complement to Kprobes, Kretprobe is designed specifically to capture function return values. By the time a function has returned, execution has already moved back to the caller, so an entry probe alone cannot see the result. Kretprobe solves this by replacing the return address on the stack at function entry with the address of a trampoline. When the function returns, control lands in the trampoline first, where the handler can use the regs_return_value() helper to read the return value from the right register without architecture-specific code. This mechanism is invaluable for diagnosing bugs that show up in return values, such as allocation failures or permission check errors.
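When a return probe records the raw register (for example ret=%ax in a kprobe event, or regs_return_value() in a kretprobe handler), a failure shows up as a very large unsigned number rather than a tidy -ENOENT. The user-space sketch below shows one way to decode it. It assumes the probed function returns long and follows the kernel convention of encoding errors as values in the range [-4095, -1]; retval_to_errno and MODEL_MAX_ERRNO are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Assumption: mirrors the kernel convention that the largest errno value
 * a function can encode in its return value is 4095. */
#define MODEL_MAX_ERRNO 4095u

/* Decode a raw 64-bit return register value.
 * Returns the positive errno number if the value lies in the error range
 * [-4095, -1] when read as a signed number, or 0 if it is not an error. */
int retval_to_errno(uint64_t raw)
{
    /* In unsigned arithmetic, -4095..-1 are the top 4095 values of the
     * 64-bit range, so a single comparison identifies them. */
    if (raw >= (uint64_t)0 - MODEL_MAX_ERRNO)
        return (int)((uint64_t)0 - raw);
    return 0;
}
```

For example, a raw value of 0xfffffffffffffffe decodes to 2, which is ENOENT on Linux, while a small positive value such as a file descriptor decodes to 0. For functions that return int, only the low 32 bits of the register are meaningful, so the raw value should be narrowed to a signed 32-bit integer before it is interpreted.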
Although static Kprobes are powerful, writing C modules is not only tedious but also error-prone. The modern Linux kernel provides a more elegant dynamic tracing mechanism. Through the /sys/kernel/debug/tracing interface (tracefs), developers don't need to write any kernel code; they can create kprobe events simply by writing configurations via the command line. This ftrace-based dynamic instrumentation approach abstracts low-level details into "events," which not only significantly lowers the barrier to entry but also serves as the underlying cornerstone for efficient tracing in advanced observability tools like perf and eBPF.