Skip to main content

4.7 Dynamic kprobe Event Tracing on Kernel Modules

In the previous section, we were still looking at kernel stack traces. Now, let's switch to a different scenario.

So far, most of the scenarios we've encountered have been either "the kernel has already crashed, go put out the fire" or "recompiling a module ten thousand times just to add a print statement." But in real-world production environments—especially when your users are relying on a critical service to make money—you often can't just reboot the machine or arbitrarily unload a module.

What you need here is the ability to perform surgery on a live system: dynamic probe insertion.

In this section, we'll perform surgical observation on a running kernel module using an x86_64 Ubuntu virtual machine (simulating a production environment). No code changes, no recompilation, and no downtime required.


Preparation: Catching the Test Subject

First, we need a test subject. Let's load the miscdrv_rdwr driver module we wrote in Chapter 3.

$ cd <lkd-src-tree>/ch3/miscdrv_rdwr
$ ../../lkm
Usage: lkm name-of-kernel-module-file (without the .c)
$ ../../lkm miscdrv_rdwr
Version info:
Distro: Ubuntu 20.04.3 LTS
Kernel: 5.10.60-prod01
[...]
$ sudo dmesg
------------------------------
[ 1987.178246] miscdrv_rdwr:miscdrv_rdwr_init(): LLKD misc driver (major # 10) registered, minor# = 58, dev node is /dev/llkd_miscdrv_rdwr
$

The module is in place. Next, let's confirm it's within our "field of vision."

In the world of dynamic probes, the first prerequisite is that the target function must be in the kernel's global symbol table. If it's a private static function, the probe won't be able to find it (unless you use source-level fgraph). Standard functions registered via module_init, like ours, will typically obediently register themselves in /proc/kallsyms.

Let's search for them:

$ sudo grep miscdrv /proc/kallsyms
ffffffffc0562000 t write_miscdrv_rdwr [miscdrv_rdwr]
ffffffffc0562982 t write_miscdrv_rdwr.cold [miscdrv_rdwr]
ffffffffc0562290 t open_miscdrv_rdwr [miscdrv_rdwr]
ffffffffc0562480 t close_miscdrv_rdwr [miscdrv_rdwr]
ffffffffc0562650 t read_miscdrv_rdwr [miscdrv_rdwr]
ffffffffc05629b5 t read_miscdrv_rdwr.cold [miscdrv_rdwr]
[...]

They're all there.

By the way, you might have noticed that some functions have a .cold suffix attached to them. What is that?

That's GCC's doing. .cold is a compiler attribute that tells the CPU: "This function will hardly ever run, so shove it into a corner and keep it away from the hot code." This is an optimization strategy called "hot-cold separation." Looking at the output above, our read and write have both normal versions and .cold versions—the compiler decided that certain error-handling paths were "cold" and gave them their own separate treatment.

This gives us an important tip: when setting up a probe, it's best to confirm exactly which version you're hooking. Otherwise, you might end up staring at a cold path all day and catch nothing.


Setting the Hook: Dynamic kprobe Enters

Now, let's begin the surgery.

Open a terminal and navigate to the tracefs command center:

cd /sys/kernel/tracing

We want to attach a probe to the write_miscdrv_rdwr function. This is where we use the kprobe_events control file. You can think of it as a "surgery form" where we specify exactly where to cut and how.

Write this command into it:

echo "p:mymiscdrv_wr write_miscdrv_rdwr" >> kprobe_events
  • p:: Indicates this is a kprobe (entry point).
  • mymiscdrv_wr: A custom name we give to this probe. Choose a good name so it's easy to find later.
  • write_miscdrv_rdwr: The target function name.

Confirm that it was written successfully:

# cat kprobe_events
p:kprobes/mymiscdrv_wr write_miscdrv_rdwr
#

The system has replied: OK, the probe is registered, but the safety is still on.

Now, take the safety off:

echo 1 > events/kprobes/mymiscdrv_wr/enable

The Real Test: Witnessing Data Flow

The preparation is done. Now it's time for the magic. We need two terminal windows to work together.

Window A (Observer):

Stay in the /sys/kernel/tracing directory and run:

cat trace_pipe

This command will wait patiently like a hunter at a tree. As soon as the probe triggers, data will immediately flow out.

Window B (Trigger):

In another terminal, run our userspace test program and write something to the device. This step will definitely call the driver's write function—the exact spot where we laid our ambush.

$ ./rdwr_test_secret w /dev/llkd_miscdrv_rdwr "dyn kprobes event tracing is awesome"

The moment you hit Enter, Window A should instantly flash a log entry.

Let's "review the footage" (even though there's no screenshot here, we can picture that screen):

You'll see records related to mymiscdrv_wr pop up in the log. This shows the typical format of trace output: timestamp, CPU number, process info, and most importantly—the function was hit.

What does this prove? It proves that without modifying any driver code, without recompiling the kernel, and without rebooting the machine, we precisely captured a function call inside a kernel module.

This is the aesthetics of dynamic kprobes: non-intrusiveness.


Cleaning Up: Defusing the Bomb

The experiment is over. Don't leave the probe there, or it will needlessly consume performance and fill up the trace buffer. Remember to clean up the scene.

# echo 0 > events/kprobes/mymiscdrv_wr/enable
# echo "-:mymiscdrv_wr" >> kprobe_events
# cat kprobe_events
#

The - prefix means delete. Finally, clear the trace buffer to make room for the next experiment:

# echo > trace

A More Elegant Approach: kprobe-perf Scripts

To be honest, while the echo operations we just did help you understand the underlying principles, using them in practice feels a lot like drilling a hole with a hand crank.

In real-world engineering, we use well-packaged tools. Remember the kprobe-perf script we mentioned earlier? It essentially automates that long string of commands and does a much nicer job of formatting the output. It's almost a one-liner and perfectly suited for your debug toolbox.

But knowing the underlying principles is crucial—when a script doesn't work, you need to know whether the kprobe_events was written incorrectly or if it's a permissions issue.


Advanced: Don't Forget Uprobes

Since we can hook kernel functions, what about userspace programs?

The answer is yes. The userspace counterpart to this mechanism is called Uprobes. Its usage is almost identical to kprobes, and it's also configured through the tracefs interface (uprobe_events).

This means you can use the same approach to debug a running Nginx or MySQL instance without restarting them. For detailed usage of Uprobes, refer to Documentation/trace/uprobetracer.txt in the kernel documentation.


Capturing Return Values: kretprobe in Action

Often, just knowing "the function was called" isn't enough. We also want to know: "What did it return?"

For example, with do_sys_open in the kernel, I want to know the file descriptor it returns after successfully opening a file. This is where kretprobe (return probe) comes in.

We can achieve this very easily using the kprobe-perf script. Look at this example:

rpi # kprobe-perf 'r:do_sys_open ret=$retval'
Tracing kprobe do_sys_open. Ctrl-C to end.
kprobe-perf-2287 [000] d... 13013.021003: do_sys_open: (sys_openat+0x1c/0x20 <- do_sys_open) ret=0x3
<...>-2289 [000] d... 13013.027167: do_sys_open: (sys_openat+0x1c/0x20 <- do_sys_open) ret=0x3
<...>-2289 [000] d... 13013.027504: do_sys_open: (sys_openat+0x1c/0x20 <- do_sys_open) ret=0x3
^C
Ending tracing...
rpi #

The key here is ret=$retval. It tells kretprobe: grab the function's return value, store it in the variable ret, and print it out.

Look at the ret=0x3 in the output. This makes perfect sense. In a process's context, file descriptors 0, 1, and 2 are typically the established seats for stdin, stdout, and stderr. The newly opened file naturally received 3.

Now let's look at the parenthetical part of this output line:

(sys_openat+0x1c/0x20 <- do_sys_open)

This line contains rich call stack information:

  1. Our probe point do_sys_open is returning.
  2. Returning to the sys_openat function.
  3. The <func>+off/len format of +0x1c/0x20 reveals instruction-level details:
    • off (0x1c): The offset of the call return point relative to the start address of the sys_openat function.
    • len (0x20): The kernel's estimated total length of this function.

The rest of the line is standard ftrace format, which you should be quite familiar with by now.


Limitless Potential: Diving into Kernel Structures

If I said that's all there is, I'd definitely be lying to you.

This function-based dynamic kprobes framework is actually a portal to the interior of the kernel. As long as you understand the memory layout of kernel data structures well enough (via offsets), you can even directly print the values of fields inside a specific struct right from a probe.

Steven Rostedt (the primary author of ftrace) once demonstrated in his slides how to peel back function parameters and struct member values layer by layer through dynamic probes, just like peeling an onion. Now that's true dark magic.


Summary

In this section, we shifted our focus to kernel modules and verified the power of dynamic kprobe event tracing on kernel code that we didn't write ourselves.

We took away two main points:

  1. Practical confirmation: Any module function exported to the global symbol table can become a target for dynamic kprobes.
  2. The power of kretprobe: It's not just about watching the entry point—we can also catch the exit and grab the function's return value, which is crucial for debugging logic errors.

Before we wrap up the next chapter, we have one last highly practical scenario to cover: process tracing. This goes beyond debugging—it's an audit-log-like capability that lets us know exactly who is doing what in the system.