9.7 Hands-on: Tracing a Single Ping Request with the set_event Interface
In the previous section, we used the available_filter_functions "cast a wide net" approach to extract the function call graph of the entire network stack. While intuitive, it's like turning on everyone's microphone in a building just to hear what two people are saying—too much noise, too much information overload.
Actually, ftrace offers another perspective. Instead of watching "functions," it watches "events."
This isn't just about swapping interfaces—the kernel we see through these two methods represents two completely different facets.
From Functions to Events
Recall the workflow from the previous section: we wrote function names into set_ftrace_filter and told ftrace to "watch these functions." This is intuitive, but it has a catch: you have to know the function names beforehand.
For those "key nodes" already defined in the kernel, why should we have to guess the function names? Kernel developers have already planted the markers for us. These markers are Tracepoints.
Think of Tracepoints as induction loops buried under a highway. The previous method (
available_filter_functions) is like stationing someone on the roadside to watch every single car; whereasset_eventdirectly reads the records from the induction loops.But be careful with this analogy: induction loops only tell you "a car passed," lacking the rich context of a physical stakeout. In ftrace, this means you'll see specific parameter values, but you might lose the indentation hierarchy of function calls.
All these events are neatly organized under the /sys/kernel/tracing/events directory.
Take a look at this figure (Figure 9.16)—this is the kernel's "event supermarket":
Figure 9.16 – Contents of the /sys/kernel/tracing/events directory—all kernel tracepoints
Remember the dynamic kprobes we discussed in Chapter 4 (Debug via Instrumentation – Kprobes)? We actually used this event tracing framework back then. At the time, we used it to trace functions without built-in tracepoints; now, we are using the kernel's native, "stock" tracepoints.
Narrowing the Scope—Focusing on Networking
If we only care about networking, the net, sock, skb, tcp, and udp subdirectories under events are our targets.
The syntax for using the set_event interface is very simple: just echo a "system:event" string into it. If we want to enable all events under a specific subsystem at once, the * wildcard comes in handy.
This is much cleaner than the script from the previous section. We just need to set FILTER_VIA_AVAIL_FUNCS to 0, and the script will take a different branch:
// ch9/ftrace/ping_ftrace.sh
[...]
FILTER_VIA_AVAIL_FUNCS=0
echo "[+] Function filtering:"
if [ ${FILTER_VIA_AVAIL_FUNCS} -eq 1 ] ; then
[... 上一节的逻辑 ...]
else
# 通过 set_event 接口进行过滤
# 这种方法很快,但细节会少一些!
# 我们会失去函数图的缩进,但能看到函数参数!
echo " Alternate event-based filtering (via set_event):"
echo 'net:* sock:* skb:* tcp:* udp:* napi:* qdisc:* neigh:* syscalls:*' >> set_event
fi
Notice that I added syscalls:* here. Why look at system calls? Because we need context. Tracing the ins and outs of network packets without knowing who sent them is like watching a movie with no main character.
Output Results—Trade-offs
Run this script, and then initiate another Ping. This time, the content in the trace file looks completely different.
Here is a partial output (Figure 9.17):
Figure 9.17 – ftrace report obtained via the set_event interface (filtered)
See the difference?
We lost some things, but we gained others.
❌ Lost: That beautiful, code-indentation-like function call hierarchy is gone. It's as if the call stack was flattened; you can't tell at a glance whether udp_sendmsg is nested inside ip_send_fill_frag.
✅ Gained: Detailed information is appended to each line—the values of the function parameters. This is crucial. For debugging, knowing "this function was called" is only the first step; knowing "what the buffer pointer passed in was" is what ultimately settles the issue.
If we use the trace-cmd tool (covered in the next section) without the function_graph plugin, it prints the parameters by default, which is consistent with the behavior we see here.
Returning to the "highway induction loop" analogy: In the report output now, each parameter value is like a photo snapped by the induction loop. You might not see how the car drove in (lost the indentation hierarchy), but you can clearly see the license plate and the payload (gained the parameter values). For troubleshooting issues like "why did a wrong parameter cause packet drops," this perspective is often much more effective than staring at thousands of lines of indented graphs.
Quick Debugging Insertions—trace_printk()
Speaking of debugging parameters, here is an incredibly useful API to interject: trace_printk().
Many kernel novices like to use printk() for logging, only to realize when they run it: "Damn, the logging is so slow it messes up the system timing," or even worse, "The printk() buffer overflowed and the critical information is lost."
trace_printk() exists specifically to solve this pain point. Its syntax is exactly the same as printf():
trace_printk("myprj: at %s:%s():%d\n",__FILE__, __func__, __LINE__);
The fundamental differences from printk are:
- Speed: It only writes to memory (ftrace's ring buffer) and never touches the console device. This is critical when handling "lightning-fast" code paths like interrupts; otherwise, you get Heisenbugs—observing it changes its behavior.
- Capacity: ftrace's buffer is typically much larger than printk's log buffer, and it's adjustable.
- How to read it: You won't see it in
dmesg. You must read it from/sys/kernel/tracing/traceortrace_pipe.
Here's another small trick: if you use the function_graph tracer, the output from trace_printk() will appear as comments (/* ... */) mixed into the function call flow, making it very easy to pinpoint exactly which step printed the message.
⚠️ Warning The kernel documentation explicitly states that this is for debugging use only. Do not leave it in production code.
If you only need to print a fixed string (no formatting required), using trace_puts() is faster. The kernel developers even thought of that.
If you want to disable this feature (to prevent trace_printk from polluting your trace), you can write notrace_printk into trace_options, or write 0 into options/trace_printk.