9.12 Further Reading and Technical Map
This chapter ends here, but your exploration is just beginning.
In this section, I've put together a "further reading map." This isn't just a list of links from the book; it places each resource on your cognitive map: which ones are the map itself, which are mines worth digging into, and which are the specialized weapons you draw when facing specific monsters.
Treat these resources as an extended manual for your everyday toolkit.
🗺️ The Big Picture: Understanding the Tracing Landscape
Before diving into specific commands, let's look at the forest from above. The following articles are great for building an overall understanding, especially when you're confused about the relationship between ftrace, perf, eBPF, and LTTng.
- Unified Tracing Platform – Bringing tracing together
- Steven Rostedt (VMware), 2019
- https://static.sched.com/hosted_files/osseu19/5f/unified-tracing-platform-oss-eu-2019.pdf
- Why read it: The primary maintainer of Ftrace draws the blueprint himself. If you want to understand how these tools will eventually unify under a single architecture, this is the primary source.
- Unifying kernel tracing
- Jake Edge, LWN, Oct 2019
- https://lwn.net/Articles/803347/
- Why read it: LWN's depth, as always. This article mainly discusses how to connect the three massive mountains of tracefs, perf, and eBPF.
- Linux tracing systems & how they fit together
- Julia Evans (@b0rk), July 2017
- https://jvns.ca/blog/2017/07/05/linux-tracing-systems/
- Why read it: Julia's diagramming skills are top-notch. If you find text descriptions too abstract, see how she pieces together Ftrace, Uprobes, SystemTap, and more using charts.
- Using the Linux Tracing Infrastructure
- Jan Altenberg (Linutronix GmbH), Nov 2017
- https://events.static.linuxfound.org/sites/events/files/slides/praesentation_0.pdf
- Why read it: A hands-on slide deck packed with real-world screenshots and parameter combinations.
- The comprehensive kernel index – all articles on tracing on LWN
- https://lwn.net/Kernel/Index/#Tracing
- Why bookmark it: LWN is the news bible for kernel developers. This index page is the main entry point for all Tracing-related articles.
⚙️ Ftrace: The Kernel's Hidden Light Switch
If you want to unlock Ftrace's full potential, or simply figure out how set_ftrace_pid actually works, here are the core resources.
Official Documentation and Must-Read Classics
- Official kernel documentation – ftrace
- https://www.kernel.org/doc/html/v5.10/trace/ftrace.html
- Role: Desk reference. So detailed you probably won't read it in one sitting, but a must-check when you run into issues.
- Ftrace: The hidden light switch
- Brendan Gregg, Aug 2014
- https://lwn.net/Articles/608497/
- Role: Conceptual introduction. A must-read if you're interested in "why Ftrace's overhead is so low."
- Ftrace internals
- Brendan Gregg, Oct 2019
- https://www.brendangregg.com/blog/2019-10-15/kernelrecipes-kernel-ftrace-internals.html
- Role: Hardcore kernel mechanisms. Covers the dark magic of mcount, trampolines, and dynamic instrumentation.
Timeless Classic Series
Though a bit dated, these three articles by Steven Rostedt (the father of Ftrace) remain excellent for understanding the design logic:
- Debugging the kernel using Ftrace - part 1 & 2
- Secrets of the Ftrace function tracer
- LWN, Jan 2010
- https://lwn.net/Articles/370423/
Beginner Guides and Cheat Sheets
- Welcome to ftrace & the Start of Your Journey...
- Steven Rostedt, Nov 2019
- https://blogs.vmware.com/opensource/2019/11/12/ftrace-linux-kernel/
- ftrace: trace your kernel functions!
- Julia Evans, Mar 2017
- https://jvns.ca/blog/2017/03/19/getting-started-with-ftrace/
- Role: If you just want one tutorial to get started quickly, Julia's article is all you need.
- Ftrace cheat sheets
- Common commands cheat sheet: Linux-tipps Cheat Sheet
- General Kernel Tracing cheat sheet: lzone.de
- Tip: Print these pages out and tape them next to your monitor until you've memorized the common echo commands (a short warm-up sequence follows this list).
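Before diving into those resources, here is a minimal warm-up run against the tracefs interface. It is only a sketch: it assumes tracefs is mounted at /sys/kernel/tracing (older systems expose it under /sys/kernel/debug/tracing), requires root privileges, and uses a vfs_* glob purely as an example.

```
# Minimal ftrace warm-up: trace one burst of activity with the function tracer.
cd /sys/kernel/tracing

echo 0 > tracing_on                 # stop tracing while we configure
echo function > current_tracer      # pick the plain function tracer
echo 'vfs_*' > set_ftrace_filter    # limit the noise to VFS entry points (glob)

echo 1 > tracing_on                 # start recording
ls /tmp > /dev/null                 # generate some activity to observe
echo 0 > tracing_on                 # stop recording

head -n 40 trace                    # inspect the first lines of the ring buffer
echo > set_ftrace_filter            # clear the filter when done
echo nop > current_tracer           # restore the default (no tracer)
```

The same pattern (configure, switch on, reproduce, switch off, read) underlies every more advanced recipe in this chapter.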
Kernel Stack Depth Deep Dive
We mentioned the risk of stack overflows in this chapter. If you want to dive deep into kernel stack mechanisms (especially CONFIG_VMAP_STACK), these two LWN articles are must-reads:
- Virtually mapped kernel stacks
- Jon Corbet, June 2016
- https://lwn.net/Articles/692208/
- Virtually mapped stacks 2: thread_info strikes back
- Jon Corbet, June 2016
- https://lwn.net/Articles/692953/
🛠️ Toolchain: trace-cmd and KernelShark
Although we demonstrated directly manipulating tracefs, in real-world work, you'll rely more heavily on these frontend tools.
trace-cmd (Command-Line Frontend)
- trace-cmd: A front-end for Ftrace
- Steven Rostedt, LWN, Oct 2010
- https://lwn.net/Articles/410200/
- Kernel tracing with trace-cmd
- G Kamathe (Red Hat), July 2021
- https://opensource.com/article/21/7/linux-kernel-trace-cmd
- Practical value: Includes concrete trace-cmd record and trace-cmd report examples.
KernelShark (The Graphical Powerhouse)
- Official KernelShark Documentation
- Swimming with the New KernelShark
- Yordan Karadzhov (VMware), 2018
- Tip: KernelShark v2 underwent a massive architectural rewrite (now Qt-based). This slide deck walks you through the new features quickly.
📦 perf-tools: Brendan Gregg's Script Toolkit
Don't let the name fool you—perf-tools is actually a massive collection of Bash script wrappers built on top of Ftrace and tracefs. In the pre-eBPF era, they were the go-to tools, and they remain highly valuable today because they "run anywhere without compilation."
- GitHub Repository
- https://github.com/brendangregg/perf-tools
- Examples Directory
- https://github.com/brendangregg/perf-tools/tree/master/examples
- Tip: Go straight to the source code of these scripts; they are the best Ftrace tutorials you'll find. You can see exactly how the author assembles simple echo commands to achieve complex functionality (a quick usage sketch follows this list).
- Linux Performance Analysis: New Tools and Old Secrets
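To give a feel for how these wrappers are used, here is a hedged sketch. The tool names (funccount, funcslower) are real scripts from the repository; the bin/ symlink directory, the function globs, and the thresholds are assumptions chosen for illustration.

```
# Run from a clone of the perf-tools repository (root privileges, Ftrace available).
git clone https://github.com/brendangregg/perf-tools.git
cd perf-tools

# Count how often VFS functions fire over a 10-second window.
sudo ./bin/funccount -d 10 'vfs_*'

# Show vfs_read invocations slower than 1000 microseconds (wraps function_graph).
sudo ./bin/funcslower vfs_read 1000

# Read the script itself to see the raw echo commands it issues against tracefs.
less bin/funccount
```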
🚀 eBPF Extensions
We discussed Kprobes and Instrumentation in Chapter 4. eBPF has completely transformed the kernel tracing landscape. While this chapter focuses on Ftrace, you need to understand their relationship: Ftrace is the foundational bedrock, while eBPF is the advanced architecture built on top.
- eBPF and frontends resources
- See the "Further reading" section in Chapter 4 (Debug via Instrumentation – Kprobes).
📡 LTTng: The Industrial-Grade Solution for High-Frequency Data
When Ftrace's overhead becomes a bottleneck under high-frequency events, or when you need to correlate user-space and kernel-space analysis, LTTng is the best choice.
- LTTng Main Website & Quick start
- Babeltrace 2 (CLI Tool)
- https://lttng.org/blog/2020/06/01/bt2-cli/
- Note: The CTF (Common Trace Format) data generated by LTTng is binary. babeltrace2 is the tool used to convert it into human-readable text.
- Finding the Root Cause of a Web Request Latency
- Julien Desfossez, Feb 2015
- https://lttng.org/blog/2015/02/04/web-request-latency-root-cause/
- Case study value: Demonstrates how to combine kernel and user-space tracing to pinpoint a real-world business problem.
- Tutorial: Remotely tracing an embedded Linux system
- C Babeux, Mar 2016
- https://lttng.org/blog/2016/03/07/tutorial-remote-tracing/
- Applicable scenario: As we mentioned at the end of this chapter, embedded boards have limited resources. This tutorial teaches you the proper approach: collect on the board, analyze on the host.
- LTTng: A Comprehensive User's Guide (version 2.3)
- Daniel U. Thibault (DRDC Valcartier Research Centre)
- Note: This is essentially a small book. If you need a physical book (or PDF) on your desk to read cover-to-cover, this is the one.
Trace Compass (LTTng's Visual Frontend)
- Trace Compass Website
- Alternate tracing tools
- https://lttng.org/docs/v2.13/#doc-lttng-alternatives
- Comparison: Lists a comparison between LTTng and other tools (like SystemTap, perf) to help you decide when to use LTTng.
🔧 Miscellaneous: Special Scenarios and Tricks
Finally, here are a few resources that can be lifesavers in specific scenarios:
- Boot-time tracing via ftrace
- Documentation
- Pain point solved: How do you trace the early kernel boot process? At that point, /sys isn't even mounted yet. This article shows you how to solve it via kernel parameters (ftrace=); see the sketch after this list.
- Trace Linux System Calls with Least Impact on Performance
- PingCAP, Dec 2020
- https://en.pingcap.com/blog/how-to-trace-linux-system-calls-in-production-with-minimal-impact-on-performance/
- Scenario: Performing system call tracing in production. This discusses how to gather data without dragging down your business.
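Returning to the boot-time tracing entry above, here is a minimal sketch of the kernel command line approach. It assumes a GRUB-style bootloader; the parameter names (ftrace=, trace_buf_size=, ftrace_dump_on_oops) come from the kernel's own documentation, while the root device and buffer size are illustrative.

```
# Illustrative kernel command line for tracing early boot (appended via GRUB):
#   ftrace=function_graph    start the function_graph tracer before user space exists
#   trace_buf_size=10M       enlarge each per-CPU ring buffer so early data survives
#   ftrace_dump_on_oops      dump the buffer to the console if boot crashes
linux /vmlinuz root=/dev/sda1 ro ftrace=function_graph trace_buf_size=10M ftrace_dump_on_oops

# After boot, the early-boot records are still sitting in the ring buffer:
head -n 40 /sys/kernel/tracing/trace
```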
Put all these tools into your toolbox. In different scenarios and for different problems, choose the sharpest knife.
In the next chapter, we'll face a heavier topic: Kernel Panic. Don't panic—with today's debugging weapons in hand, even when the kernel crashes, we can still read something useful from the corpse.
Exercises
Exercise 1: Understanding
Question: Compare the main differences between Tracing and Profiling. If a kernel developer wants to capture the complete function call flow within the kernel for a specific system call, which technique should they use and why?
Answer and Analysis
Answer: Tracing should be used.
Differences:
- Tracing is capture-based, recording all details along the code execution path (function calls, parameters, timestamps, etc.), providing a complete execution history.
- Profiling is statistical, capturing events through periodic sampling without capturing every detail, primarily used to discover performance hotspots.
Reason: Obtaining a "complete function call flow" requires recording every step of code execution, not just statistical samples, so Tracing is mandatory.
Analysis: This question tests your ability to distinguish between core concepts. As defined at the beginning of the chapter, Tracing is like a "black box" that records all details, while Profiling aims to monitor performance through sampling. Strace is Tracing at the system call boundary, while Ftrace performs Tracing deep inside the kernel. To inspect internal flows, you must use a Tracing technique with full recording capabilities.
Exercise 2: Understanding
Question: In the Linux kernel configuration, what is the purpose of the CONFIG_DYNAMIC_FTRACE option? If this option is not enabled and you use the -pg compiler option for instrumentation directly, what impact would it have on kernel performance in a production environment?
Answer and Analysis
Answer: Purpose: It allows the kernel to dynamically modify machine instructions at runtime (replacing instructions at function entry points with NOP instructions or jumps to a trampoline), thereby achieving zero performance overhead when tracing is disabled.
Impact: Without this option enabled, the kernel retains the mcount calls inserted by the compiler, so every single function entry executes the tracing stub and its conditional check (conceptually, if tracing_enabled { ... }). This introduces significant CPU overhead on every function call and is unsuitable for production environments.
Analysis: This tests your understanding of Ftrace's implementation mechanism. Standard compiler instrumentation (-pg) permanently adds code to every function entry point, leading to performance degradation. Dynamic Ftrace, on the other hand, leverages the kernel's self-modifying code capability to maintain native performance by patching in NOP instructions when tracing is inactive.
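If you want to see what your own kernel does, a quick check is sketched below; it assumes a distribution that installs /boot/config-$(uname -r) and a mounted tracefs.

```
# Is dynamic ftrace compiled in? Mainstream distro kernels typically say =y.
grep CONFIG_DYNAMIC_FTRACE /boot/config-$(uname -r)

# With dynamic ftrace, this file lists only the functions that currently have a
# tracing callback attached; everything else stays patched to NOPs (zero cost).
cat /sys/kernel/tracing/enabled_functions | head
```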
Exercise 3: Application
Question: Suppose you are debugging a random kernel crash and suspect it occurs after a series of complex kernel function calls. You want to automatically capture the data in the Ftrace buffer when the crash happens, so you can analyze the execution flow leading up to it. Provide the specific steps or configuration method.
Answer and Analysis
Answer: This can be achieved through one of the following methods:
- Boot parameter configuration: Add ftrace_dump_on_oops to the kernel boot parameters. A bare ftrace_dump_on_oops dumps every CPU's ring buffer to the console when an Oops occurs; ftrace_dump_on_oops=orig_cpu restricts the dump to the CPU that triggered the Oops.
- Runtime configuration (sysctl via Procfs): While the system is running, enable it through the proc filesystem interface: echo 1 > /proc/sys/kernel/ftrace_dump_on_oops (a value of 2 dumps only the buffer of the CPU that triggered the Oops).
This way, once the kernel encounters a Panic or Oops, Ftrace will automatically dump the trace log from the Ring Buffer to the console and logs for post-mortem analysis.
Analysis: This tests your ability to apply knowledge to real debugging scenarios. The concept comes from ftrace_dump_on_oops. This is a crucial technique for "post-mortem analysis" problems, because once the kernel crashes, you usually can't manually execute commands to copy the trace files—you must rely on the kernel's automatic dump mechanism.
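As a concrete sketch of both methods (the GRUB file path reflects a typical distribution and is an assumption; the parameter and sysctl names come from the kernel documentation):

```
# Method 1: boot parameter, active from early boot.
# In /etc/default/grub, append to GRUB_CMDLINE_LINUX and regenerate the config:
#   GRUB_CMDLINE_LINUX="... ftrace_dump_on_oops"

# Method 2: runtime sysctl, no reboot required.
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
# equivalently:
sysctl -w kernel.ftrace_dump_on_oops=1

# Now enable whichever tracer you care about; on the next Oops the ring buffer
# is dumped to the console (and thus the serial/netconsole log) automatically.
echo function_graph > /sys/kernel/tracing/current_tracer
```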
Exercise 4: Application
Question: You are using the function_graph tracer to analyze a latency-sensitive kernel module. To filter out noise, you only want to trace functions related to that module and observe only function calls that take longer than 100 microseconds. Describe how to configure this by combining set_ftrace_filter and Ftrace's latency marker features.
Answer and Analysis
Answer: Configuration steps are as follows:
- Set the filter: Write the module-related function names into set_ftrace_filter. For example: echo '*:mod:my_module' > /sys/kernel/tracing/set_ftrace_filter (the :mod: command matches every function belonging to the named module; if you know the module's naming convention, plain wildcards such as my_module_* also work).
- Enable latency markers: Although function_graph itself displays the Duration, Ftrace annotates slow calls so anomalies stand out. Ensure options/funcgraph-cpu and options/funcgraph-duration are enabled (they usually are by default). In the trace output, a symbol to the left of the duration column flags a slow call: '+' marks functions that took longer than 10 microseconds and '!' marks functions that took longer than 100 microseconds.
- Read and analyze: cat /sys/kernel/tracing/trace | grep '!'
Alternatively, if using perf-tools, you can call the funcslower script directly, e.g. ./funcslower 'my_module_*' 100, which shows only invocations of the matching functions that exceed 100 microseconds. The complete tracefs sequence is sketched below.
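Putting the steps together, the whole tracefs sequence looks roughly like this (a sketch; my_module is the placeholder module name used in the answer):

```
cd /sys/kernel/tracing

echo 0 > tracing_on
echo function_graph > current_tracer

# Restrict tracing to the module's functions (':mod:' is a set_ftrace_filter command).
echo '*:mod:my_module' > set_ftrace_filter

# Make sure the duration column (and its '+'/'!' slow-call markers) is shown.
echo 1 > options/funcgraph-duration

echo 1 > tracing_on
# ... exercise the module here ...
echo 0 > tracing_on

# Keep only the calls flagged as slower than 100 microseconds ('!').
grep '!' trace
```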
Analysis: This tests your comprehensive application of Ftrace filtering and output interpretation. The question requires combining set_ftrace_filter (narrowing the trace scope) with an understanding of latency markers (identifying long-duration functions). In practice, using the mod: filter command or wildcards is a common troubleshooting method, while observing the symbols in the Duration column (like $, +, !) is key to quickly locating performance bottlenecks.
Exercise 5: Thinking
Question: While reading Ftrace's function_graph output, you notice that a specific kernel thread (PID 123) shows function call indentation and duration in the log, but based on your understanding, that thread should have been completely asleep at the time. Analyze the possible causes of this "ghost" trace data, and explain how to use Ftrace's latency-format option to verify your hypothesis.
Answer and Analysis
Answer: Possible Cause Analysis:
The most likely cause of this phenomenon is interrupt context interference. Although the log shows the "PID 123 thread" context, it's actually recording hardware interrupt (Hard IRQ) or softirq handlers that executed in kernel mode right when the thread was scheduled out or just about to run. Under the default function_graph output, the execution of interrupt handlers is often attributed to the context of the currently interrupted process, which can be misleading.
Verification Method:
- Enable the latency-format option: echo 1 > /sys/kernel/tracing/options/latency-format
- Check the trace output again. This format adds extra columns showing interrupt state, the preemption counter, and context flags.
- Analyze the flags: In these columns, 'd' means interrupts were disabled, 'N' means need_resched was set, 'h' or 'H' marks hard-interrupt context, 's' marks softirq context, 'z'/'Z' indicate an NMI, and the trailing digit is the preempt-disable depth. If the entries attributed to PID 123 carry 'h' or 's' flags (or irqs-off markers), it proves that these functions were actually executed in an interrupt context, not initiated actively by the PID 123 thread.
This reveals a limitation of the default function_graph view: it tends to attribute the "cost" of low-level activities to the currently running task.
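A sketch of the verification workflow (PID 123 is the thread from the question; funcgraph-proc additionally prints the owning task on every line, which makes misattribution easier to spot):

```
cd /sys/kernel/tracing

echo function_graph > current_tracer
echo 123 > set_ftrace_pid              # only trace the suspicious thread
echo 1 > options/latency-format        # add irqs-off / hardirq / preempt columns
echo 1 > options/funcgraph-proc        # print comm-PID on every line

echo 1 > tracing_on
sleep 5
echo 0 > tracing_on

# Entries carrying 'h' (hardirq) or 's' (softirq) in the context flags were
# recorded in interrupt context, even though they are attributed to PID 123.
less trace
```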
Analysis: This is a deep-thinking question that tests your understanding of kernel execution contexts (process context vs. interrupt context) and the inherent limitations of Tracing tools themselves. Relying solely on the PID can be misleading. Understanding the underlying hardware state provided by latency-format (such as whether interrupts are disabled or preemption is locked) is the key to distinguishing between "active process behavior" and "passive interrupt behavior."
Key Takeaways
The core of this chapter is how to use Linux kernel tracing technologies to open up the "black box" of system operation. Ftrace, as the kernel's built-in tracing engine, provides a control interface through the tracefs filesystem and combines compiler instrumentation with runtime code patching (the call inserted at each function entry point is rewritten into a NOP instruction while tracing is off) to achieve zero performance overhead when tracing is disabled. Understanding the difference between Tracing and Profiling is a prerequisite for choosing the right tool: the former focuses on continuous flow details, recording every function call to answer "what exactly happened at a specific moment"; the latter focuses on statistical hotspots, using sampling to answer "where did the time go?"
To achieve effective kernel observability, simply enabling a tracer isn't enough—you must master context identification and advanced filtering techniques. By enabling options like latency-format and funcgraph-proc, users can decode kernel state codes like d.h2 to distinguish whether code is running in a process context or a hard interrupt context, and whether locks are held. At the same time, using set_ftrace_filter combined with Glob matching or index-based filtering can narrow down massive function call logs to a specific scope (such as only the TCP stack or a specific driver module). This is key to pinpointing issues amidst a flood of information.
In real-world debugging, tracing Tracepoints yields much more valuable information than simply tracing function calls. Tracepoints are pre-planted "hooks" left by kernel developers, located on critical paths like the scheduler and interrupts. Using the set_event interface to listen to these events (like net:* or skb:*) might sacrifice the hierarchical indentation graph of function calls, but it provides concrete function parameter values, which is crucial for troubleshooting issues like "packet loss due to incorrect parameters."
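As a small illustration of that interface (a sketch using the net: and skb: examples mentioned above; run as root with tracefs mounted):

```
cd /sys/kernel/tracing

# List the tracepoints the running kernel exposes for the networking stack.
grep -E '^(net|skb):' available_events | head

# Subscribe to whole event groups; no function tracer needs to be active.
echo 'net:*' > set_event
echo 'skb:kfree_skb' >> set_event      # append a single event from another group

echo 1 > tracing_on
# Each record now carries the tracepoint's argument values (device, skb, length...).
head -n 20 trace

echo > set_event                       # unsubscribe from everything
```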
For the need to dynamically insert debug information, the kernel provides the trace_printk() lightweight API, which is superior to the traditional printk. trace_printk only writes to an in-memory ring buffer, avoiding the performance overhead and timing interference caused by console I/O, and it won't lose critical data due to buffer overflows. Combined with trace_pipe for real-time streaming output, developers can monitor kernel behavior like watching a live broadcast without interrupting system operation, truly achieving "microscope-level" dynamic observation of the kernel.
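The live-streaming half of that workflow can be sketched as follows; trace_printk() itself is called from kernel C code (for example inside the module you are debugging), and its output lands in the same ring buffer that trace_pipe drains:

```
cd /sys/kernel/tracing

echo 1 > tracing_on

# trace_pipe is a consuming, blocking read: records disappear as you read them,
# so it behaves like 'tail -f' on the ring buffer. Watch it live for 10 seconds;
# any trace_printk() lines emitted by kernel code you are debugging appear here too.
timeout 10 cat trace_pipe
```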