9.12 Further Reading and Technical Map

This chapter ends here, but your exploration is just beginning.

In this section, I've put together a "further reading map." This isn't just a pile of links pulled from the book: it's an attempt to place them on your cognitive map, telling you which resources are the map itself, which are the mines worth digging into, and which are the specialized weapons you draw when facing a specific monster.

Treat these resources as an extended manual for your everyday toolkit.


🗺️ The Big Picture: Understanding the Tracing Landscape

Before diving into specific commands, let's look at the forest from above. The following articles are great for building an overall understanding, especially when you're confused about the relationship between ftrace, perf, eBPF, and LTTng.


⚙️ Ftrace: The Kernel's Hidden Light Switch

If you want to unlock Ftrace's full potential, or simply figure out how set_ftrace_pid actually works, here are the core resources.
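
For a quick feel of what that file does, here is a minimal sketch (assuming tracefs is mounted at /sys/kernel/tracing and you are running as root):

```bash
cd /sys/kernel/tracing
echo 0 > tracing_on
echo $$ > set_ftrace_pid          # restrict the function tracer to the current shell
echo 1 > options/function-fork    # optionally let children of that task inherit the filter
echo function > current_tracer
echo 1 > tracing_on
ls /tmp > /dev/null               # do something in the traced shell
head -20 trace
```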

Official Documentation and Must-Read Classics

Timeless Classic Series: Though a bit dated, these three articles by Steven Rostedt (the father of Ftrace) remain excellent for understanding the design logic:

Beginner Guides and Cheat Sheets

Kernel Stack Depth Deep Dive: We mentioned the risk of stack overflows in this chapter. If you want to dive deep into kernel stack mechanisms (especially CONFIG_VMAP_STACK), these two LWN articles are must-reads:


🛠️ Toolchain: trace-cmd and KernelShark

Although we demonstrated directly manipulating tracefs, in real-world work, you'll rely more heavily on these frontend tools.

trace-cmd (Command-Line Frontend)
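
A minimal sketch of the typical record-then-report workflow, assuming trace-cmd (and optionally KernelShark) is installed and run as root; the traced command and filter are just illustrative:

```bash
trace-cmd record -p function_graph -l 'tcp_*' -- ping -c 3 127.0.0.1   # record into trace.dat
trace-cmd report | less                                                # pretty-print the recording
kernelshark trace.dat                                                  # or explore it graphically
```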

KernelShark (The Graphical Powerhouse)

  • Official KernelShark Documentation
  • Swimming with the New KernelShark
    • Yordan Karadzhov (VMware), 2018
    • PDF
    • Tip: KernelShark v2 underwent a massive architectural rewrite (now Qt-based). This slide deck walks you through the new features quickly.

📦 perf-tools: Brendan Gregg's Script Toolkit

Don't let the name fool you—perf-tools is actually a massive collection of Bash script wrappers built on top of Ftrace and tracefs. In the pre-eBPF era, they were the go-to tools, and they remain highly valuable today because they "run anywhere without compilation."

  • GitHub Repository

  • Examples Directory

  • Linux Performance Analysis: New Tools and Old Secrets

    • Brendan Gregg, USENIX LISA14, Nov 2014
    • Video | Slides
    • Status: A classic talk. If you want to systematically learn the methodology of Linux performance analysis rather than just memorizing tools, watching this video is the best investment of your time.
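
To give a feel for how these scripts are used, here is a minimal sketch, assuming the repository is cloned and the scripts are run as root (the target function names are only illustrative):

```bash
git clone https://github.com/brendangregg/perf-tools
cd perf-tools
./execsnoop               # watch processes being exec()ed system-wide
./funccount 'vfs_*'       # count calls to kernel functions matching vfs_* (Ctrl-C to stop)
./funcgraph do_nanosleep  # draw a function_graph rooted at do_nanosleep (Ctrl-C to stop)
```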

🚀 eBPF Extensions

We discussed Kprobes and Instrumentation in Chapter 4. eBPF has completely transformed the kernel tracing landscape. While this chapter focuses on Ftrace, you need to understand their relationship: Ftrace is the foundational bedrock, while eBPF is the advanced architecture built on top.

  • eBPF and frontends resources
    • See the "Further reading" section in Chapter 4 (Debug via Instrumentation – Kprobes).

📡 LTTng: The Industrial-Grade Solution for High-Frequency Data

When Ftrace's overhead becomes a bottleneck under high-frequency events, or when you need to correlate user-space and kernel-space analysis, LTTng is the best choice.

  • LTTng Main Website & Quick start

  • Babeltrace 2 (CLI Tool)

  • Finding the Root Cause of a Web Request Latency

  • Tutorial: Remotely tracing an embedded Linux system

  • LTTng: A Comprehensive User's Guide (version 2.3)

    • Daniel U. Thibault (DRDC Valcartier Research Centre)
    • PDF
    • Note: This is essentially a small book. If you need a physical book (or PDF) on your desk to read cover-to-cover, this is the one.
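
As a taste before you dive into the documentation above, a minimal sketch of the LTTng kernel-tracing workflow (assuming lttng-tools and lttng-modules are installed; the session name and event list are arbitrary):

```bash
lttng create demo-session
lttng enable-event --kernel sched_switch,irq_handler_entry
lttng start
sleep 5                    # let the workload run
lttng stop
lttng view | head          # pretty-print the trace via Babeltrace
lttng destroy demo-session
```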

Trace Compass (LTTng's Visual Frontend)


🔧 Miscellaneous: Special Scenarios and Tricks

Finally, here are a few resources that can be lifesavers in specific scenarios:


Put all these tools into your toolbox. In different scenarios and for different problems, choose the sharpest knife.

In the next chapter, we'll face a heavier topic: Kernel Panic. Don't panic—with today's debugging weapons in hand, even when the kernel crashes, we can still read something useful from the corpse.


Exercises

Exercise 1: Understanding

Question: Compare the main differences between Tracing and Profiling. If a kernel developer wants to capture the complete function call flow within the kernel for a specific system call, which technique should they use and why?

Answer and Analysis

Answer: Tracing should be used.

Differences:

  1. Tracing is capture-based, recording all details along the code execution path (function calls, parameters, timestamps, etc.), providing a complete execution history.
  2. Profiling is statistical, capturing events through periodic sampling without capturing every detail, primarily used to discover performance hotspots.

Reason: Obtaining a "complete function call flow" requires recording every step of code execution, not just statistical samples, so Tracing is mandatory.

Analysis: This question tests your ability to distinguish between core concepts. As defined at the beginning of the chapter, Tracing is like a "black box" that records all details, while Profiling aims to monitor performance through sampling. Strace is Tracing at the system call boundary, while Ftrace performs Tracing deep inside the kernel. To inspect internal flows, you must use a Tracing technique with full recording capabilities.
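
As a concrete illustration, a minimal sketch of tracing one syscall's internal call flow with Ftrace (assuming tracefs at /sys/kernel/tracing; __x64_sys_openat is the x86-64 entry symbol and is architecture-specific):

```bash
cd /sys/kernel/tracing
echo '__x64_sys_openat' > set_graph_function   # only graph calls rooted at this function
echo function_graph > current_tracer
cat /etc/hostname > /dev/null                  # trigger an openat() somewhere on the system
head -40 trace
echo nop > current_tracer                      # clean up
```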

Exercise 2: Understanding

Question: In the Linux kernel configuration, what is the purpose of the CONFIG_DYNAMIC_FTRACE option? If this option is not enabled and you use the -pg compiler option for instrumentation directly, what impact would it have on kernel performance in a production environment?

Answer and Analysis

Answer: Purpose: It allows the kernel to dynamically modify machine instructions at runtime (replacing instructions at function entry points with NOP instructions or jumps to a trampoline), thereby achieving zero performance overhead when tracing is disabled.

Impact: Without this option, the kernel retains the mcount calls inserted by the compiler, so every function entry executes an unconditional call into the tracing stub, which then checks whether tracing is enabled (similar to if (tracing_enabled) { ... }). This constant per-call overhead makes the kernel unsuitable for production environments.

Analysis: This tests your understanding of Ftrace's implementation mechanism. Standard compiler instrumentation (-pg) permanently adds code to every function entry point, leading to performance degradation. Dynamic Ftrace, on the other hand, leverages the kernel's self-modifying code capability to maintain native performance by patching in NOP instructions when tracing is inactive.
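
A quick way to check what your running kernel was built with, as a minimal sketch (the config file path varies by distribution):

```bash
grep -E 'CONFIG_(FUNCTION_TRACER|DYNAMIC_FTRACE)=' /boot/config-$(uname -r)
# or, if the kernel exposes its configuration:
zgrep -E 'CONFIG_(FUNCTION_TRACER|DYNAMIC_FTRACE)=' /proc/config.gz
```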

Exercise 3: Application

Question: Suppose you are debugging a random kernel crash and suspect it occurs after a series of complex kernel function calls. You want to automatically capture the data in the Ftrace buffer when the crash happens, so you can analyze the execution flow leading up to it. Provide the specific steps or configuration method.

Answer and Analysis

Answer: This can be achieved through one of the following methods:

  1. Boot parameter configuration: Add ftrace_dump_on_oops to the kernel boot parameters. On its own it dumps the buffers of all CPUs to the console on an Oops; ftrace_dump_on_oops=orig_cpu dumps only the buffer of the CPU that triggered the Oops.

  2. Runtime configuration (Procfs): While the system is running, enable it via the Proc filesystem interface: echo 1 > /proc/sys/kernel/ftrace_dump_on_oops

This way, once the kernel encounters a Panic or Oops, Ftrace will automatically dump the trace log from the Ring Buffer to the console and logs for post-mortem analysis.

Analysis: This tests your ability to apply knowledge to real debugging scenarios. The concept comes from ftrace_dump_on_oops. This is a crucial technique for "post-mortem analysis" problems, because once the kernel crashes, you usually can't manually execute commands to copy the trace files—you must rely on the kernel's automatic dump mechanism.
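
For completeness, a minimal sketch of the same knob through the sysctl front end (the command-line change is only needed if you want the setting active from early boot):

```bash
sysctl kernel.ftrace_dump_on_oops        # read the current value
sysctl -w kernel.ftrace_dump_on_oops=1   # equivalent to the echo into /proc/sys above
# For coverage from the very first boot messages, add ftrace_dump_on_oops
# to the kernel command line (e.g. via the GRUB configuration) instead.
```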

Exercise 4: Application

Question: You are using the function_graph tracer to analyze a latency-sensitive kernel module. To filter out noise, you only want to trace functions related to that module and observe only function calls that take longer than 100 microseconds. Describe how to configure this by combining set_ftrace_filter and Ftrace's latency marker features.

Answer and Analysis

Answer: Configuration steps are as follows:

  1. Set the filter: Write the module-related functions into set_ftrace_filter. echo '*:mod:my_module' > /sys/kernel/tracing/set_ftrace_filter (Note: :mod: is a filter command that matches all functions belonging to the named module; if the module's functions share a common prefix, you can also use a plain wildcard such as my_module_*)

  2. Enable latency markers: function_graph itself displays the Duration, and Ftrace annotates it to make anomalies stand out visually. Ensure options/funcgraph-duration (and optionally options/funcgraph-cpu) are enabled; they usually are by default. In the trace output, a symbol printed just before the duration flags a slow function: '+' means it took longer than 10 microseconds, '!' longer than 100 microseconds, '#' longer than 1 millisecond, and so on up to '$' for longer than 1 second.

  3. Read and analyze: cat /sys/kernel/tracing/trace | grep '!'

Alternatively, if using perf-tools, you can use the funcslower script, which takes a function name (or wildcard) plus a latency threshold in microseconds, e.g. ./funcslower 'my_module_*' 100 (print only invocations of matching functions that exceeded 100 us).
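
Putting the steps together, a minimal sketch (my_module is a placeholder name; adjust the path if your tracefs lives elsewhere):

```bash
cd /sys/kernel/tracing
echo '*:mod:my_module' > set_ftrace_filter   # trace only that module's functions
echo function_graph > current_tracer
sleep 5                                      # let the latency-sensitive workload run
grep '!' trace                               # '!' marks entries that exceeded 100 us
```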

Analysis: This tests your comprehensive application of Ftrace filtering and output interpretation. The question requires combining set_ftrace_filter (narrowing the trace scope) with an understanding of latency markers (identifying long-duration functions). In practice, using the mod: filter command or wildcards is a common troubleshooting method, while observing the symbols in the Duration column (like $, +, !) is key to quickly locating performance bottlenecks.

Exercise 5: Thinking

Question: While reading Ftrace's function_graph output, you notice that a specific kernel thread (PID 123) shows function call indentation and duration in the log, but based on your understanding, that thread should have been completely asleep at the time. Analyze the possible causes of this "ghost" trace data, and explain how to use Ftrace's latency-format option to verify your hypothesis.

Answer and Analysis

Answer: Possible Cause Analysis: The most likely cause of this phenomenon is interrupt context interference. Although the log shows the "PID 123 thread" context, it's actually recording hardware interrupt (Hard IRQ) or softirq handlers that executed in kernel mode right when the thread was scheduled out or just about to run. Under the default function_graph output, the execution of interrupt handlers is often attributed to the context of the currently interrupted process, which can be misleading.

Verification Method:

  1. Enable the latency-format option: echo 1 > /sys/kernel/tracing/options/latency-format
  2. Check the trace output again. This format adds a detailed column showing interrupt status, preemption counters, and context flags.
  3. Analyze the flags: Look at the flag field in the new column (for example, 'd' means interrupts are disabled, 'h' or 's' means the entry was recorded in hard-IRQ or soft-IRQ context, and the trailing digit is the preempt-disable depth). If the hardirq/softirq flag is set for the suspicious entries, it proves that these functions were actually executed in an interrupt context, not initiated actively by the PID 123 thread.

This reveals a limitation of the default function_graph view: it tends to attribute the "cost" of low-level activities to the currently running task.
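
A minimal sketch of that verification, assuming tracefs at /sys/kernel/tracing:

```bash
cd /sys/kernel/tracing
echo 1 > options/latency-format     # add the irqs-off / context / preempt-depth columns
echo 1 > options/funcgraph-proc     # also print the task name and PID for each entry
head -40 trace
# In the flag field (e.g. "d.h1"): 'd' = IRQs disabled, 'h'/'s' = hard/soft IRQ
# context, and the trailing digit is the preempt-disable depth.
```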

Analysis: This is a deep-thinking question that tests your understanding of kernel execution contexts (process context vs. interrupt context) and the inherent limitations of Tracing tools themselves. Relying solely on the PID can be misleading. Understanding the underlying hardware state provided by latency-format (such as whether interrupts are disabled or preemption is locked) is the key to distinguishing between "active process behavior" and "passive interrupt behavior."


Key Takeaways

The core of this chapter is how to use Linux kernel tracing technologies to open up the "black box" of system operation. Ftrace, as the kernel's built-in tracing engine, provides a control interface through the tracefs filesystem and combines compiler instrumentation with runtime code patching (replacing the compiler-inserted calls at function entry points with NOP instructions) to achieve near-zero performance overhead when tracing is disabled. Understanding the difference between Tracing and Profiling is a prerequisite for choosing the right tool: the former focuses on continuous flow details, recording every function call to answer "what exactly happened at a specific moment"; the latter focuses on statistical hotspots, using sampling to answer "where did the time go?"

To achieve effective kernel observability, simply enabling a tracer isn't enough—you must master context identification and advanced filtering techniques. By enabling options like latency-format and funcgraph-proc, users can decode kernel state codes like d.h2 to distinguish whether code is running in a process context or a hard interrupt context, and whether locks are held. At the same time, using set_ftrace_filter combined with Glob matching or index-based filtering can narrow down massive function call logs to a specific scope (such as only the TCP stack or a specific driver module). This is key to pinpointing issues amidst a flood of information.

In real-world debugging, tracing Tracepoints yields much more valuable information than simply tracing function calls. Tracepoints are pre-planted "hooks" left by kernel developers, located on critical paths like the scheduler and interrupts. Using the set_event interface to listen to these events (like net:* or skb:*) might sacrifice the hierarchical indentation graph of function calls, but it provides concrete function parameter values, which is crucial for troubleshooting issues like "packet loss due to incorrect parameters."
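
As a minimal sketch of that workflow (the event names are illustrative; list what your kernel offers in available_events):

```bash
cd /sys/kernel/tracing
echo 'net:*' > set_event     # enable every tracepoint in the net subsystem
echo 'skb:*' >> set_event    # append the skb events as well
cat trace_pipe               # each event line carries decoded parameter values
```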

For the need to dynamically insert debug information, the kernel provides the trace_printk() lightweight API, which is superior to the traditional printk. trace_printk only writes to an in-memory ring buffer, avoiding the performance overhead and timing interference caused by console I/O, and it won't lose critical data due to buffer overflows. Combined with trace_pipe for real-time streaming output, developers can monitor kernel behavior like watching a live broadcast without interrupting system operation, truly achieving "microscope-level" dynamic observation of the kernel.
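
A minimal sketch of the consumer side of that workflow (the producer side is a trace_printk() call inside whatever kernel code you are instrumenting):

```bash
cd /sys/kernel/tracing
echo > trace      # clear the ring buffer first
cat trace_pipe    # blocks and streams new entries live as the instrumented code runs
```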