9.2 The Panorama of Kernel Tracing Technologies

Don't rush to type commands just yet.

We have our tools and our directory, but before we actually get our hands dirty with ftrace, it's best to take a step back and get a clear picture of what we're dealing with. If you think of tracing tools as mere "recorders," you'll easily get lost in the configuration options later.

The world of kernel tracing is full of acronyms and subsystems that look like a pile of scattered parts. In reality, it's a finely engineered machine. Once we understand how this machine is built—where the data comes from, how it's collected, and how it's presented—all those seemingly mysterious configuration files that follow become nothing more than logical connectors.


Data Sources and Frontends — Peeling Back the Layers of Tools

First, a fundamental question: where exactly does the data we want to trace come from?

In the previous section, you installed lttng and modified kernel parameters, but you didn't see the "data" itself. The data is hidden in the execution details of the kernel. To capture it, we need data sources.

We're actually not unfamiliar with the most common categories of data sources in the kernel:

  • Tracepoints: These are "hooks" pre-planted by kernel developers. At certain key locations in the kernel (such as when the scheduler performs a context switch, or when an interrupt arrives), developers have written code to tell the system, "something happened here." The list of these tracepoints sits under /sys/kernel/tracing/events/ (we'll take a quick look right after this list).
    • We actually touched on something similar back in Chapter 4 when we tinkered with Kprobes. Those are dynamic probes: tracepoints are static hooks compiled into the kernel, whereas a kprobe is equivalent to temporarily drilling a hole in a place that didn't have one.
  • Kprobes / Uprobes: These are not just debugging tools, but also powerful data sources for tracing. Kprobes handle kernel space, while Uprobes handle user space. They can intercept the flow of execution at almost any address.
  • LTTng Modules and USDT: LTTng has its own set of kernel modules and user space probes, designed for more efficient, lower-latency data capture.
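
In fact, you can poke at the tracepoint catalog right away (run as root; this assumes tracefs is mounted at /sys/kernel/tracing, which we'll verify shortly):

# List the event subsystems that expose tracepoints
ls /sys/kernel/tracing/events/
# Drill into the scheduler's context-switch tracepoint mentioned above
ls /sys/kernel/tracing/events/sched/sched_switch/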

Having data sources is only the first step. Raw data is scattered throughout kernel memory, and we need an infrastructure to extract it.

This is why your system has the /sys/kernel/tracing directory. This is tracefs, the "control panel" of ftrace. Tools read and write these pseudo-files to tell the kernel: start recording, stop recording, only record the schedule function, or record with more detail.
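
To give you a taste of what "reading and writing these pseudo-files" looks like, here is a minimal sketch (run as root; we'll do all of this properly in the next section):

# Record only the schedule function, as promised above
echo schedule > /sys/kernel/tracing/set_ftrace_filter
# Pick the function tracer as the active recorder
echo function > /sys/kernel/tracing/current_tracer
# Flip the master recording switch on, then off
echo 1 > /sys/kernel/tracing/tracing_on
echo 0 > /sys/kernel/tracing/tracing_on
# Reset: idle tracer, empty filter
echo nop > /sys/kernel/tracing/current_tracer
echo > /sys/kernel/tracing/set_ftrace_filter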

Finally, as humans, directly reading those raw binary buffer files with cat is pure torture. So we also need frontends.

This forms the three-layer pyramid of the entire Linux tracing technology stack:

  1. Bottom Layer: Data Sources. The signals generated by the kernel.
  2. Middle Layer: Infrastructure. Mechanisms like ftrace, tracefs, and perf_events, responsible for pulling signals out of the kernel and organizing them.
  3. Top Layer: Frontends/Tools. Tools like trace-cmd and KernelShark, or those perf-tools scripts we used in Chapter 4. They type the tedious low-level commands for you and turn the data into charts and graphs you can actually understand.

There's a very important mindset shift here: these technologies are not competing with each other.

You might have previously thought, "Should I use ftrace or perf?" But the answer now is, "Why not both?" Julia Evans has a classic blog post titled "Linux tracing systems & how they fit together." If you haven't read it, I highly recommend searching for it. The diagram she drew explains everything clearly: modern Linux tracing technologies share code and ideas. The underlying ftrace data can be viewed through tools like perf, or through trace-cmd, or even analyzed via the LTTng ecosystem.
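
You can verify this sharing yourself once anything has been recorded: the two commands below read the very same kernel ring buffer, first through the raw tracefs file and then through a frontend (this assumes the trace-cmd package is installed):

# The middle layer, read directly (run as root)
head /sys/kernel/tracing/trace
# The same ring buffer, read through a frontend
trace-cmd show | head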

This is exactly the "unified tracing platform" vision that kernel stalwart Steven Rostedt (the original author of ftrace) has been advocating for years. He once presented a diagram (shown below) illustrating how these technologies are intricately yet harmoniously intertwined:

[Figure 9.1 – Schematic of the Linux Tracing Infrastructure] (Original book figure: Shows Tracepoints/Probes as data sources, connected through interfaces like tracefs/debugfs to underlying tools like ftrace/perf, which in turn support various frontend scripts and GUI tools above them)

[Figure 9.2 – The Rich Linux Tracing Ecosystem] (Original book figure: Steven Rostedt's slide, showing how underlying technologies are reused by upper-layer tools)

See? The underlying "meat" is the same; only the outer "skin" is different. This means that when we learn ftrace, we are actually learning a universal language.


Stepping into ftrace Territory

Alright, we've looked at the theoretical architecture diagram. Now let's step through the door.

The star of this chapter is ftrace. It's the tracer built into the Linux kernel, and the f in its name originally stood for function, because it was first designed to trace kernel function call graphs. But today's ftrace goes far beyond that: it's a generic tracing engine.

Starting with Linux kernel 4.1, ftrace's standard operational interface is implemented through a virtual filesystem called tracefs.

Don't let the name intimidate you. You can think of it as a filesystem dedicated to transmitting control commands: just as /proc carries process information and /sys carries hardware parameters, tracefs carries tracing control instructions.
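
Reading a "control instruction" really is just a cat away. For example, this asks the kernel which tracer plugins it was built with (the exact list depends on your kernel configuration, so your output will differ):

# Ask ftrace which tracers are compiled in (run as root)
cat /sys/kernel/tracing/available_tracers
# A typical distribution kernel might answer something like:
# blk function_graph wakeup function nop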

Checking the Mount Point

Under normal circumstances, modern Linux distributions (especially those using systemd) will automatically mount tracefs for you at boot.

You can use this command to verify its existence:

mount | grep "^tracefs"

If everything is normal, you should see output similar to this:

tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)

Notice that there are two mount points here.

  1. /sys/kernel/tracing: This is the new standard (4.1+ kernels). Regardless of whether debugfs is allowed to be mounted, this mount point should be present.
  2. /sys/kernel/debug/tracing: This is the old tradition. Previously, ftrace was mounted as a subtree under debugfs.
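
If neither line shows up at all (say, inside a stripped-down container or rescue environment), you can mount the new standard point by hand:

# Mount tracefs manually at the standard location (run as root)
mount -t tracefs nodev /sys/kernel/tracing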

A Pitfall in Production Environments

Why maintain two mount points? It's not just for nostalgia.

Imagine you're running a production machine with strict security policies. For security reasons, the kernel may have been built with CONFIG_DEBUG_FS_DISALLOW_MOUNT, which forbids mounting debugfs entirely. Because debugfs exposes so many kernel internals, it is indeed a risk on production machines.

However, debugging performance issues in production is a hard requirement. So, the kernel separated tracefs out. Even if debugfs is disabled, /sys/kernel/tracing can still be mounted independently, allowing you to access only the tracing-related interfaces without touching those sensitive kernel debugging details.

This is why in this book (which assumes a reasonably new kernel, such as 5.10), we default to using /sys/kernel/tracing as our working directory.

If you're working on a very old system (older than 4.1), you might only have /sys/kernel/debug/tracing as an option. Although the paths differ, the file contents inside are the same. In the following sections, when we say "switch to the tracing directory," our default action is:

cd /sys/kernel/tracing

Alternatively, if you want to make sure you land in the right place on whatever system you're sitting at, take a slightly more brute-force approach: try the new standard path first, and fall back to the old one only if it doesn't exist:

cd /sys/kernel/tracing 2>/dev/null || cd /sys/kernel/debug/tracing

(Trying /sys/kernel/tracing first matches the default we just established, and it still works when debugfs is locked down; only a pre-4.1 kernel should fall through to the second path.)

This directory is the birthplace of all the magic we'll work with next. The files in it are not meant for storing data; each one acts as a switch, a knob, or a readout on a control panel.

Ready? Hands on the keyboard. In the next section, we're going to start flipping these switches.