9.8 Ftrace Miscellanea and Lingering Questions (FAQ)

There are a few scattered but crucial topics left to cover regarding ftrace. Rather than just dumping them in a list, let's use a more intuitive format—an FAQ—to put the final pieces of this puzzle together.

These are all "wait, why isn't this working?" moments I've personally run into.

What exactly does that README say?

You might have been looking for a "quick start guide" for ftrace. One actually ships with the kernel itself, and it's exposed directly on your running system through tracefs.

Just cat it:

sudo cat /sys/kernel/tracing/README

This is a mini-HOWTO. You'll see a large block of text containing all the basic ftrace usage. Don't expect it to read like a novel, but it is the most authoritative "cheat sheet" you'll find.
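Since it's one big block of text, searching it is usually faster than reading it top to bottom. A quick sketch (the search term is just an illustration):

sudo grep -i -A 2 'trace_pipe' /sys/kernel/tracing/README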

Why do my files look different from yours?

Sometimes you follow a tutorial, only to find that a certain file doesn't exist under /sys/kernel/tracing/, or some options simply refuse to appear.

This is completely normal—don't doubt your sanity.

That's because tracefs is a pseudo-filesystem generated by the kernel itself. What you see depends entirely on two things:

  1. CPU Architecture: x86_64 is usually the first-class citizen, getting the fullest features and fastest updates. If you're working on ARM or RISC-V, some advanced features might not have been ported over yet.
  2. Kernel Version: This text is based on 5.10.60. If you're using 4.x or a very new 6.x, the interfaces might have changed.

If you feel like something is missing, check these two points first.
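Before filing a mental bug report, take stock of what your particular kernel actually exposes. A minimal sketch:

uname -r                                    # the kernel you're actually running
cat /sys/kernel/tracing/available_tracers   # tracers this kernel was built with
ls /sys/kernel/tracing                      # the files your tracefs really has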

I want "live data"—can I continuously stream it like a log?

In the previous section, we mentioned trace, which is a static snapshot file. But if you want to see flowing, "happening-right-now" data, you need to use trace_pipe.

Think of it like video:

  • trace: A photograph (a static snapshot you can read repeatedly).
  • trace_pipe: A live stream (reading it consumes the data, and the read blocks until new data arrives).

You can use tail -f or any custom script to read it:

tail -f /sys/kernel/tracing/trace_pipe

And since it's a "stream," you can pipe it directly into other tools just like a water hose. For example, if you only want to see traces containing "usb":

cat /sys/kernel/tracing/trace_pipe | grep usb

This is incredibly useful when tracking down sudden event storms in a specific subsystem.
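And because it's an ordinary stream, you can also fork it: keep a full copy on disk while watching for a keyword live. A sketch (the log path is just an example):

cat /sys/kernel/tracing/trace_pipe | tee /tmp/trace.log | grep usb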

Can I control the ftrace switch from within my code?

This is a hardcore requirement. Sometimes, in a specific driver function, you want to immediately disable tracing as soon as you detect an anomaly, preventing subsequent junk logs from washing away the crucial information.

You absolutely can. The kernel provides two APIs for this (note that they are GPL-exported):

  • tracing_on(): Enable tracing.
  • tracing_off(): Disable tracing.

This is essentially the code equivalent of writing 1 or 0 to the tracing_on pseudo-file. You can write it in a kernel module like this:

/* Fault detected: freeze the trace buffer immediately */
tracing_off();
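For comparison, the userspace equivalent through the pseudo-file:

echo 0 > /sys/kernel/tracing/tracing_on   # same effect as tracing_off()
echo 1 > /sys/kernel/tracing/tracing_on   # same effect as tracing_on()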

⚠️ Warning There is also a higher-level switch: /proc/sys/kernel/ftrace_enabled. Writing 0 here disables function tracing across the entire kernel. This is a "nuclear" level operation, so don't use it lightly.
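If you ever lose track of which switch is in which state, both are readable:

cat /sys/kernel/tracing/tracing_on      # the per-buffer recording switch
cat /proc/sys/kernel/ftrace_enabled     # the global function-tracing switch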

When the system crashes, where does my trace go?

This is a pain point, but also one of ftrace's greatest strengths.

Imagine this scenario: the system suddenly panics, or triggers an Oops. At this point, you have powerful kdump and crash tools at your disposal, which can help you save a memory snapshot of the kernel crash.

But—a snapshot only captures the exact moment of "death." It tells you "it died here," but not "how it got here." This is the equivalent of a post-mortem missing the medical history.

This is where ftrace_dump_on_oops comes in.

The design philosophy behind this mechanism is very simple: since the system crashed and can't write to files anymore, we just dump the contents of the ftrace buffer directly to the console and kernel log. Because kdump captures these logs as well, you can look at the crash scene post-mortem and simultaneously see its final trajectory before death.

Enabling it is simple—just write 1 to a proc file:

echo 1 > /proc/sys/kernel/ftrace_dump_on_oops

The default is 0 (disabled).
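The echo above only lasts until the next reboot. If you'd rather persist it through sysctl, here's a sketch assuming the conventional /etc/sysctl.d layout (adjust for your distro):

echo 'kernel.ftrace_dump_on_oops = 1' | sudo tee /etc/sysctl.d/99-ftrace.conf
sudo sysctl -p /etc/sysctl.d/99-ftrace.conf   # apply without rebooting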

If you want it prepped and ready at boot, you can also add it to the kernel boot parameters:

ftrace_dump_on_oops
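On a GRUB-based system that usually means editing the default command line; the file path and the regeneration command vary by distro, so treat this as a sketch:

# In /etc/default/grub, append the parameter:
GRUB_CMDLINE_LINUX="... ftrace_dump_on_oops"
# Then regenerate the config (Debian/Ubuntu):
sudo update-grub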

Here's an advanced trick. If you're on a multi-core system, you might only care about the trace from the CPU that triggered the Oops, without wanting the noise from other CPUs. You can use this parameter:

ftrace_dump_on_oops=orig_cpu

This will only dump the ftrace buffer of the "offending CPU," making the output much cleaner.

What are those latency-measuring tracers for?

You've probably seen a bunch of weird names in available_tracers: irqsoff, preemptoff, wakeup... These are the "special forces" of ftrace, dedicated to measuring system latency.

Why do we need them?

Ideally, the windows during which interrupts or preemption are disabled should be vanishingly short. But in reality, driver code might accidentally keep interrupts disabled for too long. Once that exceeds a few dozen microseconds, the system's real-time behavior is ruined: audio stutters, and the network drops packets.

A few key players:

  • irqsoff: Specifically catches "bad guys" who keep interrupts disabled for too long. It records the maximum duration hardware interrupts were disabled and tells you exactly which function caused it.
  • preemptoff: Specifically catches situations where preemption is disabled for too long. This is another source of latency.
  • preemptirqsoff: The combination of the above two. It records the longest window during which the kernel could not schedule at all, whether due to disabled interrupts, disabled preemption, or both.
  • wakeup / wakeup_rt: Focus on scheduling latency: how long is the gap between the highest-priority task waking up and actually getting the CPU? wakeup_rt restricts the measurement to real-time tasks. This is critical for real-time systems.

How do we use them?

Just like normal ftrace, switch over to it:

echo irqsoff > /sys/kernel/tracing/current_tracer

Then put your system under heavy load. As soon as any code keeps interrupts disabled for too long, ftrace will catch the call stack of the "culprit."
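To read the verdict afterwards, the latency tracers record their worst case in tracing_max_latency (present when the kernel is built with CONFIG_TRACER_MAX_TRACE):

echo 0 > /sys/kernel/tracing/tracing_max_latency   # reset the recorded maximum
# ... run your workload ...
cat /sys/kernel/tracing/tracing_max_latency        # worst window seen, in microseconds
cat /sys/kernel/tracing/trace                      # the call stack of the worst offender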

How long is "too long"?

For hardware interrupts or kernel preemption, anything over 10 microseconds starts warranting attention; if it hits the millisecond range, that's a serious incident.

Many driver bugs are caught exactly this way: you think that spinlock is only held for a few lines of code, but it turns out there's a hidden memcpy inside, operating on uncached memory, and interrupts stay disabled for several milliseconds.

Can I run multiple traces simultaneously?

By default, you only have one global buffer. But ftrace has a very cool Instances model.

You can think of the instances directory as a "hypervisor." Under this directory, you can mkdir as many subdirectories as you want, and each subdirectory is a completely independent ftrace environment.

cd /sys/kernel/tracing/instances
mkdir my_test_trace

Now you can see another complete set of trace, trace_pipe, tracing_on... under /sys/kernel/tracing/instances/my_test_trace/.

What's the point of this? It's incredibly useful. Suppose you want to use function_graph to trace the network stack, while simultaneously using irqsoff to monitor interrupt latency. With only a single global buffer, these two would clash. With instances, you can isolate them and let each do its own job.
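Here is a sketch of exactly that scenario. The instance names are made up, and which tracers are usable inside an instance depends on your kernel build, so check each instance's available_tracers first:

cd /sys/kernel/tracing/instances
mkdir net_trace irq_watch                      # two fully independent environments

cat net_trace/available_tracers                # verify what this kernel exposes per instance
echo function_graph > net_trace/current_tracer
echo irqsoff > irq_watch/current_tracer

# When done, reset the tracers and remove the instances
echo nop > net_trace/current_tracer
echo nop > irq_watch/current_tracer
rmdir net_trace irq_watch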

This is the ultimate weapon for tackling complex system issues. Steven Rostedt (the primary author of ftrace) has a great talk specifically about this—highly recommended to look it up.