Chapter 1: Setting Up the Debugging Environment
This chapter contains 10 sections. Click the links below to read them:
Chapter 1: On the Edge of Low-Level Disorder
1.1 Technical Preparation: Setting Up Your Operating Table
1.10 Further Reading
With this, Chapter 1 is essentially complete.
1.2 Software Debugging: Essence, Origins, and Myths
If the previous section was about building the physical space of our lab, then in this section, we first need to align on a fundamental understanding: in this space, what exactly are we up against?
1.3 Software Defects: Real-World Tragedies
In the previous section, we were casually chatting about moths and the etymology of "Debug." In this section, the tone needs to change.
1.4 Setting Up the Workbench
The gruesome case studies from the previous section should have left you suitably shaken. Now, carrying that "walking on thin ice" sense of caution, we can finally get our hands dirty.
1.7 Two Kernels: The Frugal Producer and the Detective with a Magnifying Glass
We've stocked our toolbox, but before we start working, I want to show you a kernel developer's desk.
1.7 Building Your Custom Production Kernel
At this point, I'll assume you're no longer a stranger to the basic workflow of building a Linux kernel from source: fetching the source tree, configuring, and compiling. If you feel a bit rusty or want to brush up on the details, I highly recommend flipping through Linux Kernel Programming or checking out the "Further Reading" section at the end of this chapter.
1.7 Building Our Custom Debug Kernel
In the previous section, we built a "production kernel." It's like an agile field agent—lean, efficient, and always ready for deployment. But as developers, having just an agent isn't enough—sometimes, we need a chatterbox. Someone who shouts what they're doing before every single operation, even printing out their inner monologue.
1.8 Seeing the Difference: Production vs. Debug Kernel Config Showdown
In the previous section, we turned on every debug switch we could find — which is great, but it raises a question: what exactly did we change? In other words, how much does all this y and n toggling actually divide this kernel from its production sibling?
1.9 The Zen and Intuition of Debugging
In the previous section, we spent a fair amount of time "sharpening our tools"—building production and debug kernels. Now, you have two different blades in your toolbox.
Chapter 2: Bug Classification and Debugging Methodology
This chapter contains 3 sections. Click the links below to read:
Chapter 2: A Torch in the Dark Forest
The core insight established in this chapter is that debugging the kernel and debugging user-space programs are entirely different beasts. In user space, you have the entire libc library at your disposal, isolated process spaces, and a GDB you can attach to at any time. In kernel space, a single bad pointer can paralyze the entire machine—and once it halts, all context evaporates with the power loss, leaving nothing behind but a pile of hex numbers.
2.2 Bug Classification
Just as a doctor must determine whether a condition is a viral infection or a physical injury before treating it, we need to identify the exact type of bug we're facing before swinging our debugging hammer.
2.3 The Kernel Debugging Panorama: When to Use What
Alright, we've classified bugs and figured out why they make the system misbehave.
Chapter 3: Kernel Printing and Log Debugging
This chapter contains 5 sections. Click the links below to read:
3.0 Preparation and Motivation for This Chapter
In this chapter, we dive inside the kernel's "black box."
3.1 The Ubiquitous Kernel printk
Why is the first example program in K&R's famous The C Programming Language always Hello, world?
3.3 Debugging with printk
You might think sending a debug message to the kernel log is simple—just fire off a printk at KERN_DEBUG level. Your intuition isn't wrong, but reality is far more complex.
3.4 Wielding the Kernel's Dynamic Debug Powerhouse
In the previous section, we saw that trace_printk() is a lifeline for high-frequency paths. But this creates a dilemma: for debugging, we need to print; for performance, we dare not print too much—especially in production environments.
3.4 Remaining printk Miscellanea and Ultimate Debugging Techniques
At this point, we've covered roughly 90% of your day-to-day debugging scenarios: from the most basic printk, to the leveled pr_debug, to rate limiting that prevents log flooding, to the god-like Dynamic Debug. Your toolbox should now contain a pretty solid set of gear.
Chapter 4: Kprobes Dynamic Tracing
This chapter contains 9 sections. Click the links below to read:
Chapter 4: Seeing the Elephant Through a Needle's Eye — Kernel Probes and Dynamic Tracing
We are staring at a high-speed tangle of chaos.
Chapter 4: Inside the Black Box: Kprobes and Kernel Instrumentation
4.2 The Classic Approach: Hardcore Static Kprobes
4.3 Using Static kprobes: Demo 3 & Demo 4
In the previous section, we figured out how parameters are quietly passed deep within the kernel by manually inspecting processor registers. It's like learning the grammar of a dialect — now you can finally understand what they're saying.
4.4 Getting Started with kretprobes
In the previous section, we saw the power of kprobes—like forcibly inserting a breakpoint inside a kernel function. But that only lets you see what a function looks like when it goes "in." What if we want to see what result it brings back when it comes "out"?
4.5 Kprobe-Based Event Tracing: The Internals
Remember that cliffhanger at the end of the last section? Is there a way to "bug" any function in the kernel without writing a single line of C code or compiling a kernel module?
4.6 Setting Up Dynamic Kprobes (via kprobe events): Placing a Watchpoint on Any Function
In the previous section, we mentioned that reality is often harsher than a demo. What if the function you need to monitor—perhaps an unassuming internal function in your own kernel module, or some obscure system call—doesn't even show up under `/sys/kernel/tracing/events`?
4.7 Dynamic kprobe Event Tracing on Kernel Modules
In the previous section, we were still looking at kernel stack traces. Now, let's switch to a different scenario.
4.8 The God's-Eye View of Process Tracing: Exploring execve with perf and eBPF Tools
In the previous section, we demonstrated that dynamic kprobes are virtually omnipotent when it comes to tracing kernel module functions. But that was for "our own code."
4.9 Further Reading and Exploration Paths
Along this journey, we've torn down the kernel probe mechanism to its very core—from the lowest-level assembly instruction replacement, to the register parsing of pt_regs, to the dynamic insertion of ftrace, and finally standing on the shoulders of eBPF.
Chapter 5: Memory Debugging Tools: KASAN and UBSAN
This chapter contains 9 sections. Click the links below to read:
Chapter 5: Gazing into the Abyss — Dynamic Analysis of Memory Corruption and Undefined Behavior
There is a class of bugs more frustrating and insidious than logic errors.
5.2 What Exactly Is Wrong with Memory?
In the previous section, we prepared a kernel source tree specifically for "breaking things," and even switched our compiler to the sharper Clang. All this preparation is for facing the oldest, craftiest, and most lethal enemy in the C programming world: memory issues.
5.3 Understanding the Fundamentals of KASAN
In the previous section, we laid out a long "wish list," exposing the various memory bugs that can lurk in the kernel. The question now is: how exactly does KASAN track down the culprits on this list one by one?
5.4 Configuring Generic KASAN Mode
Since we decided to start with Generic KASAN, we first need to "arm" the kernel.
5.5 Catching Bugs with KASAN
Assume you have already followed the detailed steps from the previous section to configure, compile, and successfully boot a debug kernel with KASAN enabled. In my environment—an x86_64 Ubuntu 20.04 LTS virtual machine—everything is ready to go.
5.6 Catching Undefined Behavior with the UBSAN Kernel Checker
In the previous section, we discussed how KASAN is the "heavy artillery" of the memory error world, but it's clearly not a silver bullet. Some incredibly sneaky bugs—like the ones we saw earlier—can perfectly evade KASAN's radar. So, what other weapons do we have at our disposal?
5.7 Building the Kernel and Modules with Clang
Now, let's step into the world of Clang.
5.8 Catching Memory Defects in the Kernel: Comparison and Notes (Part 1)
Now it's time for a retrospective.
5.9 Further Reading: Rust, the Security Abyss, and the End of the Toolchain
We spent five chapters peeling back the layers of C's memory management like archaeologists—except the garden we unearthed isn't filled with treasure, it's filled with landmines. From KASAN to UBSAN, from KFENCE to the venerable Valgrind, our detectors have grown increasingly sophisticated. But you have to admit one fact: we are still patching a foundational road built on human fallibility.
Chapter 6: SLUB Debugging and Memory Leak Detection
This chapter contains 7 sections. Click the links below to read:
Chapter 6: Who Touched My Memory? (Part 1) — Catching Ghosts in the Kernel Stack
6.1 Preparation and SLUB Debugging Basics
6.2 Precision Strikes: Planting Landmines with the slub_debug Parameter
In the previous section, we mentioned that the SLUB debugging mechanism is already locked and loaded (CONFIG_SLUB_DEBUG), but the safety is on by default.
ch06_3
Now we reach the validation phase—we throw those problematic test modules into the kernel and see if the security checkpoint we just configured can actually catch the vulnerabilities.
6.4 Decoding SLUB Debug Error Reports
Alright, now that we've added slub_debug=FZPU to the kernel boot parameters and successfully triggered some bugs, let's look at what happened. As we saw at the end of the previous section, the SLUB debug mechanism did catch the culprit and spat out a rather intimidating pile of error logs.
6.5 Using slabinfo and Its Companion Tools
A New Tool in the Box
6.6 Using kmemleak
Recall the scenario we saw at the end of the previous section: hundreds of vm_area_struct objects were allocated, piling up in memory with nobody paying attention to them. We said at the time that this was reasonable system behavior, until it isn't.
6.7 Practical Tips for Developers
Honestly, no amount of debugging tools beats writing cleaner code in the first place.
Chapter 7: Kernel Oops and Crash Analysis
This chapter contains 7 sections. Click the links below to read:
Chapter 7: When the Kernel Roars
There is a class of problems that can instantly send a chill down a systems engineer's spine.
7.2 Generating a Simple Kernel Bug and Oops
Now that our tools are in place, it's time to break things.
7.3 The Devil in the Details: Anatomy of a Crash Scene
In the previous section, we deliberately triggered a kernel crash. Watching the Oops logs flood the screen was satisfying, but once the thrill wears off, the real question arises: what are all these hex codes actually saying?
7.4 Precision Strike: Pinpointing the Culprit with objdump and GDB
In the previous section, we finished reading the autopsy report. We know the `do_the_work` function crashed, and we know the RIP register stopped at offset `0x124`.
7.5 Decoding Kernel Bug Diagnostics (Part 2)
7.5 Leveraging Kernel Scripts — Don't Reinvent the Wheel
7.6 Capturing Crash Logs in Interrupt Context Using a Console Device
In the previous section, we added a few more sharp tools to our toolbox—ranging from stack space checks to source code locating scripts. You might think that with these, even if the kernel crashes, we can pin it down effortlessly.
7.7 The Heterogeneous Battlefield: Oops and netconsole in Action on ARM Linux
In the previous section's x86 virtual machine environment, we used a virtual serial cable to "fish out" the kernel logs. But in a real embedded battlefield—like a Raspberry Pi—things are rarely that elegant.
Chapter 8: Lock Debugging and Concurrency Issues
This chapter contains 5 sections. Click the links below to read:
8.1 Lock Debugging Overview
There is a class of bugs whose favorite trick is invisibility.
8.2 Locking Mechanisms: Key Concepts Cheat Sheet
I've rambled on about a lot of prerequisites earlier, but to make sure we're on the same page, there are a few core principles about "locks" that bear repeating. Think of this as a cheat sheet — but remember, in concurrent programming, memorizing the rules and truly understanding them are two very different things. The latter requires paying some tuition in the form of debugging pain.
8.3 Catching Concurrency Bugs with KCSAN
Since relying on the human brain to wrestle with the sheer complexity of LKMM is becoming impractical, we had better find a tool to help. Entering the stage for this section is the "security scanner" of kernel concurrency: KCSAN.
8.4 Real-World Lock Defect Cases
In the previous section, we covered KCSAN, which acts like a tireless night watchman, helping us monitor those fleeting data races.
8.5 Further Reading
This chapter is incredibly information-dense. We covered locks, Heisenbugs, LKMM, and compiler-level magic like KCSAN.
Chapter 9: Ftrace Tracing Technology
This chapter contains 12 sections. Click the links below to read them:
Chapter 9: The Kernel Under a Microscope: Tracing, Profiling, and the End of the Black Box
In this chapter, we tackle an awkward problem.
9.10 Using trace-cmd, KernelShark, and perf-tools Frontends
Before diving in, a quick side note. When you spend a lot of time staring at ftrace reports, you might notice something—security-related interface calls occur at an astonishingly high frequency.
9.11 LTTng and Trace Compass: The God Mode of High-Level Perspectives
Entering the Scene: A Different Perspective on the Kernel
9.12 Further Reading and Technical Map
This chapter ends here, but your exploration is just beginning.
9.2 The Panorama of Kernel Tracing Technologies
Don't rush to type commands just yet.
9.3 Configuring the Kernel for ftrace Support
Most modern Linux distributions come with ftrace support enabled out of the box.
9.4 Tracing Kernel Flow with ftrace
In the previous section, we mentioned that although tracing_on is 1, as long as current_tracer remains nop, the system incurs virtually no overhead. It's like a light socket that has power but no bulb installed.
9.5 Practical ftrace Filter Options: From Firing Blind to Precision Guided
In the previous section, we solved the "how to see" problem: using the function_graph tracer with various formatting options to turn kernel behavior into readable logs.
9.6 Hands-on: Tracing a Single Ping Request with Raw ftrace
Alright, the toolbox is open. We now have the ability to configure the kernel, simple tracing methods at our disposal, and a whole bunch of advanced filtering techniques—Glob, index, blacklist, and commands.
9.7 Hands-on: Tracing a Single Ping Request with the set_event Interface
In the previous section, we used available_filter_functions in a "cast a wide net" approach to extract the function call graph of the entire network stack. While intuitive, it's like turning on everyone's microphone in a building just to hear what two people are saying: too much noise, too much information overload.
9.8 Ftrace Miscellanea and Lingering Questions (FAQ)
There are a few scattered but crucial topics left to cover regarding ftrace. Rather than just dumping them in a list, let's use a more intuitive format—an FAQ—to put the final pieces of this puzzle together.
9.9 Ftrace in Action: From Stack Overflow Monitoring to Android Debugging
Let's set the Instances topic aside for now.
Chapter 10: Kernel Panic and Deadlock Detection
This chapter contains 5 sections. Click the links below to read:
Chapter 10: The Undying Echo: When the Kernel Gives Up
10.1 Technical Preparation and Site Inspection
10.2 When the Kernel Gives Up: A Complete Guide to the Panic Mechanism
To conquer this beast, you must first understand it.
10.3 Writing a Custom Kernel Panic Handler
In the previous section, we dissected the standard procedure during a kernel panic like bomb disposal experts—from printing the last words to deciding whether to reboot. We even learned how to use the `panic_print` tuning knob to control how much information the kernel spits out before it dies.
10.4 Detecting Deadlocks and CPU Stalls in the Kernel
At the end of the previous section, we talked about the "red line" in the Panic handler: don't make it too complex, or you won't even be able to leave a "dying message." But sometimes, the kernel dies in a more insidious way—it doesn't crash immediately or scream for help; it just suddenly goes silent.
10.5 Leveraging the Kernel's Hung Task and Workqueue Stall Detectors
Following up on the previous section, we just mentioned that the system might experience tasks getting stuck — the so-called "Hung Task."
Chapter 11: KGDB Kernel Debugger
This chapter contains 7 sections. Click the links below to read:
Chapter 11: Deep into the Kernel: When the Debugger Becomes Part of the Kernel
There is a class of problems that ordinary debuggers simply cannot reach.
11.2 Understanding How KGDB Works Conceptually
In the previous section, we set up the SEALS project and got our "hardware" (albeit virtual) and rootfs. Now, we need to bring the real weapon—the debugger—onto the kernel battlefield.
11.3 Building the ARM Target System and Kernel
Don't rush to flash anything just yet.
11.4 Debugging the Kernel Hard with KGDB
In the previous section, we managed to keep our virtual sheep, "ARM VExpress," alive and kicking. It boots happily in QEMU, spits out a pile of boot logs, and finally obediently gives you a shell prompt.
11.5 Debugging Kernel Modules: When the Symbol Table Hides in Memory
In the previous section, we essentially "hacked" into a running kernel and watched it wake up in `start_kernel`. But honestly, that felt more like watching a stage play—we were just the audience, watching a plot arranged by the director.
11.6 Advanced [K]GDB Tips and Tricks
In the previous section, we used hbreak to latch onto do_init_module, thoroughly solving the "module vanishes on load" problem. But once you actually start running KGDB, you'll find it's like a bottomless toolbox: most of the time you only use the screwdriver, but when you really need that angled needle-nose plier, you'd better know which corner it's hiding in.
11.7 Further Reading
The main text ended with the previous section.
Chapter 12: Summary and Further Reading
This chapter contains 2 sections. Click the links below to read:
Chapter 12: The Debugging Arsenal: No Silver Bullet
There is a class of problems that appear to be about tools, but are actually philosophical in nature.
12.2 Further Reading
Books eventually end, but the depths of the kernel do not.