Linux Kernel Debugging

📄️Chapter 1: Setting Up the Debugging Environment

This chapter contains 10 sections:

📄️Chapter 1: On the Edge of Low-Level Disorder

1.1 Technical Preparation: Setting Up Your Operating Table

1.2 Software Debugging — Essence, Origins, and Myths

If the previous section was about building the physical space of our lab, this section aligns us on something more fundamental: in this space, what exactly are we up against?

1.3 Software Defects — Real-World Tragedies

In the previous section, we were casually chatting about moths and the etymology of "debug." In this section, the tone needs to change.

1.4 Setting Up the Workbench

The gruesome case studies from the previous section should have left you suitably sobered. Now, carrying that reverent sense of walking on thin ice, we can finally get our hands dirty.

1.5 Two Kernels: The Frugal Producer and the Detective with a Magnifying Glass

We've stocked our toolbox, but before we start working, I want to show you a kernel developer's desk.

1.6 Building Your Custom Production Kernel

At this point, I'll assume you're no longer a stranger to the basic workflow of building a Linux kernel from source: fetching the source tree, configuring it, and compiling it. If you feel a bit rusty or want to brush up on the details, I highly recommend flipping through Linux Kernel Programming or checking the "Further Reading" section at the end of this chapter.

1.7 Building Our Custom Debug Kernel

In the previous section, we built a "production kernel." It's like an agile field agent: lean, efficient, and always ready for deployment. But as developers, an agent alone isn't enough. Sometimes we need a chatterbox, someone who announces what they're doing before every single operation and even prints out their inner monologue.

1.8 Seeing the Difference — Production vs. Debug Kernel Config Showdown

In the previous section, we turned on every debug switch we could find. That's great, but it raises a question: what exactly did we change? In other words, how far does all this y and n toggling actually separate this kernel from its production sibling?

1.9 The Zen and Intuition of Debugging

In the previous section, we spent a fair amount of time "sharpening our tools" by building production and debug kernels. Now you have two different blades in your toolbox.

1.10 Further Reading

With this, Chapter 1 is essentially complete.

📄️Chapter 2: Bug Classification and Debugging Methodology

This chapter contains 3 sections:

📄️Chapter 2: A Torch in the Dark Forest

The core insight established in this chapter is that debugging the kernel and debugging user-space programs are entirely different beasts. In user space, you have all of libc at your disposal, isolated process address spaces, and a GDB you can attach at any time. In kernel space, a single bad pointer can paralyze the entire machine. Once it halts or reboots, all context evaporates, leaving nothing behind but a pile of hex numbers.

2.2 Bug Classification

Just as a doctor must determine whether an infection is viral or bacterial before treating it, we need to identify the exact type of bug we're facing before swinging our debugging hammer.

2.3 The Kernel Debugging Panorama — When to Use What

Alright, we've classified bugs and figured out why they make the system misbehave.

📄️Chapter 3: Kernel Printing and Log Debugging

This chapter contains 5 sections:

3.0 Preparation and Motivation for This Chapter

In this chapter, we dive inside the kernel's "black box."

3.1 The Ubiquitous Kernel printk

Why does K&R's famous The C Programming Language open with "hello, world" as its very first example program?

3.3 Debugging with printk

You might think sending a debug message to the kernel log is simple—just fire off a printk at KERN_DEBUG level. Your intuition isn't wrong, but reality is far more complex.

3.4 Wielding the Kernel's Dynamic Debug Powerhouse

In the previous section, we saw that trace_printk() is a lifeline for high-frequency paths. But this creates a dilemma: for debugging, we need to print; for performance, we dare not print too much—especially in production environments.

3.4 Remaining printk Miscellanea and Ultimate Debugging Techniques

At this point, we've covered 90% of your day-to-day debugging scenarios: from the most basic printk, to the leveled pr_debug, to rate-limited printing that prevents log flooding, and finally the god-like Dynamic Debug. Your toolbox should now contain a pretty solid set of gear.

📄️Chapter 4: Kprobes Dynamic Tracing

This chapter contains 9 sections:

📄️Chapter 4: Seeing the Elephant Through a Needle's Eye — Kernel Probes and Dynamic Tracing

We are staring at a high-speed tangle of chaos.

📄️Chapter 4: Inside the Black Box: Kprobes and Kernel Instrumentation

4.2 The Classic Approach: Hardcore Static Kprobes

4.3 Using Static kprobes — Demo 3 & Demo 4

In the previous section, we figured out how parameters are quietly passed deep within the kernel by manually inspecting processor registers. It's like learning the grammar of a dialect — now you can finally understand what they're saying.

4.4 Getting Started with kretprobes

In the previous section, we saw the power of kprobes—like forcibly inserting a breakpoint inside a kernel function. But that only lets you see what a function looks like when it goes "in." What if we want to see what result it brings back when it comes "out"?

4.5 Kprobe-Based Event Tracing — The Internals

Remember that cliffhanger at the end of the last section? Is there a way to "bug" any function in the kernel without writing a single line of C code or compiling a kernel module?

4.6 Setting Up Dynamic Kprobes (via kprobe events) — Placing a Watchpoint on Any Function

In the previous section, we mentioned that reality is often harsher than a demo. What if the function you need to monitor—perhaps an unassuming internal function in your own kernel module, or some obscure system call—doesn't even show up under `/sys/kernel/tracing/events`?

4.7 Dynamic kprobe Event Tracing on Kernel Modules

In the previous section, we were still looking at kernel stack traces. Now, let's switch to a different scenario.

4.8 The God's-Eye View of Process Tracing — Exploring execve with perf and eBPF Tools

In the previous section, we demonstrated that dynamic kprobes are virtually omnipotent when it comes to tracing kernel module functions. But that was for "our own code."

4.9 Further Reading and Exploration Paths

Along this journey, we've torn down the kernel probe mechanism to its very core—from the lowest-level assembly instruction replacement, to the register parsing of pt_regs, to the dynamic insertion of ftrace, and finally standing on the shoulders of eBPF.

📄️Chapter 5: Memory Debugging Tools: KASAN and UBSAN

This chapter contains 9 sections:

📄️Chapter 5: Gazing into the Abyss — Dynamic Analysis of Memory Corruption and Undefined Behavior

There is a class of bugs more frustrating and insidious than logic errors.

5.2 What Exactly Is Wrong with Memory?

In the previous section, we prepared a kernel source tree specifically for "breaking things," and even switched our compiler to the sharper Clang. All this preparation is for facing the oldest, craftiest, and most lethal enemy in the C programming world: memory issues.

5.3 Understanding the Fundamentals of KASAN

In the previous section, we laid out a long "wish list," exposing the various memory bugs that can lurk in the kernel. The question now is: how exactly does KASAN track down the culprits on this list one by one?

5.4 Configuring Generic KASAN Mode

Since we decided to start with Generic KASAN, we first need to "arm" the kernel.

5.5 Catching Bugs with KASAN

Assume you have already followed the detailed steps from the previous section to configure, compile, and successfully boot a debug kernel with KASAN enabled. In my environment—an x86_64 Ubuntu 20.04 LTS virtual machine—everything is ready to go.

5.6 Catching Undefined Behavior with the UBSAN Kernel Checker

In the previous section, we discussed how KASAN is the "heavy artillery" of memory error detection, but it's clearly not a silver bullet. Some incredibly sneaky bugs, like the ones we saw earlier, can slip completely past KASAN's radar. So, what other weapons do we have at our disposal?

5.7 Building the Kernel and Modules with Clang

Now, let's step into the world of Clang.

5.8 Catching Memory Defects in the Kernel — Comparison and Notes (Part 1)

Now it's time for a retrospective.

5.9 Further Reading: Rust, the Security Abyss, and the End of the Toolchain

We spent five chapters peeling back the layers of C's memory management like archaeologists, except the garden we unearthed is filled with landmines instead of treasure. From KASAN to UBSAN, from KFENCE to the venerable Valgrind, our detectors have grown increasingly sophisticated. But you have to admit one fact: we are still patching a road whose very foundation is human fallibility.

📄️Chapter 6: SLUB Debugging and Memory Leak Detection

This chapter contains 7 sections:

📄️Chapter 6: Who Touched My Memory? (Part 1) — Catching Ghosts in the Kernel Stack

6.1 Preparation and SLUB Debugging Basics

6.2 Precision Strikes: Planting Landmines with the slub_debug Parameter

In the previous section, we mentioned that the SLUB debugging mechanism is already locked and loaded (CONFIG_SLUB_DEBUG), but the safety is on by default.

📄️ch06_3

Now we reach the validation phase—we throw those problematic test modules into the kernel and see if the security checkpoint we just configured can actually catch the vulnerabilities.

6.4 Decoding SLUB Debug Error Reports

Alright, now that we've added slub_debug=FZPU to the kernel boot parameters and successfully triggered some bugs, let's look at what happened. As we saw at the end of the previous section, the SLUB debug mechanism did catch the culprit and spat out a rather intimidating pile of error logs.

6.5 Using slabinfo and Its Companion Tools

A New Tool in the Box

6.6 Using kmemleak

Recall the scenario we saw at the end of the previous section: hundreds of vm_area_struct objects were allocated, piling up in memory with nobody paying attention to them. We said at the time that this was reasonable system behavior. Until it isn't.

6.7 Practical Tips for Developers

Honestly, no amount of debugging tools beats writing cleaner code in the first place.

📄️Chapter 7: Kernel Oops and Crash Analysis

This chapter contains 7 sections:

📄️Chapter 7: When the Kernel Roars

There is a class of problems that can instantly send a chill down a systems engineer's spine.

7.2 Generating a Simple Kernel Bug and Oops

Now that our tools are in place, it's time to break things.

7.3 The Devil in the Details — Anatomy of a Crash Scene

In the previous section, we deliberately triggered a kernel crash. Watching the Oops logs flood the screen was satisfying, but once the thrill wears off, the real question arises: what are all these hex codes actually saying?

7.4 Precision Strike: Pinpointing the Culprit with objdump and GDB

In the previous section, we finished reading the autopsy report. We know the `do_the_work()` function crashed, and we know the RIP register stopped at offset `0x124`.

7.5 Decoding Kernel Bug Diagnostics (Part 2)

7.5 Leveraging Kernel Scripts — Don't Reinvent the Wheel

7.6 Capturing Crash Logs in Interrupt Context Using a Console Device

In the previous section, we added a few more sharp tools to our toolbox, ranging from stack-usage checks to scripts that map addresses back to source lines. You might think that with these, even if the kernel crashes, we can pin it down effortlessly.

7.7 The Heterogeneous Battlefield: Oops and netconsole in Action on ARM Linux

In the previous section's x86 virtual machine environment, we used a virtual serial cable to "fish out" the kernel logs. But in a real embedded battlefield—like a Raspberry Pi—things are rarely that elegant.

📄️Chapter 8: Lock Debugging and Concurrency Issues

This chapter contains 5 sections:

8.1 Lock Debugging Overview

There is a class of bugs whose favorite trick is invisibility.

8.2 Locking Mechanisms — Key Concepts Cheat Sheet

I've rambled on about a lot of prerequisites earlier, but to make sure we're on the same page, there are a few core principles about "locks" that bear repeating. Think of this as a cheat sheet — but remember, in concurrent programming, memorizing the rules and truly understanding them are two very different things. The latter requires paying some tuition in the form of debugging pain.

8.3 Catching Concurrency Bugs with KCSAN

Since relying on the human brain to wrestle with the sheer complexity of the LKMM is becoming impractical, we'd better find a tool to help. Taking the stage in this section is the "security scanner" of kernel concurrency: KCSAN.

8.4 Real-World Lock Defect Cases

In the previous section, we covered KCSAN, which acts like a tireless night watchman, helping us monitor those fleeting data races.

8.5 Further Reading

This chapter is incredibly information-dense. We covered locks, Heisenbugs, the LKMM, and compiler-level magic like KCSAN.

📄️Chapter 9: Ftrace Tracing Technology

This chapter contains 12 sections:

📄️Chapter 9: The Kernel Under a Microscope: Tracing, Profiling, and the End of the Black Box

In this chapter, we tackle an awkward problem.

9.2 The Panorama of Kernel Tracing Technologies

Don't rush to type commands just yet.

9.3 Configuring the Kernel for ftrace Support

Most modern Linux distributions come with ftrace support enabled out of the box.

9.4 Tracing Kernel Flow with ftrace

In the previous section, we mentioned that although `tracing_on` is 1, as long as `current_tracer` remains `nop`, the system incurs virtually no overhead. It's like a light socket that has power but no bulb installed.

9.5 Practical ftrace Filter Options: From Firing Blind to Precision Guided

In the previous section, we solved the "how to see" problem, using the function_graph tracer with various formatting options to turn kernel behavior into readable logs.

9.6 Hands-on: Tracing a Single Ping Request with Raw ftrace

Alright, the toolbox is open. We can now configure the kernel, we have simple tracing methods at our disposal, and we've picked up a whole bunch of advanced filtering techniques: glob patterns, index-based filtering, blacklisting, and filter commands.

9.7 Hands-on: Tracing a Single Ping Request with the set_event Interface

In the previous section, we used the `available_filter_functions` "cast a wide net" approach to extract the function call graph of the entire network stack. While intuitive, it's like turning on every microphone in a building just to hear what two people are saying: too much noise and far too much information.

9.8 Ftrace Miscellanea and Lingering Questions (FAQ)

There are a few scattered but crucial topics left to cover regarding ftrace. Rather than just dumping them in a list, let's use a more intuitive format, an FAQ, to put the final pieces of this puzzle together.

9.9 Ftrace in Action: From Stack Overflow Monitoring to Android Debugging

Let's set the Instances topic aside for now.

9.10 Using trace-cmd, KernelShark, and perf-tools Frontends

Before diving in, a quick side note. When you spend a lot of time staring at ftrace reports, you might notice something: security-related interface calls occur at an astonishingly high frequency.

9.11 LTTng and Trace Compass: The God Mode of High-Level Perspectives

Entering the Scene: A Different Perspective on the Kernel

9.12 Further Reading and Technical Map

This chapter ends here, but your exploration is just beginning.

📄️Chapter 10: Kernel Panic and Deadlock Detection

This chapter contains 5 sections:

📄️Chapter 10: The Undying Echo: When the Kernel Gives Up

10.1 Technical Preparation and Site Inspection

10.2 When the Kernel Gives Up — A Complete Guide to the Panic Mechanism

To conquer this beast, you must first understand it.

10.3 Writing a Custom Kernel Panic Handler

In the previous section, we dissected the standard procedure during a kernel panic like bomb disposal experts—from printing the last words to deciding whether to reboot. We even learned how to use the `panic_print` tuning knob to control how much information the kernel spits out before it dies.

10.4 Detecting Deadlocks and CPU Stalls in the Kernel

At the end of the previous section, we talked about the "red line" in the Panic handler: don't make it too complex, or you won't even be able to leave a "dying message." But sometimes, the kernel dies in a more insidious way—it doesn't crash immediately or scream for help; it just suddenly goes silent.

10.5 Leveraging the Kernel's Hung Task and Workqueue Stall Detectors

At the end of the previous section, we mentioned that tasks can sometimes get stuck: the so-called "Hung Task."

📄️Chapter 11: KGDB Kernel Debugger

This chapter contains 7 sections:

📄️Chapter 11: Deep into the Kernel: When the Debugger Becomes Part of the Kernel

There is a class of problems that ordinary debuggers simply cannot reach.

11.2 Understanding How KGDB Works Conceptually

In the previous section, we set up the SEALS project and got our "hardware" (albeit virtual) and rootfs. Now, we need to bring the real weapon—the debugger—onto the kernel battlefield.

11.3 Building the ARM Target System and Kernel

Don't rush to flash anything just yet.

11.4 Debugging the Kernel Hard with KGDB

In the previous section, we got our virtual "ARM VExpress" board alive and kicking. It boots happily in QEMU, spits out a pile of boot logs, and finally, obediently, gives you a shell prompt.

11.5 Debugging Kernel Modules: When the Symbol Table Hides in Memory

In the previous section, we essentially "hacked" into a running kernel and watched it wake up in `start_kernel`. But honestly, that felt more like watching a stage play—we were just the audience, watching a plot arranged by the director.

11.6 Advanced [K]GDB Tips and Tricks

In the previous section, we used `hbreak` to latch onto `do_init_module`, thoroughly solving the "module vanishes on load" problem. But once you actually start running KGDB, you'll find it's like a bottomless toolbox: most of the time you only use the screwdriver, but when you really need those angled needle-nose pliers, you'd better know which corner they're hiding in.
