Chapter 5: Gazing into the Abyss — Dynamic Analysis of Memory Corruption and Undefined Behavior
There is a class of bugs more frustrating and insidious than logic errors.
Logic errors usually surface quickly: the program stops, or at least shows some obvious symptom of "brokenness", such as wrong calculations, deadlocks, or garbage output. Memory corruption doesn't. Memory corruption is like mold quietly growing inside your apartment walls. By the time you notice the black spots, the rot has already reached the structure; by the time you see a kernel panic, the corrupted data may already have propagated through code paths that ran long after the offending pointer was misused.
To make matters worse, in a low-level language like C we wield the scalpel without a safety net. Use-after-free, buffer overflow, stack overflow: these may sound like textbook pitfalls that experience eventually cures, but the truth is that no matter how many years you've been coding, as long as humans write the code and pointers are involved, mistakes will happen.
Traditional debugging methods (reading logs, adding print statements, even single-stepping with gdb) are often ineffective against memory errors, because when you stop and observe at 0xffff8880010, the actual crime may have occurred at 0xffff8880008 ten minutes earlier. You're looking at the crime scene, not the security footage.
What we need is a mechanism that blows the whistle the moment the violation happens: not one that waits for the program to crash and then performs an autopsy, but one that catches the offender the instant it touches memory it shouldn't.
This is precisely the mission of this chapter: building such a dynamic monitoring system. We will introduce three "heavy weapons" from the Linux kernel: KASAN, UBSAN, and KFENCE. Some act like searchlights, illuminating the entire memory space so that no out-of-bounds access has anywhere to hide, at a significant cost in CPU time and memory. Others act like sonar, sampling only occasionally and remaining virtually imperceptible during normal operation, which makes them suitable for lurking in production environments.
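To make the searchlight/sonar contrast concrete, here is a minimal user-space model of sampling-based detection. It is a hypothetical sketch, not KFENCE's implementation: real KFENCE picks allocations on a timer (a configurable sample interval) and gives each sampled object its own page flanked by guard pages, whereas this model simply samples every fourth allocation and treats any access past a sampled object's size as a report. The names `sampled_alloc`, `sampled_access_reported`, and `SAMPLE_EVERY` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not KFENCE source: KFENCE samples on a timer and
 * uses guard pages; here every SAMPLE_EVERY-th allocation is sampled. */
#define SAMPLE_EVERY 4
#define MAX_OBJS 16

struct toy_obj {
    size_t size;  /* usable size requested by the caller */
    bool guarded; /* true if this allocation was sampled */
};

static struct toy_obj objs[MAX_OBJS];
static size_t nr_objs;

/* "Allocate" an object and decide whether it is sampled. */
static size_t sampled_alloc(size_t size)
{
    assert(nr_objs < MAX_OBJS);
    objs[nr_objs].size = size;
    objs[nr_objs].guarded = (nr_objs + 1) % SAMPLE_EVERY == 0;
    return nr_objs++;
}

/* Would an access at 'offset' into object 'id' be reported?
 * Only sampled objects are checked; the rest run unmonitored. */
static bool sampled_access_reported(size_t id, size_t offset)
{
    assert(id < nr_objs);
    const struct toy_obj *o = &objs[id];
    return o->guarded && offset >= o->size;
}
```

Unsampled objects are never checked, so most overflows slip by. The payoff is that ordinary allocations and accesses cost almost nothing, which is the trade-off that makes a sampling detector viable in production.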
To make good use of these tools, we can't just be operators who blindly toggle config options. We need to understand: How does shadow memory keep track of which bytes of kernel memory are safe to touch? What exactly does compile-time instrumentation sneak into the code? Why can Clang catch certain bugs that GCC lets slip?
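As a preview of the first two questions, the sketch below models generic KASAN's shadow encoding in user space. It assumes the generic-mode layout described in the kernel's KASAN documentation (one shadow byte per 8-byte granule, located at `(addr >> 3)` plus a fixed offset) and represents addresses as offsets into a 64-byte toy arena, so the offset is zero. `toy_init`, `toy_unpoison`, `access_is_poisoned`, and the poison value are hypothetical; the test inside `byte_is_poisoned` is the kind of check the compiler inserts before a load or store, not kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of generic KASAN shadow memory (not kernel code).
 * One shadow byte describes one 8-byte granule:
 *   0     -> all 8 bytes accessible
 *   1..7  -> only the first N bytes accessible
 *   < 0   -> whole granule inaccessible (redzone, freed memory, ...)
 * The kernel computes shadow = (addr >> 3) + KASAN_SHADOW_OFFSET; here
 * addresses are offsets into a toy arena, so the offset is zero. */
#define SCALE_SHIFT 3
#define GRANULE ((size_t)1 << SCALE_SHIFT)
#define ARENA_SIZE 64
#define TOY_REDZONE ((int8_t)-4) /* illustrative poison code (0xFC) */

static int8_t shadow[ARENA_SIZE >> SCALE_SHIFT];

static int8_t *mem_to_shadow(size_t addr)
{
    assert(addr < ARENA_SIZE);
    return &shadow[addr >> SCALE_SHIFT];
}

/* Start with the whole arena poisoned. */
static void toy_init(void)
{
    for (size_t g = 0; g < ARENA_SIZE / GRANULE; g++)
        shadow[g] = TOY_REDZONE;
}

/* Mark [addr, addr + size) accessible; addr must be granule-aligned.
 * A partial last granule records how many of its bytes are valid. */
static void toy_unpoison(size_t addr, size_t size)
{
    assert(addr % GRANULE == 0 && addr + size <= ARENA_SIZE);
    for (size_t off = 0; off < size; off += GRANULE) {
        size_t left = size - off;
        *mem_to_shadow(addr + off) = left >= GRANULE ? 0 : (int8_t)left;
    }
}

/* Per-byte test: poisoned if the shadow is non-zero and the byte's
 * offset in its granule is >= the shadow value (always true when the
 * shadow value is negative). */
static bool byte_is_poisoned(size_t addr)
{
    int8_t s = *mem_to_shadow(addr);
    return s != 0 && (int8_t)(addr & (GRANULE - 1)) >= s;
}

/* Conceptually, `x = *p;` becomes "check the bytes of *p, then load";
 * this sketch simply checks every byte of a size-byte access. */
static bool access_is_poisoned(size_t addr, size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (byte_is_poisoned(addr + i))
            return true;
    return false;
}
```

A 13-byte object at offset 16 leaves granule 2 fully addressable (shadow 0) and granule 3 with shadow 5, so byte 28 is legal while byte 29, one past the end, is flagged at the moment of the access rather than whenever the damage later surfaces.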
These questions are the core of this chapter.
5.1 Preparing the Environment and Arsenal
Before diving into the mechanisms, let's get our tools ready. The good news is that our "hardware environment"—your development machine used for compiling and running the kernel—remains exactly the same as described in Chapter 1. You don't need to buy a new ARM board, nor do you need any special emulator.
All the example code—including the "buggy drivers" we will intentionally write—is already sitting in the book's GitHub repository:
🔗 https://github.com/PacktPublishing/Linux-Kernel-Debugging
You can clone it directly to use as our testing range.
⚠️ Warning: Do not run these test cases directly in a production environment or on a machine you're using for important projects. We will intentionally trigger kernel panics and deliberately create memory leaks. Please use a virtual machine or a dedicated test board.
The only new face in this step is the Clang compiler.
In Chapter 1, you were probably still using the veteran GCC. But in the realm of memory-error detection, Clang (and the LLVM infrastructure behind it) isn't just a GCC substitute; for certain checks it is mandatory. Later, when we discuss specific out-of-bounds detections (such as a "left out-of-bounds" access on a global variable), you'll find that Clang's instrumentation catches cases that GCC's currently misses.
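Why "left" accesses are the tricky case can be seen with a toy layout model. It assumes ASan-style global instrumentation, in which each instrumented global is followed by a trailing redzone and nothing poisoned is placed in front of it; a read just before a global is then caught only if it lands in the redzone trailing the previous object. The helpers `place_global` and `read_is_reported` are hypothetical, and the model says nothing about how any particular compiler actually orders or pads globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout model, not compiler output: every instrumented
 * global is followed by a trailing redzone; nothing is poisoned in
 * front of it. */
#define SECTION_SIZE 96

static bool poisoned[SECTION_SIZE]; /* true = redzone byte */
static size_t next_free;            /* next unused offset */

/* Place a global of 'size' bytes plus 'redzone' trailing poisoned bytes;
 * returns the global's offset within the toy data section. */
static size_t place_global(size_t size, size_t redzone)
{
    size_t start = next_free;
    assert(start + size + redzone <= SECTION_SIZE);
    for (size_t i = 0; i < redzone; i++)
        poisoned[start + size + i] = true;
    next_free = start + size + redzone;
    return start;
}

/* Would reading base[index] be reported? index may be negative. */
static bool read_is_reported(size_t base, long index)
{
    long addr = (long)base + index;
    assert(addr >= 0 && addr < SECTION_SIZE);
    return poisoned[addr];
}
```

With two 10-byte globals each followed by a 22-byte redzone, reading `b[-1]` lands in the redzone trailing `a` and is reported. If the preceding global has no trailing redzone, the same kind of left out-of-bounds read hits ordinary data and goes unnoticed.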
Don't worry if you haven't installed it yet—we'll cover how to integrate it into the build process in the "Building your kernel and modules with Clang" section.
With the environment confirmed and the compiler ready, we can now start tinkering with the config options that will make the kernel "reveal its true colors."