6.2 Precision Strikes: Planting Landmines with the slub_debug Parameter
In the previous section, we mentioned that the SLUB debugging mechanism is already locked and loaded (CONFIG_SLUB_DEBUG), but the safety is on by default.
Now, it's time to learn how to take off the safety and toss a few grenades into this memory minefield.
Our core tool is the kernel command-line parameter slub_debug. It lets us control precisely which debugging checks are enabled, without recompiling the kernel or changing a single line of code. It's like picking a target area from an aircraft (a slab cache) and then deciding whether to drop leaflets (logging only) or bombs (forced crashes).
But before we drop any bombs, we need to take a look at the fuses.
6.2.1 Breaking Down slub_debug: Your Arsenal
The syntax of the slub_debug parameter is very intuitive, but it has many options. You can think of it as a row of toggle switches, each controlling a specific detection mechanism.
First, let's look at this official "arsenal" (Table 6.1):
- F (Sanity checks): Verifies on allocation and free that slab metadata has not been corrupted.
- Z (Red zoning): Fills guard areas before and after each object to catch out-of-bounds accesses.
- P (Poisoning): Fills objects with specific magic numbers to catch Uninitialized Memory Reads (UMR) and Use-After-Free (UAF) errors.
- U (User tracking): Records who last allocated and freed each object (saves stack traces).
- T (Trace): Logs every allocation and free on the cache. It is very verbose, so use it on a single cache only.
- A (Failslab): Marks the cache for the failslab fault-injection filter, so that allocation failures can be injected into it.
These switches can be used not only individually but also in combination. In addition, there is a very useful sysfs interface document (although it looks a bit dated): Documentation/ABI/testing/sysfs-kernel-slab.
But before diving into hands-on practice, we need to understand the single most important concept: poisoning. This is the "true mirror" that reveals most bizarre memory errors.
6.2.2 Deep Dive into the Mechanics: Those Strange Hex Numbers
How does the SLUB debugging mechanism know which memory you've touched? It relies on "dyeing."
The kernel defines several special dyes in include/linux/poison.h. We can think of memory as a canvas, and the kernel paints specific colors in specific spots on these canvases. If you alter these colors, the kernel knows you did it.
Let's look at the recipes for these colors (magic numbers):
// include/linux/poison.h
#define POISON_INUSE 0x5a /* for use-uninitialised poisoning (ASCII 'Z') */
#define POISON_FREE 0x6b /* for use-after-free poisoning (ASCII 'k') */
#define POISON_END 0xa5 /* end-byte of poisoning */
These three macro definitions are the cornerstone of the entire poisoning mechanism. Let's break them down one by one:
- POISON_FREE (0x6b): This is the most commonly encountered "poison." When you enable slub_debug=P or create a cache with the SLAB_POISON flag, the kernel fills an object with 0x6b (the ASCII character k) whenever it is free. A freshly allocated object still carries this fill until you write to it.
  - Meaning: This memory is "dead." If you read an object and get back a screen full of 0x6b, you are reading memory that was either never initialized (UMR) or already freed (UAF). Conversely, if the kernel finds anything other than 0x6b in an object that is supposed to be free, someone wrote to it after it was freed.
- POISON_INUSE (0x5a): This value marks memory that belongs to a slab but is not object payload, mainly the padding around objects. If a padding byte no longer reads 0x5a, something has written outside its object. (Red zones use their own fill values, SLUB_RED_INACTIVE (0xbb) and SLUB_RED_ACTIVE (0xcc), defined in the same header.)
- POISON_END (0xa5): This is a sentinel. The last byte of each poisoned object is set to this value, so a poisoned 32-byte object is 31 bytes of 0x6b followed by a single 0xa5. It acts like a warning sign next to a landmine: once it's trampled (overwritten), the kernel sounds the alarm the next time it checks the object.
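To make the pattern concrete, here is a minimal userspace sketch of the idea. It is not the kernel's implementation: the helper names poison_object() and first_corrupt_byte() are hypothetical, and only the two constants are taken from include/linux/poison.h. It paints a buffer the way a poisoned object is described above, then scans it for the first byte that no longer matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from include/linux/poison.h */
#define POISON_FREE 0x6b
#define POISON_END  0xa5

/*
 * Hypothetical helper: paint an object with POISON_FREE everywhere
 * except the last byte, which gets the POISON_END sentinel.
 */
static void poison_object(unsigned char *obj, size_t size)
{
	if (size == 0)
		return;
	memset(obj, POISON_FREE, size - 1);
	obj[size - 1] = POISON_END;
}

/*
 * Hypothetical helper: return the offset of the first byte that no
 * longer matches the poison pattern, or -1 if the pattern is intact.
 */
static long first_corrupt_byte(const unsigned char *obj, size_t size)
{
	for (size_t i = 0; i < size; i++) {
		unsigned char expected =
			(i == size - 1) ? POISON_END : POISON_FREE;

		if (obj[i] != expected)
			return (long)i;
	}
	return -1;
}
```

Poison a 32-byte buffer, overwrite byte 10 to simulate a stray write after free, and first_corrupt_byte() returns 10. The real checks follow the same logic: the pattern is known in advance, so any deviation pinpoints where the damage happened.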
6.2.3 Uninitialized Memory Reads (UMR): Erupting in Silence
Just looking at the definitions is too abstract. Let's run an experiment using the test code ch5/kmembugs_test.c we wrote in Chapter 5.
The umr_slub() function inside is very simple (and very stupid): it requests 32 bytes of memory and then reads it directly without writing anything. This is a classic "Uninitialized Memory Read" (UMR).
/* Sketch of the code logic */
q = kmalloc(32, GFP_KERNEL);
/* Deliberately no memset or assignment */
printk("q[3] is 0x%x\n", q[3]); /* Read the garbage value directly */
Scenario 1: No Debug Protection (Running Bare)
If we run this test without any slub_debug parameter, what will the kernel log look like?
It will look roughly like this:
[ 6845.100813] testcase to run: 10
[ 6845.101126] test_kmembugs:umr_slub(): testcase 10: simple UMR on slab memory
[ 6845.101771] test_kmembugs:umr_slub(): q[3] is 0x0
[ 6845.102203] q: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 6845.102946] q: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
See that neat row of 00?
This is extremely dangerous, because it looks "clean," as if the memory had been initialized to 0. But that is pure coincidence: the kernel happened to hand us memory that was already zeroed (for example, a clean page fresh from the page allocator). If you rely on this coincidence, your code will eventually misbehave in other environments or at other times, and this kind of bug is very hard to reproduce.
Now, let's change our approach, enable debug mode, and run it again.
Scenario 2: Enabling Poison (Poisoning Mode)
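Although we won't show the new output here (leaving some suspense), I can tell you the result: those 32 bytes will no longer be 00. They will be filled with 6b, with the final byte set to a5 (POISON_END).
If you read q[3], you won't get 0x0; you will get 0x6b. (This assumes q is a byte pointer, as the byte-wise dump above suggests; a 32-bit read of the same memory would show 0x6b6b6b6b.)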
This creates a stark contrast:
- Before: You thought you read a 0, happily continued using it, and three days later, when a certain condition triggered, the program crashed thousands of lines away.
- Now: You read a screen full of k (0x6b), and you immediately know: "Oh, I forgot to initialize this memory."
This is the power of SLUB debugging—it turns uncertain garbage values into definitive error flags.
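How the poison shows up depends on the width of the read. The following userspace sketch (helper names are hypothetical; only POISON_FREE comes from include/linux/poison.h) reads the same poisoned buffer once as a byte and once as a 32-bit value. Because every byte is identical, the wider read looks the same on little-endian and big-endian machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* from include/linux/poison.h */

/* Hypothetical helper: read one byte, as indexing a char pointer would. */
static unsigned int peek_u8(const unsigned char *buf, size_t off)
{
	return buf[off];
}

/*
 * Hypothetical helper: read 32 bits starting at byte offset 'off'.
 * memcpy sidesteps alignment and aliasing problems.
 */
static uint32_t peek_u32(const unsigned char *buf, size_t off)
{
	uint32_t v;

	memcpy(&v, buf + off, sizeof(v));
	return v;
}

/* True if all four bytes of the value are POISON_FREE. */
static int is_poison_u32(uint32_t v)
{
	return v == 0x6b6b6b6bu;
}
```

With a buffer filled with POISON_FREE, peek_u8(buf, 3) yields 0x6b and peek_u32(buf, 4) yields 0x6b6b6b6b. Either pattern in a log or a register dump is a strong hint, though not proof, that the value came from uninitialized or freed slab memory.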
6.2.4 Step-by-Step Configuration: Practical Syntax for slub_debug
Now that we understand the mechanics, let's flip the switches.
As long as your kernel is configured with CONFIG_SLUB_DEBUG=y (which we confirmed in the previous chapter), you can pass the slub_debug parameter at boot.
The syntax format is as follows:
slub_debug=<Flags>,<SlabList>
- Flags: Any combination of the letters we just covered, such as F, Z, P, and U. For example, FZPU. Don't use spaces; just concatenate the letters.
- SlabList (Optional): The name of the cache, or a comma-separated list of caches, to debug. If omitted, the flags apply to all slab caches.
A few typical usages:
- Full enable (The relentless approach):
  slub_debug=FZPU
  This enables the full suite of checks for all caches. ⚠️ Warning: This slows the system down considerably and significantly increases memory overhead. Unless you are specifically reproducing a fault, don't do this in a production environment.
- Precision strike (Targeting a specific cache):
  slub_debug=,kmalloc-256
  Note that the part before the comma is empty. This does not mean "no flags": with an empty flag list, the kernel applies its default debug set (F, Z, P, and U), but only to the kmalloc-256 cache. If you want to enable just P and Z for kmalloc-256:
  slub_debug=PZ,kmalloc-256
- Disable all (debugging is already off by default, unless the kernel was built with CONFIG_SLUB_DEBUG_ON):
  slub_debug=-
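The syntax rules are easier to remember once you see them as code. Below is a deliberately simplified userspace parser. It is not the kernel's parser: the struct, the DBG_* bit values, and the function name are all made up for this sketch, and it only knows the letters discussed in this section. The empty-flags case follows the behavior described in the kernel's SLUB documentation, where slub_debug=,<cache> turns on the default debug set for that cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values are local to this sketch, not the kernel's SLAB_* flags. */
enum {
	DBG_F = 1 << 0, /* sanity checks */
	DBG_Z = 1 << 1, /* red zoning */
	DBG_P = 1 << 2, /* poisoning */
	DBG_U = 1 << 3, /* user tracking */
	DBG_T = 1 << 4, /* trace */
	DBG_A = 1 << 5, /* failslab */
};
#define DBG_DEFAULT (DBG_F | DBG_Z | DBG_P | DBG_U)

struct slub_debug_req {
	unsigned int flags;
	const char *slabs; /* points into the input; NULL means all caches */
};

/* Returns 0 on success, -1 on an unknown flag letter. */
static int parse_slub_debug(const char *arg, struct slub_debug_req *out)
{
	const char *comma = strchr(arg, ',');
	size_t nflags = comma ? (size_t)(comma - arg) : strlen(arg);

	out->slabs = comma ? comma + 1 : NULL;
	out->flags = 0;

	/* No letters at all: fall back to the default debug set. */
	if (nflags == 0) {
		out->flags = DBG_DEFAULT;
		return 0;
	}

	for (size_t i = 0; i < nflags; i++) {
		switch (arg[i]) {
		case 'F': out->flags |= DBG_F; break;
		case 'Z': out->flags |= DBG_Z; break;
		case 'P': out->flags |= DBG_P; break;
		case 'U': out->flags |= DBG_U; break;
		case 'T': out->flags |= DBG_T; break;
		case 'A': out->flags |= DBG_A; break;
		case '-': out->flags = 0; break; /* switch everything off */
		default:  return -1;
		}
	}
	return 0;
}
```

Feeding it "PZ,kmalloc-256" yields flags P|Z with slabs pointing at "kmalloc-256", while ",kmalloc-256" yields the default set for the same cache, and "-" yields no flags at all.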
Practical Demo: Enabling Full Monitoring
Suppose we need to reproduce an extremely tricky Bug and decide to enable red zones, poisoning, sanity checks, and user tracking for all caches. We need to modify the GRUB configuration and append the following to the kernel boot parameters:
slub_debug=FZPU
Save, reboot, and enter the system.
Let's verify that it actually took effect. Check the kernel boot parameters:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.10.60-dbg02-gcc root=UUID=<...> ro quiet splash 3 slub_debug=FZPU
Great, the parameter was passed in. But this is just a boot command—did the kernel actually listen?
We need to look at the actual state of the slab cache under sysfs. Let's use kmalloc-32 (the cache specifically for allocating 32-byte small objects) as our test subject.
$ export SLAB=/sys/kernel/slab/kmalloc-32
$ sudo cat ${SLAB}/sanity_checks ${SLAB}/red_zone ${SLAB}/poison ${SLAB}/store_user
1
1
1
1
$
See those four 1s?
This means:
- sanity_checks (F): ON
- red_zone (Z): ON
- poison (P): ON
- store_user (U): ON
The memory allocator is now like a heavily armed security guard; every single byte going in or out will be thoroughly frisked.
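If you find yourself repeating this check, it is easy to script. The sketch below is a small userspace helper, under the assumption that each attribute file holds a single 0 or 1 as in the output above. The function names are hypothetical; the four attribute names are the ones we just read with cat. On a real system the directory would be something like /sys/kernel/slab/kmalloc-32, and reading it typically requires root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read a single 0/1 attribute file such as <dir>/poison.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 */
static int read_slab_attr(const char *dir, const char *attr)
{
	char path[256];
	int value = -1;
	FILE *fp;
	int n = snprintf(path, sizeof(path), "%s/%s", dir, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1; /* path would not fit */

	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &value) != 1)
		value = -1;
	fclose(fp);
	return value;
}

/* Returns 1 only if all four attributes checked in the text read as 1. */
static int slab_fully_debugged(const char *dir)
{
	static const char *const attrs[] = {
		"sanity_checks", "red_zone", "poison", "store_user",
	};

	for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
		if (read_slab_attr(dir, attrs[i]) != 1)
			return 0;
	return 1;
}
```

Calling slab_fully_debugged("/sys/kernel/slab/kmalloc-32") from a small main() gives a single yes/no answer instead of four lines to eyeball. Passing the directory as a parameter also makes the helper easy to try against a scratch directory first.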
The system is ready. Next, we'll throw our problematic test code into it and see how this mechanism reports errors.