
Chapter 6: Who Touched My Memory? (Part 1) — Catching Ghosts in the Kernel Stack

6.1 Preparation and SLUB Debugging Basics

6.1.1 Introduction: The Invisible Destroyer

There is a class of bugs more frustrating than logic errors, and far harder to reproduce.

When your code logic goes astray, the program usually crashes or produces an incorrect result — at least you know something is wrong. But when memory is corrupted, the world is often silent. Your driver might only occasionally hang, the system might suddenly panic after running for three days, or a variable's value might change for no apparent reason — as if an invisible hand were secretly modifying your code while you sleep.

This is memory corruption. It is the kernel developer's nightmare.

To catch these ghosts, we need a few weapons in our arsenal. In the previous chapter, we discussed KASAN and UBSAN, two pieces of heavy artillery. Powerful as they are, they can be too heavy: they carry significant performance overhead and need extra configuration. In this chapter, we'll narrow our focus to the kernel's built-in, lighter-weight detection mechanisms: the SLUB allocator's debugging features and the dedicated memory leak detector, kmemleak.

Our mission in this chapter is to build a "memory debugging intuition": when you suspect a memory issue, how to quickly pinpoint the culprit through SLUB's feedback; when you suspect a memory leak, how to let kmemleak help you drag out that forgotten pointer.

Ready? Let's start ghost hunting.


6.1.2 Preparing the Lab Environment

First, as always, the environment must be set up.

The lab environment configuration for this chapter is exactly the same as what we covered in Chapter 1, "Introduction to Debugging Software." If you've already built a debug kernel following the previous chapters, you're ready to jump right in; if you haven't, or if you've switched to a new machine, go back and review the checklist in Chapter 1.

For easy reference, all the example code is available in this book's GitHub repository:

https://github.com/PacktPublishing/Linux-Kernel-Debugging

Clone it with git and place the copy in your virtual machine; we'll need it shortly.


6.1.3 Why Do We Need SLUB Debugging?

Before diving into the configuration, let's agree on one thing: what exactly are we debugging?

Memory corruption in the kernel takes countless forms, but it ultimately boils down to a few familiar faces:

  • UMR (Uninitialized Memory Read): Reading memory that hasn't been initialized. You get garbage values, which are typically random, making the bugs random as well.
  • UAF (Use After Free): Using memory after it has been freed. It's like selling a house but keeping the key, then going back one night to sleep, only to find the house now belongs to someone else.
  • UAR (Use After Return): Accessing stack memory via a pointer after the function has returned and that stack frame is no longer valid.
  • Double Free: Freeing the same block of memory twice.
  • OOB (Out Of Bounds): Out-of-bounds access. An array index is too large, a linked list pointer goes wild, and you step into someone else's territory.

These issues are incredibly common. They will find you before you even have a chance to write elegant code. We already know that KASAN is the silver bullet against them, but sometimes you want something lighter-weight — something you can even occasionally enable in a production environment.

That is exactly why the SLUB allocator's built-in debugging features exist.

Recall the kernel's memory hierarchy:

At the very bottom of the kernel is the page allocator, also known as the Buddy System. It manages large chunks of physical pages. However, the kernel frequently needs small chunks of memory — structs of a few dozen or a few hundred bytes. If you request a 4KB page from the Buddy System just to hold a 64-byte struct, that's a massive waste and leads to severe internal fragmentation.

To solve this, the kernel adds a layer on top of the page allocator: the Slab Allocator.

You can think of Slab as a "parts warehouse." The Buddy System handles moving "shipping containers" (whole pages), while Slab breaks the containers open and neatly arranges the "screws" and "washers" (small objects) on shelves. When a kernel driver needs a screw, Slab simply grabs one off the shelf and hands it over; when the driver is done, the screw goes back on the shelf. This is both fast and space-efficient.
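To put rough numbers on the warehouse picture, here is a small sketch of the arithmetic. It is idealized: it ignores any per-object metadata and alignment the allocator may add, and the function name slab_packing is made up for this illustration.

```shell
# Idealized packing arithmetic; ignores allocator metadata and alignment.
# Usage: slab_packing <object-size-bytes> [page-size-bytes]
slab_packing() {
    obj=$1
    page=${2:-4096}    # assume 4 KB pages unless told otherwise
    echo "one object per page wastes $((page - obj)) of ${page} bytes"
    echo "packed: $((page / obj)) objects of ${obj} bytes per page"
}

# The 64-byte struct from the text, on a 4 KB page.
slab_packing 64
```

For the 64-byte struct, giving it a whole page wastes 4032 of 4096 bytes, while packing fits 64 such objects into the same page.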

But in the 5.10-series kernel used in this book, this "warehouse" comes in three different implementations:

  1. SLAB: The earliest implementation. Classic, but outdated.
  2. SLUB (the unqueued slab allocator): The current default. More modern in design, better performance, and the star of this chapter.
  3. SLOB: Designed for extremely memory-constrained embedded systems; almost never seen on standard desktops or servers.

⚠️ Note: Everything that follows in this chapter applies specifically to the SLUB implementation. If you selected SLAB or SLOB in your kernel configuration, the parameters and paths below will not match up. Modern Linux distributions (Ubuntu, Fedora, and so on) almost universally default to SLUB (config option CONFIG_SLUB); you can verify this under General setup | Choose SLAB allocator. Later kernels removed SLOB (in 6.4) and SLAB (in 6.8), so on those SLUB is the only allocator left.
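If you'd rather check from the shell than from menuconfig, a minimal sketch like the following reads the allocator choice out of a kernel config file. The helper name slab_allocator_of is hypothetical, and the sketch assumes your distribution or build installs the config as a plain-text file.

```shell
# Report which slab allocator a kernel config file selects.
# Usage: slab_allocator_of <config-file>
slab_allocator_of() {
    for name in SLUB SLAB SLOB; do
        # Match the exact option only, not e.g. CONFIG_SLAB_MERGE_DEFAULT.
        if grep -q "^CONFIG_${name}=y\$" "$1"; then
            echo "$name"
            return 0
        fi
    done
    echo "unknown"
    return 1
}
```

On the lab VM you would typically call it as slab_allocator_of "/boot/config-$(uname -r)" and expect it to print SLUB.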

Our goal now is to install "surveillance cameras" in this warehouse — to see who took a screw and didn't return it, or who kicked over the shelving unit.


6.1.4 Configuring the Kernel to Enable SLUB Debugging

Alright, before we get our hands dirty, let's survey the landscape. We need to adjust some kernel configuration options.

SLUB provides a full suite of debugging support, primarily controlled by two config options. Run make menuconfig (or edit .config directly) and look for these two options:

1. Enable the Debug Support Master Switch

Path:

General setup | Enable SLUB debugging support

Config option:

CONFIG_SLUB_DEBUG=y

This step is mandatory. Enabling it grants you the following abilities:

  • Viewing detailed status of all slab caches under /sys/kernel/slab (the Kconfig help text calls it /sys/slab).
  • Runtime cache validation.
  • Using the advanced debugging features we'll cover in the next section (Red Zone, Poisoning, etc.).
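As a quick way to peek at that per-cache status, here is a hedged sketch that walks a SLUB sysfs tree and prints two attributes per cache. It assumes the tree lives at /sys/kernel/slab and that each cache directory exposes object_size and slab_size files; the function name list_slab_caches is invented for this example, and you can pass a different directory if your system differs.

```shell
# Print object_size and slab_size for each cache in a SLUB sysfs tree.
# Usage: list_slab_caches [dir]    (dir defaults to /sys/kernel/slab)
list_slab_caches() {
    dir=${1:-/sys/kernel/slab}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for cache in "$dir"/*/; do
        # Skip entries we cannot read (or an unmatched glob).
        [ -r "${cache}object_size" ] || continue
        printf '%s object_size=%s slab_size=%s\n' \
            "$(basename "$cache")" \
            "$(cat "${cache}object_size")" \
            "$(cat "${cache}slab_size")"
    done
}
```

Some attributes may only be readable by root, so running list_slab_caches under sudo is the safer bet on a real system.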

A quick fact: If you enable Generic KASAN (CONFIG_KASAN_GENERIC, chosen under CONFIG_KASAN) on a SLUB kernel, the kernel automatically selects this option. This is one of the reasons KASAN slows down your system: it's already doing a lot of work under the hood.

2. Enable by Default?

Path:

Kernel hacking | Memory Debugging | SLUB debugging on by default

Config option:

CONFIG_SLUB_DEBUG_ON=y

This option is a double-edged sword.

If set to y, SLUB debugging will be fully enabled at kernel boot. That sounds great — like installing 24/7 surveillance in your system. But the cost is enormous — the performance penalty will be very noticeable. For production environments or everyday development debugging, this is usually too heavy.

The recommended strategy is: Turn on CONFIG_SLUB_DEBUG and leave CONFIG_SLUB_DEBUG_ON disabled (it's off by default). This way, the debugging capabilities are compiled into the kernel but stay dormant.

When you need to hunt a bug, you enable them on-demand via the kernel boot parameter slub_debug. That's the flexible approach.
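To see whether a given boot actually passed slub_debug, you can inspect the kernel command line. The sketch below parses a command-line string; the helper name slub_debug_from_cmdline and the sample command line are made up for illustration, and the flag letters in it (FZ) are just an example value that the next section explains.

```shell
# Extract the slub_debug setting from a kernel command-line string.
# Prints the value after "slub_debug=", "default" for a bare slub_debug,
# or "unset" if the parameter is not present.
slub_debug_from_cmdline() (
    set -f    # no glob expansion while word-splitting the command line
    for word in $1; do
        case $word in
            slub_debug=*) echo "${word#slub_debug=}"; exit 0 ;;
            slub_debug)   echo "default"; exit 0 ;;
        esac
    done
    echo "unset"
)

# Hypothetical command line, as it might appear in /proc/cmdline.
slub_debug_from_cmdline "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro slub_debug=FZ quiet"
```

On a running system, feed it the real thing with slub_debug_from_cmdline "$(cat /proc/cmdline)".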


Verifying the Configuration

After changing the configuration, rebuild your kernel (if you haven't already) and reboot into the new kernel.

Once in the system, use grep to confirm the current configuration state:

$ grep SLUB_DEBUG /boot/config-5.10.60-dbg02
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set

The # CONFIG_SLUB_DEBUG_ON is not set here is critical.

It means: I have a gun, but it's not loaded yet.

This "armed but idle" state is the foundation for our hands-on work ahead. If you accidentally turn on CONFIG_SLUB_DEBUG_ON, you'll notice your mouse lagging and file reads and writes slowing down. Don't say I didn't warn you.
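If you check this often, the grep can be wrapped in a small helper that names the three possible states. This is only a sketch: the function name slub_debug_state and the state labels are invented here, and it assumes a plain-text config file like the one shown above.

```shell
# Classify a kernel config file by its SLUB debugging options.
# Prints "absent", "always-on", or "standby".
slub_debug_state() {
    if ! grep -q '^CONFIG_SLUB_DEBUG=y$' "$1"; then
        echo "absent"       # debugging support not compiled in
    elif grep -q '^CONFIG_SLUB_DEBUG_ON=y$' "$1"; then
        echo "always-on"    # compiled in and enabled at every boot
    else
        echo "standby"      # compiled in, enabled only via slub_debug
    fi
}
```

For the configuration shown above, slub_debug_state /boot/config-5.10.60-dbg02 would report standby, which is the state we want.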


6.1.5 Next Step: The Magic of slub_debug

Now that the gun is ready, it's time to learn how to pull the trigger.

The core of SLUB debugging lies in the kernel command-line parameter slub_debug. It lets you control precisely which caches get which debugging checks, all without recompiling the kernel.

The detailed official documentation for this parameter is available here: https://www.kernel.org/doc/html/latest/vm/slub.html

Of course, reading the raw documentation can be a bit dry. In the next section, we'll translate those obscure parameters into practical scenarios. Through a few concrete examples, we'll show you exactly how slub_debug helps us catch the culprits behind "out-of-bounds access" or "use-after-free."

Keep this intuition in mind for now: SLUB debugging is about planting landmines around memory objects — step out of bounds, and it blows up. In the next section, we'll plant those mines.