Using kmemleak
Recall the scenario we saw at the end of the previous section: hundreds of vm_area_struct objects were allocated, piling up in memory with nobody paying attention to them. We said at the time that this was reasonable system behavior—until it isn't.
The question is: how do you know when it's "reasonable" and when it's the beginning of a "leak"?
When you're writing a kernel module or debugging a driver, memory is like sand slipping through your fingers—you think you've got a firm grip, but it's leaking all along. We need a spotlight that can illuminate these "orphaned" memory blocks.
That's exactly what kmemleak is for.
Its name sounds like a hacker tool, but its working principle is remarkably simple, even a bit "brute-force": it acts like a tireless cleaner, periodically scanning the entire kernel memory for blocks that have been allocated but have no pointers pointing to them. If it finds a piece of memory that is neither freed nor pointed to by anything, kmemleak declares it a leak.
In this section, that's exactly what we'll do—turn on kmemleak, let it run, and then throw our buggy code at it to see how it exposes these awkward secrets.
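To make the scanning idea concrete, here is a toy model of it in shell and awk. It is only a sketch under loud assumptions: the input format (`root` and `obj` lines with decimal addresses) and the function name `toy_kmemleak_scan` are invented for illustration, and real kmemleak works on live kernel memory, not on a text description. What it does mirror is the principle: start from the words found in root areas, follow every value that lands inside an allocated block, and report whatever was never reached.

```shell
# toy_kmemleak_scan: read a description of "memory" on stdin and print the
# blocks that cannot be reached from any root. Hypothetical input format,
# all numbers decimal:
#   root VALUE               a word found in a root area (data section, stack)
#   obj ADDR SIZE [WORD...]  an allocated block and the words stored inside it
toy_kmemleak_scan() {
    awk '
    $1 == "root" { queue[++nq] = $2 + 0 }
    $1 == "obj" {
        n++
        addr[n] = $2 + 0
        size[n] = $3 + 0
        nwords[n] = NF - 3
        for (i = 4; i <= NF; i++)
            word[n, i - 3] = $i + 0
    }
    END {
        # Work through every candidate pointer value; the queue grows as
        # newly reached blocks contribute the words they contain.
        for (k = 1; k <= nq; k++) {
            v = queue[k]
            for (o = 1; o <= n; o++) {
                # A value anywhere inside the block counts as a reference.
                if (!reached[o] && v >= addr[o] && v < addr[o] + size[o]) {
                    reached[o] = 1
                    for (i = 1; i <= nwords[o]; i++)
                        queue[++nq] = word[o, i]
                }
            }
        }
        for (o = 1; o <= n; o++)
            if (!reached[o])
                printf "unreferenced object %d (size %d)\n", addr[o], size[o]
    }'
}
```

Feeding it `root 1000`, `obj 1000 64 2000`, `obj 2000 32`, `obj 4000 8 5000` and `obj 5000 8 4000` reports only the last two blocks: they point at each other, but nothing reachable points at them, so both are orphans.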
6.1 Basic Workflow — The Five-Step Method
The basic workflow for using kmemleak is actually quite mechanical, even a bit tedious. We can boil it down to a standard five-step checklist. But before you follow along, there's an important prerequisite to confirm.
Step 0: Verify the Environment (Crucial)
Before starting, you need to ensure two things:
- Is debugfs mounted? It's usually under /sys/kernel/debug. If that directory doesn't exist, you need to mount it yourself first.
- Is kmemleak actually enabled? This is a major pitfall. You think just selecting it in the kernel config is enough? Not necessarily. It might be in a "configured but disabled" state. If the echo scan command throws an error, nine times out of ten, this is the problem.
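The first of these two checks can be scripted. The sketch below is a minimal helper, not part of any kernel tooling: the function name `check_kmemleak_env` and its two optional arguments are assumptions made so that it can be pointed at test files. Note that finding the kmemleak file only shows the feature was compiled in; it does not prove the detector is enabled.

```shell
# check_kmemleak_env [DEBUGFS_DIR] [MOUNTS_FILE]
# Returns 0 if debugfs is mounted at DEBUGFS_DIR and a kmemleak file exists
# there, 1 if debugfs is not mounted, 2 if the kmemleak file is missing.
check_kmemleak_env() {
    dir=${1:-/sys/kernel/debug}
    mounts=${2:-/proc/mounts}

    # /proc/mounts fields: device, mount point, filesystem type, options...
    if ! awk -v d="$dir" '$2 == d && $3 == "debugfs" { found = 1 }
                          END { exit !found }' "$mounts"; then
        echo "debugfs is not mounted at $dir (try: mount -t debugfs none $dir)"
        return 1
    fi

    if [ ! -e "$dir/kmemleak" ]; then
        echo "no $dir/kmemleak: was the kernel built with CONFIG_DEBUG_KMEMLEAK?"
        return 2
    fi

    echo "ok: $dir/kmemleak is present"
}
```

Run it as root with no arguments on a real system; whether kmemleak is actually switched on is a separate question that only the kernel log or a test write can answer.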
Assuming you've got both of the above sorted out, the remaining steps are the standard operating procedure:
Step 1: Make It Dirty
Run your potentially leaky code, or execute your test cases. The goal here is to "create the crime scene"—make the memory leak happen.
Step 2: Scan (This is the Core Step)
Now, manually trigger a memory scan. Execute as root:
echo scan > /sys/kernel/debug/kmemleak
This command triggers a scan right away, on top of the periodic scans performed by a kernel thread named kmemleak. It doesn't literally "look" with eyes; instead, it traverses memory and checks pointer reference relationships.
This process can be a bit slow. On my virtual machine, it usually takes a few seconds. If your machine has a lot of memory or is under heavy load, you might need to wait a moment.
If it finds suspicious objects, a message like this will pop up in the kernel log:
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
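If you want a script to react to that message, the count can be pulled out of the log text. This is a small sketch; `new_leak_count` is a hypothetical helper name, and it assumes the message wording shown above.

```shell
# new_leak_count: read kernel log text on stdin and print the number from the
# most recent "new suspected memory leaks" line, or 0 if there is none.
new_leak_count() {
    n=$(sed -n 's/.*kmemleak: \([0-9][0-9]*\) new suspected memory leaks.*/\1/p' | tail -n 1)
    echo "${n:-0}"
}
```

Typical use would be `dmesg | new_leak_count` right after the scan has finished.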
Step 3: Read the Report
Just as the log suggests, go read that pseudo-file:
cat /sys/kernel/debug/kmemleak
Step 4: Clean Up the Battlefield (Optional)
If you want to run another round of testing without interference from the previous report, clear it:
echo clear > /sys/kernel/debug/kmemleak
6.2 When You Can't Write to kmemleak — A Real-World Troubleshooting Guide
Sounds simple, right?
But in reality, you'll hit a wall at Step 2. When you confidently type echo scan, the terminal might coldly reply:
# echo scan > /sys/kernel/debug/kmemleak
bash: echo: write error: Operation not permitted
Your first reaction might be: "Wait, aren't I running this as root?"
Don't panic, this isn't a permission issue. Go look for clues in the kernel log (using dmesg or journalctl -k), and you'll very likely see this glaring red line:
kmemleak: Kernel memory leak detector disabled
This is weird—you clearly enabled CONFIG_DEBUG_KMEMLEAK in the config, so why does it still say it's disabled?
Here's a very interesting debugging trick we mentioned in earlier chapters. We can reboot with the debug and initcall_debug kernel parameters to see exactly what happened during boot.
Add these two options to your kernel boot parameters:
debug initcall_debug
After rebooting into the system, check the boot log and specifically search for messages related to kmemleak:
[ ... ] kmemleak: kmemleak_late_init() failed (-12)
See that? The return value is -12.
Your intuition for kernel error codes tells you: -12 corresponds to ENOMEM (Out of Memory).
Hold on. This is counterintuitive.
kmemleak is a memory leak detection tool, and it failed to initialize because of insufficient memory? It's like a plumber dying of thirst because the pipes have no water.
Actually, kmemleak needs a small memory pool early in boot to record the allocations it tracks. The size of this pool is determined by the kernel config option CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE, which defaults to only 16,000 entries (it counts tracked objects, not bytes). In certain situations, this tiny budget simply isn't enough.
So, you dutifully pull out the scripts/config tool and double that value:
$ scripts/config --set-val CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE 32000
Recompile the kernel and reboot.
The result...
$ dmesg | grep "kmemleak"
kmemleak: Kernel memory leak detector disabled
Still the same old story.
If you bury your head in the logs at this point and spend ages studying the memory pool size, you've fallen into the trap (a classic red herring). The problem isn't here at all.
The real cause is incredibly simple—so simple you might even feel a bit "silly."
You need to do one more thing: explicitly tell it "please enable" on the kernel boot command line.
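Even if you compiled kmemleak in, it can be turned off at boot by default because of the performance overhead; this is what happens when the kernel is built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF. In that case you must add this to the boot parameters: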
kmemleak=on
This is the real key.
Reboot with this parameter and check the log again:
$ dmesg | grep "kmemleak"
[ 6.743927] kmemleak: Kernel memory leak detector initialized (mem pool available: 14090)
[ 6.743956] kmemleak: Automatic memory scanning thread started
There we go. That memory pool size warning even disappeared (because now it initialized successfully).
⚠️ Pitfall Warning: This is a classic trap.
Key point: Configuring CONFIG_DEBUG_KMEMLEAK just builds the car; to drive it away, you still need to turn the key, and that's kmemleak=on.
Side effect: If you forget this parameter, you'll drive yourself crazy staring at Operation not permitted, only to find out you're just missing a boot parameter.
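Before going through another reboot cycle, it can help to look at both halves of the puzzle at once: the boot command line and the build configuration. The following is a rough sketch rather than an official tool. The function name `kmemleak_boot_state` is made up, and the default config path `/boot/config-$(uname -r)` is a common distro convention that may not exist on your system; pass your own `.config` as the second argument if so.

```shell
# kmemleak_boot_state [CMDLINE_FILE] [CONFIG_FILE]
# Prints a one-line guess at whether kmemleak is enabled at boot.
kmemleak_boot_state() {
    cmdline=${1:-/proc/cmdline}
    config=${2:-/boot/config-$(uname -r)}   # assumed location, distro-specific

    # An explicit boot parameter takes priority over the build-time default.
    if tr ' ' '\n' < "$cmdline" | grep -qx 'kmemleak=on'; then
        echo "on: requested with kmemleak=on"
    elif tr ' ' '\n' < "$cmdline" | grep -qx 'kmemleak=off'; then
        echo "off: disabled with kmemleak=off"
    elif ! grep -qx 'CONFIG_DEBUG_KMEMLEAK=y' "$config" 2>/dev/null; then
        echo "unknown: CONFIG_DEBUG_KMEMLEAK=y not found in $config"
    elif grep -qx 'CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y' "$config"; then
        echo "off: built with DEFAULT_OFF, boot with kmemleak=on"
    else
        echo "on: built in and enabled by default"
    fi
}
```

The kernel log remains the final authority; this only tells you which boot parameter or config option to look at first.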
By the way, the kmemleak kernel thread that performs the automatic background scans has its priority intentionally lowered (nice value of 10), meaning it mostly comes out to work when the system is relatively idle and won't steal resources from your normal workloads.
6.3 Hands-On Practice — Catching Leaks
Alright, the tool is fixed. Now it's time to find a few real "culprits" to practice on.
We've prepared three test cases. Don't blink—we're going bug hunting.
Test Case 3.1: The Textbook Leak
Let's look at the first piece of code. This code is blatantly bad:
// ch5/kmembugs_test/kmembugs_test.c
void leak_simple1(void)
{
    volatile char *p = NULL;

    pr_info("testcase 3.1: simple memory leak testcase 1\n");
    p = kzalloc(1520, GFP_KERNEL);
    if (unlikely(!p))
        return;
    pr_info("kzalloc(1520) = 0x%px\n", p);
    if (0) // test: ensure it isn't freed
        kfree((char *)p);
#ifndef CONFIG_MODULES
    pr_info("kmem_cache_alloc(task_struct) = 0x%px\n",
        kmem_cache_alloc(task_struct, GFP_KERNEL));
#endif
    pr_info("vmalloc(5*1024) = 0x%px\n", vmalloc(5*1024));
}
There are up to three leaks in this code:
- kzalloc allocates 1520 bytes.
- kmem_cache_alloc allocates a task_struct (only if CONFIG_MODULES is not defined).
- vmalloc allocates 5KB.
The most ironic part is that if (0)—it's like blatantly writing: "I know I should free this, but I just don't want to."
Let's run it:
cd <booksrc>/ch5/kmembugs_test
sudo ./run_tests
...
(Type in the testcase number to run): 3.1
Running testcase "3.1" via test module now...
[ ... ] kzalloc(1520) = 0xffff888003f17000
[ ... ] vmalloc(5*1024) = 0xffffc9000005c000
The code has finished running, and the memory has leaked. Now, let's bring in kmemleak:
sudo sh -c "echo scan > /sys/kernel/debug/kmemleak"
Wait for it to finish (that few-second pause feels a bit surreal), and check the report:
sudo cat /sys/kernel/debug/kmemleak
You'll see a bunch of output. Don't panic, let's just look at the first leak report:
unreferenced object 0xffff8880127f8000 (size 2048):
comm "run_tests", pid 5498, jiffies 4296684850 (age 84.737s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
backtrace:
[<00000000c0b84cb6>] slab_post_alloc_hook+0x78/0x5b0
[<00000000f76c1d8d>] kmem_cache_alloc_trace+0x16b/0x370
[<000000009f614545>] leak_simple1+0xc0/0x19b [test_kmembugs]
[<00000000747f9f09>] dbgfs_run_testcase+0x1e6/0x51a [test_kmembugs]
[...]
Let's decode this log like a detective:
- Address and size: 0xffff8880127f8000, and this block of memory is 2048 bytes. Wait, we clearly requested 1520 bytes in our code, so why is it 2048? This is the slab allocator's "rounding up" logic. The kernel picks the smallest kmalloc cache that can hold the request, which is kmalloc-2k here, so it gives us 2KB.
- Context: comm "run_tests", pid 5498. This tells you who did it. The age field tells us this leak has existed for over 84 seconds.
- Memory contents: The hex dump section is all 0x00. This is because we used kzalloc (zeroed alloc), so the contents were zeroed out. If we had used regular kmalloc, there might be some residual data from before here, and that's when things get truly exciting (or terrifying).
- Backtrace: This is the hardest evidence. Reading from bottom to top:
  - __x64_sys_write: User space initiated a system call (our echo command).
  - dbgfs_run_testcase: Entered our debugfs handler function.
  - leak_simple1: Entered that intentionally buggy test function.
  - kmem_cache_alloc_trace: Finally executed the slab allocation.
Smoking gun. kmemleak didn't just catch it; it dragged out its entire ancestry.
As for the second leak (the vmalloc one), it's also in the report with the exact same logic, and the stack will show the call path through vmalloc.
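The 1520-to-2048 rounding is easy to reproduce on paper. The sketch below is a simplified model, not the kernel's actual lookup: it assumes plain power-of-two size classes starting at 8 bytes and deliberately ignores the extra 96- and 192-byte caches that many kernels also provide, as well as any upper limit. `kmalloc_bucket` is a hypothetical name.

```shell
# kmalloc_bucket SIZE: print the smallest power-of-two size class >= SIZE.
# Simplification: the 96- and 192-byte caches are not modeled.
kmalloc_bucket() {
    want=$1
    bucket=8            # assumed smallest size class
    while [ "$bucket" -lt "$want" ]; do
        bucket=$((bucket * 2))
    done
    echo "$bucket"
}
```

`kmalloc_bucket 1520` prints 2048, which matches the size kmemleak reported for our kzalloc(1520) call.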
Test Case 3.2: Blaming the Caller
Sometimes leaks aren't so obvious. For example, this pattern:
// caller
else if (!strncmp(udata, "3.2", 4)) {
    res2 = (char *)leak_simple2();
    pr_info(" res2 = \"%s\"\n", res2 == NULL ? "<whoops, it's NULL>" : (char *)res2);
    if (0) /* test: ensure it isn't freed by us, the caller */
        kfree((char *)res2);
}
The called leak_simple2 dutifully allocates memory and returns the pointer:
char *leak_simple2(void)
{
    char *p = kmalloc(8, GFP_KERNEL);

    if (!p)
        return NULL;
    strcpy(p, "leaky!!");
    return p;
}
This is a classic gray area in C: the function documentation says "caller is responsible for freeing," but what if the caller forgets? The compiler won't remind you, and the runtime won't crash on the spot.
How does kmemleak see this?
Run test case 3.2, scan, and check the report:
unreferenced object 0xffff8880074b5d20 (size 8):
comm "run_tests", pid 5779, jiffies 4298012622 (age 181.044s)
hex dump (first 8 bytes):
6c 65 61 6b 79 21 21 00 leaky!!.
backtrace:
[<00000000c0b84cb6>] slab_post_alloc_hook+0x78/0x5b0
[<00000000f76c1d8d>] kmem_cache_alloc_trace+0x16b/0x370
[<000000009f614545>] leak_simple2+0xc0/0x19b [test_kmembugs]
[<00000000747f9f09>] dbgfs_run_testcase+0x1e6/0x51a [test_kmembugs]
See that?
The hex dump even prints out that string leaky!!.
kmemleak doesn't care whether it's "designed to be freed by the caller" or "forgotten to be freed." If no pointer points to it, it's an orphan, and it's a leak.
This is also why the kernel community later championed managed resource APIs like devm_kzalloc(), which tie an allocation's lifetime to a device so that it is freed automatically, shifting responsibility from forgetful humans to the system itself.
Test Case 3.3: Ghosts in Interrupt Context
The last scenario is trickier.
The previous examples all ran in process context. What if the leak happens in interrupt context? After all, interrupt handlers can't sleep and can't just do whatever they want.
To simulate this scenario, we used a mechanism called irq_work to force the code to run in hardirq context.
void leak_simple3(void)
{
    pr_info("testcase 3.3: simple memory leak testcase 3\n");
    irq_work_queue(&irqwork);
}

/* This function runs in (hardirq) interrupt context */
void irq_work_leaky(struct irq_work *irqwk)
{
    // ...
    pr_debug("kzalloc(129) = 0x%px\n",
        kzalloc(129, GFP_ATOMIC)); // Must use GFP_ATOMIC here!
}
Run it, scan, and check the results:
unreferenced object 0xffff88800b614800 (size 256):
comm "hardirq", pid 0, jiffies 4298048922 (age 12.020s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
backtrace:
[...]
[<00000000d54114cc>] irq_work_leaky+0x2a/0x40 [test_kmembugs]
[<00000000c4d72c5e>] irq_work_run_list+0x54/0xf0
[<00000000c4d73044>] irq_work_tick+0x64/0x80
Notice this line: comm "hardirq", and pid 0.
This is where kmemleak gets clever. It didn't just find the leak; it accurately identified the context in which the leak occurred. This is an absolute lifesaver when you're debugging memory leaks inside interrupt handlers.
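When a report contains many records, it helps to boil each one down to a single line so the context column stands out. The following is a minimal sketch that assumes the report layout shown in this section; `summarize_kmemleak_report` is a hypothetical helper, and it will misparse task names that contain spaces.

```shell
# summarize_kmemleak_report: read a kmemleak report on stdin and print one
# line per record: ADDRESS SIZE COMM PID
summarize_kmemleak_report() {
    awk '
    /^unreferenced object/ {
        addr = $3
        size = $5
        sub(/[)]:$/, "", size)     # turn 2048): into 2048
    }
    $1 == "comm" {
        comm = $2
        pid = $4
        gsub(/[",]/, "", comm)     # strip the quotes and the trailing comma
        sub(/,$/, "", pid)
        print addr, size, comm, pid
    }'
}
```

For the record above, `cat /sys/kernel/debug/kmemleak | summarize_kmemleak_report` would print `0xffff88800b614800 256 hardirq 0`, so a quick `grep hardirq` is enough to isolate the leaks that came from interrupt context.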
6.4 The Kernel's Built-In Test Module
Actually, you don't need to write this buggy code yourself. The kernel source code has it ready for you.
If you enabled CONFIG_DEBUG_KMEMLEAK_TEST when configuring the kernel (recent kernels name this option CONFIG_SAMPLE_KMEMLEAK), the build system will generate a module named kmemleak-test.ko (on recent kernels it is built under samples/kmemleak/).
Just plug it in:
sudo modprobe kmemleak-test
Then run through scan. You'll be startled—it reports 13 memory leaks in one breath. This is a great validation environment where you can see how kmemleak handles so many different types of leaks all at once.
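To check the number for yourself instead of counting by eye, you can count the record headers in the report. This is a tiny sketch; `count_leak_records` is a hypothetical name, and the optional argument exists only so the function can be tried on a saved copy of a report.

```shell
# count_leak_records [REPORT_FILE]: print how many leak records a report holds.
count_leak_records() {
    report=${1:-/sys/kernel/debug/kmemleak}
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -c '^unreferenced object' "$report" || true
}
```

The exact count you get depends on the kernel version and on what else has leaked since boot, so treat it as a sanity check rather than a fixed target.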
6.5 The Conductor's Baton on the Console
Finally, let's summarize that magical /sys/kernel/debug/kmemleak file. We can not only read it but also write to it to control it:
| Write Content | Effect |
|---|---|
| scan | Immediately trigger a memory scan. |
| clear | Clear the current list of leak suspects. |
| dump=<addr> | This is an advanced move. You can specify an address and have kmemleak dump what it has recorded about the object found there. |
| scan=off | Stop the automatic scan thread. (Plain off is different: it disables kmemleak entirely and cannot be undone without a reboot.) |
| scan=on | Start the automatic scan thread again. |
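Because a mistyped command is easy to miss, a thin wrapper that validates the string before writing it can be handy. This is a sketch only: `kmemleak_ctl` is a made-up helper, the accepted strings follow the kernel's kmemleak documentation (scan, clear, off, scan=on, scan=off, scan=<secs>, stack=on, stack=off, dump=<addr>), and the optional second argument lets you point it at an ordinary file for a dry run.

```shell
# kmemleak_ctl COMMAND [CONTROL_FILE]
# Writes COMMAND to the kmemleak control file after a basic sanity check.
# Returns 2 without writing anything if COMMAND is not recognized.
kmemleak_ctl() {
    cmd=$1
    ctl=${2:-/sys/kernel/debug/kmemleak}

    case "$cmd" in
        scan|clear|off|scan=on|scan=off|stack=on|stack=off) ;;
        scan=|scan=*[!0-9]*)
            echo "kmemleak_ctl: bad scan interval: $cmd" >&2
            return 2 ;;
        scan=*) ;;          # scan=<secs>: set the automatic scan period
        dump=?*) ;;         # dump=<addr>: address is not validated here
        *)
            echo "kmemleak_ctl: unknown command: $cmd" >&2
            return 2 ;;
    esac

    printf '%s\n' "$cmd" > "$ctl"
}
```

Remember that `kmemleak_ctl off` is a one-way street until the next reboot, which is why the wrapper accepts it but you will rarely want to use it.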
A recommended debugging workflow:
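# 1. Clear out old reports
echo clear > /sys/kernel/debug/kmemleak
# 2. Run your test module or workload
...
# 3. Wait a moment (let the memory allocations settle)
# 4. Trigger a scan
echo scan > /sys/kernel/debug/kmemleak
# 5. Check the log for "new suspected memory leaks"
dmesg | tail
# 6. If there are any, read the detailed report
cat /sys/kernel/debug/kmemleak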
This workflow might be simple, but it can save you from those nights of staring blindly at slabinfo and guessing.
kmemleak isn't perfect (it has false positives and performance overhead), but in that dark kernel memory space, it's absolutely the brightest light you have in your hands.