11.5 Debugging Kernel Modules: When the Symbol Table Hides in Memory

In the previous section, we essentially "hacked" into a running kernel and watched it wake up in start_kernel. But honestly, that felt more like watching a stage play—we were just the audience, watching a plot arranged by the director.

The real nightmare mode is when your own code is going haywire on stage, and all you have is an out-of-focus flashlight.

You load a kernel module with insmod. If you're lucky, the screen goes black and the system reboots. If you're unlucky, nothing happens, as if the module never existed. You sprinkle printk statements everywhere, but still can't figure out exactly which line is trampling memory.

We need KGDB.

Debugging a kernel module with KGDB is very similar to debugging the kernel itself, but there is one key difference: GDB has no idea which corner of memory your module lives in.

Kernel code is fixed—it's there at boot, indifferent to everything. But modules are dynamically loaded and unloaded. They are ELF files loaded into memory, and the kernel casually throws them into whatever free virtual address it finds. No matter how clairvoyant GDB is, it can't predict the kernel's current mood (or the memory allocator's decisions).

So, we must be the "informant"—telling GDB the exact memory location of the module.


Telling GDB Where Your Module Is

The kernel is actually quite thoughtful. Although it doesn't tell GDB directly, it lays out all the information in sysfs.

Take a look at /sys/module/<module-name>/sections/. Each pseudo-file starting with a dot (.) records the load address of that module's ELF section within kernel virtual memory.

Suppose we want to debug the usbhid module (first check with lsmod to see if it's loaded). We can list its section information like this:

ls -a /sys/module/usbhid/sections/
# Output looks something like:
./ ../ .rodata .symtab .bss .init.text .text
.data .exit.text [...]

This isn't a normal directory. If you read the contents of these "files" (requires root privileges), you get the addresses.

Let's read a few key sections:

cd /sys/module/usbhid/sections
cat .text .rodata .data .bss

On my x86_64 Ubuntu 20.04, the output looks roughly like this (I've slightly adjusted the formatting for readability):

0xffffffffc033b000 # start address of .text
0xffffffffc0348060 # start address of .rodata
0xffffffffc034e000 # start address of .data
0xffffffffc0354f00 # start address of .bss

With these numbers in hand, we can feed GDB what it needs.

We need to use the add-symbol-file command. The syntax is a bit long, but the logic is simple: first give it the text section address, then tell it where each data section is, one by one.

(gdb) add-symbol-file </path/to/>usbhid.ko 0xffffffffc033b000 \
-s .rodata 0xffffffffc0348060 \
-s .data 0xffffffffc034e000 \
[...]

Once this command executes successfully, GDB suddenly gains X-ray vision—it can understand function names and variable names inside the module, and even lets you set breakpoints in the module code.

But typing this command manually is agonizing, error-prone, and the address changes every time the module is reloaded.

Fortunately, we can "cheat." I adapted a classic script from LDD3 (Linux Device Drivers, 3rd Edition); you'll find it at ch11/gdbline.sh. Its working principle is simple: it iterates through all section files under /sys/module/<module>/sections/, automatically assembles that super-long add-symbol-file command, and prints it out directly.

All you need to do is copy, paste, and hit Enter. Long live automation.
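
The idea behind such a helper fits in a few lines of shell. What follows is not the book's gdbline.sh, only a minimal sketch of the same principle. The function name gen_add_symbol_file and its two-argument interface are assumptions made for illustration, and it only handles sections whose names start with a dot.

```shell
#!/bin/sh
# Sketch only: NOT the book's gdbline.sh. The function name and its
# arguments are illustrative assumptions.
#   $1 = path to the .ko as seen from the machine running GDB
#   $2 = sections directory, normally /sys/module/<module>/sections
gen_add_symbol_file() {
    ko_path=$1
    sect_dir=$2

    # .text is mandatory: its address is the positional argument.
    if [ ! -r "$sect_dir/.text" ]; then
        echo "cannot read $sect_dir/.text (module not loaded, or not root?)" >&2
        return 1
    fi
    cmd="add-symbol-file $ko_path $(cat "$sect_dir/.text")"

    # Every other dot-prefixed section becomes a "-s <name> <addr>" pair.
    for f in "$sect_dir"/.[a-z]*; do
        [ -r "$f" ] || continue
        name=${f##*/}
        [ "$name" = ".text" ] && continue
        cmd="$cmd -s $name $(cat "$f")"
    done
    printf '%s\n' "$cmd"
}
```

On the target, and as root, this would be called as gen_add_symbol_file </path/to/>usbhid.ko /sys/module/usbhid/sections. Taking the sections directory as an argument, rather than building it from the module name, keeps the function easy to try out against a scratch directory.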


Hands-on Practice: Catching a Misbehaving Module

Talk is cheap. We need to use KGDB to debug a buggy kernel module for real.

We've prepared a small experimental module, ch11/kgdb_try. It's simple, but lethal:

  1. Initialization: Schedules a delayed work item on the kernel's default workqueue.
  2. Execution: After 2.5 seconds, the work function do_the_work is called.
  3. Mischief: Inside this function, a deliberately off-by-one loop writes 11 bytes into buf, a local array that is only 10 bytes long.

// ch11/kgdb_try/kgdb_try.c
static int __init kgdb_try_init(void)
{
    pr_info("Generating Oops via kernel bug in a delayed workqueue function\n");
    INIT_DELAYED_WORK(&my_work, do_the_work);
    /* Run do_the_work() roughly 2.5 seconds from now */
    schedule_delayed_work(&my_work, msecs_to_jiffies(2500));
    return 0;
}

static void do_the_work(struct work_struct *work)
{
    u8 buf[10];    /* valid indices: 0..9 */
    int i;

    pr_info("In our workq function\n");
    /* The bug: loop goes one too far! i == 10 writes past the end of buf */
    for (i = 0; i <= 10; i++)
        buf[i] = (u8)i;
    print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, buf, 10);
    // [...]
}

Why the 2.5-second delay? To save your life. Without this delay, the module would crash the instant it loads, leaving you no chance to type GDB commands. These 2.5 seconds are your "debugging window."

This bug looks insignificant, but on my test machine it caused a full Kernel Panic: not just one dead process, but a frozen system. That's the power of a stack buffer overrun in kernel mode: there is no exception handler to catch it and no process boundary to contain the damage.

Let's switch things up in this section. We're dropping ARM and playing this game on x86_64. There are quite a few steps, so keep up.


Step 1: Preparation — Gathering Ammunition

To fight this battle, we need three things:

  1. An x86_64 kernel with KGDB enabled.
  2. A bootable Rootfs to hold our modules and scripts.
  3. The kgdb_try.ko module compiled against this kernel.

Step 1.1: Compile the Target Kernel

Here's a brief overview of the process:

  1. Download the source: For this demo, I'm using 5.10.109 LTS. Let's assume you extracted it to ~/linux-5.10.109.
  2. Configure the kernel: make menuconfig. You must select the KGDB-related options here (refer to the previous section). To save time, I've saved a config file as ch11/kconfig_x86-64_target.
  3. Handle a minor pitfall: If you're using a newer 5.10+ kernel, compilation might fail complaining about missing debian/canonical-revoked-certs.pem. This is a certificate check issue; just turn it off:
    scripts/config --disable SYSTEM_REVOCATION_KEYS
    scripts/config --disable SYSTEM_TRUSTED_KEYS
  4. Compile: make -j[n] all. On success, you'll get two important artifacts:
    • arch/x86/boot/bzImage: The compressed kernel image.
    • vmlinux: A massive uncompressed image, fully loaded with debug symbols. This is the main course GDB will feast on.
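
Before kicking off a long build, it can be worth confirming that the debug options actually made it into .config. The sketch below checks for CONFIG_KGDB, CONFIG_KGDB_SERIAL_CONSOLE and CONFIG_DEBUG_INFO. That particular list is an assumption (the authoritative set is the one given in the previous section), and check_kgdb_config is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report whether a few KGDB-related options are built in (=y).
# The option list is an assumption; adjust it to match the previous section.
#   $1 = path to the kernel config file (default: .config)
check_kgdb_config() {
    cfg=${1:-.config}
    missing=0
    for opt in CONFIG_KGDB CONFIG_KGDB_SERIAL_CONSOLE CONFIG_DEBUG_INFO; do
        # Anchor both ends so CONFIG_KGDB does not match CONFIG_KGDB_SERIAL_CONSOLE.
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Run from the top of the kernel tree, check_kgdb_config with no argument inspects ./.config and returns non-zero if anything is missing, so it can gate the build: check_kgdb_config && make -j[n] all.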

Step 1.2: Prepare the Root Filesystem

Building a Rootfs from scratch is too painful. I've prepared a Debian Stretch-based image for you. You can find rootfs_deb.img.7z in the resource directory for this book. Extract it:

7z x rootfs_deb.img.7z
# Extraction yields images/rootfs_deb.img

This is a 512MB disk image. Everything we need is already inside: modules, scripts, and tools.

Step 1.3: Compile the Test Module

Attention! This module cannot be built against your host's running kernel. It must be built against the target kernel's source tree (the 5.10.109 tree from Step 1.1); otherwise the target kernel will most likely refuse to load it.

Open ch11/kgdb_try/Makefile, and you'll see a specific comment and configuration I added:

#@@@@@@@@@@@@ NOTE! SPECIAL CASE @@@@@@@@@@@@@@@@@
# We specify the build dir as the linux-5.10.109 kernel src tree
KDIR ?= ~/linux-5.10.109

The Makefile will automatically look for the kernel build headers in that directory. If your path is different, remember to change it here.

Then, under ch11/kgdb_try, run make, and you'll get kgdb_try.ko. (Note: If you modify the code and recompile, remember to copy the new .ko file back into that rootfs image. Mount the image -> copy -> unmount. It's a combo move.)
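
The mount, copy, unmount combo can be wrapped in a small helper. This is a sketch under a few assumptions: the image is a plain filesystem image that mount -o loop can handle, the destination directory inside the image is myprj (guessed from the /myprj path used later in this section), and /mnt/rootfs_tmp is an arbitrary mount point. The function names are invented for illustration. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: copy a rebuilt .ko into the rootfs image.
#   $1 = image file, $2 = .ko file,
#   $3 = directory inside the image (assumed layout),
#   $4 = mount point (hypothetical default)
update_module_in_image() {
    img=$1
    ko=$2
    dest=$3
    mnt=${4:-/mnt/rootfs_tmp}

    run mkdir -p "$mnt" || return 1
    run mount -o loop "$img" "$mnt" || return 1
    status=0
    run cp "$ko" "$mnt/$dest/" || status=1
    # Always try to unmount, even if the copy failed.
    run umount "$mnt" || status=1
    return "$status"
}
```

A real run needs root, for example update_module_in_image images/rootfs_deb.img kgdb_try.ko myprj. Do this only while QEMU is not running: mounting an image that a live guest is also using is a good way to corrupt it.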


Step 2: Boot the Target and Stand By

With everything ready, fire up the virtual machine. We'll stick with QEMU.

The book provides a ready-made script run_target.sh, but to make the parameters clear, I'll write it out explicitly:

cd <book_src>/ch11
qemu-system-x86_64 \
-kernel ~/linux-5.10.109/arch/x86/boot/bzImage \
-append "console=ttyS0 root=/dev/sda earlyprintk=serial rootfstype=ext4 rootwait nokaslr" \
-hda images/rootfs_deb.img \
-nographic -m 1G -smp 2 \
-S -s

Here are two old friends:

  • -S: Freeze. QEMU does not start the CPU at startup; it sits halted until GDB connects and tells it to continue.
  • -s: Shorthand. Short for -gdb tcp::1234, it opens a listener on port 1234.

Additionally, if your host machine supports KVM (hardware virtualization), adding -enable-kvm will make it blazingly fast, but watch out for nested virtualization issues.
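
One simple way to decide whether to pass -enable-kvm is to check that the KVM device node is accessible. The helper below is a sketch: kvm_flag is a made-up name, and the optional path argument exists only so the function can be tried without real KVM hardware. An accessible /dev/kvm is a reasonable hint, not a guarantee that acceleration will work (nested setups in particular can still misbehave).

```shell
#!/bin/sh
# Print "-enable-kvm" if the KVM device node is readable and writable,
# otherwise print nothing.
#   $1 = device path (default: /dev/kvm)
kvm_flag() {
    dev=${1:-/dev/kvm}
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "-enable-kvm"
    fi
}
```

Used unquoted on the command line, as in qemu-system-x86_64 $(kvm_flag) -kernel [...], an empty result simply disappears and QEMU runs without acceleration.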

At this point, the QEMU window should be completely silent, but the CPU is standing by like an obedient soldier.


Step 3: GDB Enters, Establishing the Connection

Go back to your host machine (or your Ubuntu VM), enter the kernel source tree, and launch GDB.

cd ~/linux-5.10.109
gdb ./vmlinux

When GDB starts, it automatically reads your ~/.gdbinit file. In the config I provided with the book, I preloaded a macro called connect_qemu:

define connect_qemu
    target remote :1234
    hbreak start_kernel
    hbreak panic
    #hbreak do_init_module
end

Type connect_qemu in GDB. Bang! Connected. The macro attaches to the halted CPU and plants hardware breakpoints at start_kernel and panic; type c once, and GDB stops at start_kernel.


Step 4: Against the Clock — Loading the Module and Injecting Symbols

The main event begins.

Right now, GDB is stopped at start_kernel. Type c (continue) to let it run. The kernel will boot until it prints the login prompt. Just hit Enter to drop into a bare-bones Shell.

This is the crucial step: we need to run that explosive script.

Inside the target system's Shell, there's a script called /myprj/doit. It does three things:

  1. Tells the kernel "Panic on Oops, don't try to recover" (panic_on_oops = 1).
  2. Loads our mischievous module kgdb_try.ko.
  3. Automatically generates that super-long add-symbol-file command.

Now, run it inside the target system:

# Inside the QEMU target console
cd /myprj
./doit

The moment you hit Enter, the countdown begins. You have 2.5 seconds.

The script will spit out a massive block of GDB commands, intimidatingly long, sandwiched between ---snip--- markers.

Move fast!

  1. Copy that block of GDB commands from the target window to your clipboard.
  2. Instantly switch to the host machine's GDB window.
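  3. Press Ctrl+C. This forcefully interrupts the target kernel's execution, pulling it back from the brink of collapse and pausing it at some random location. It doesn't matter where it stops, as long as it's paused. Everything up to this point has to fit inside the 2.5-second window; once the target is paused, the work function cannot run, so the remaining steps can be done without rushing.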
  4. In GDB, cd to the module source directory (ch11/kgdb_try/) so GDB can find the source files.
  5. Paste that super-long command and hit Enter. GDB will ask if you really want to load it; type y.

Now, your GDB is fully armed. It knows where the kgdb_try module is, and it knows its symbol table.

The final step is setting the trap: a hardware breakpoint on the work function.

(gdb) hbreak do_the_work
Hardware assisted breakpoint 3 at 0xffffffffc004a000: file [...]/kgdb_try.c, line 43.

Step 5: Witnessing the Crash

Type c to let the target system continue running.

This time it's actually executing. The timer is ticking... the moment 2.5 seconds hit, the workqueue function triggers.

Bang!

GDB instantly catches the breakpoint, stopping at the entry of do_the_work.

Now you can debug it just like a normal C program: step through (s or n), inspect variables (p i), or check the stack (bt).

We know the bug is in the loop, but we don't want to mindlessly type n ten times. Let's try something advanced: a conditional breakpoint.

(gdb) b 49 if i==8

This command means: "When you reach line 49, only stop if the variable i equals 8." Type c to continue.

Perfect. It stopped. At this point, i is 8. Press s (step into) a few more times, and you'll see i become 9, then 10...

The moment buf[10] is written, you've corrupted the stack frame.

If you continue execution, the kernel's stack protection mechanism will catch it: when do_the_work returns, the compiler-inserted check finds the Stack Canary overwritten and triggers a spectacular Kernel Panic.

GDB will display:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa0010030

And on the target system's console, that famous panic stack trace will pop up.

See? That's the thrill of source-level debugging. You don't just know that it "blew up"—you watch with your own eyes as it blows itself up.


The Ultimate Question: How Do You Debug the init Function?

Earlier, we used a delayed workqueue (2.5 seconds) to leave ourselves a way out. But what if your module blows up the moment it loads, right inside the init function, leaving you not even 0.001 seconds?

By then, trying to set a breakpoint at do_the_work is already too late—you won't even have a chance to type the command.

You need an earlier entry point.

Remember that ~/.gdbinit file when GDB connected? There's a commented-out line inside:

#hbreak do_init_module

Uncomment it.

do_init_module is the kernel's internal manager responsible for calling a module's init function. Any module's loading must go through it. If you set a breakpoint here, the kernel will pause no matter which module is loading. At this point, you can check the mod parameter (a pointer to struct module), print its name (p *mod or x/s mod->name), and see if it's the module you want to debug. If it is, step forward (s): do_init_module calls the module's init function through do_one_initcall(mod->init), and that is where you'll land.

This is the ultimate technique for debugging those "die-on-sight" modules.
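
If you'd rather not edit ~/.gdbinit, the same breakpoint can live in a throwaway GDB command file. The sketch below uses a shell here-document to generate one; the function name write_init_gdb_cmds and the file name init_debug.gdb are assumptions made for illustration. The GDB commands themselves are the ones already used in this section, plus a commands block that shows the module's name and the address of its init function whenever the breakpoint hits.

```shell
#!/bin/sh
# Sketch: write a GDB command file that stops at do_init_module and shows
# which module is about to be initialized.
#   $1 = output file (hypothetical default: init_debug.gdb)
write_init_gdb_cmds() {
    out=${1:-init_debug.gdb}
    # Quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$out" <<'EOF'
target remote :1234
hbreak do_init_module
commands
  silent
  x/s mod->name
  print mod->init
end
EOF
}
```

After write_init_gdb_cmds, start the session with gdb -x init_debug.gdb ./vmlinux and type c. Each stop prints the name of the module being loaded; mod->init is the address of its init function, so hbreak *<that address> is one way to stop there even before the module's symbols have been loaded.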


Exercises

Exercise 11.5 ⭐⭐ (Application)

If you are debugging a module and set a breakpoint with hbreak my_func without first loading the module's symbols (add-symbol-file), what happens? Try to reproduce this phenomenon in your lab environment and explain GDB's feedback.

Hint: GDB might report an error saying it can't find the symbol, or (worse) the breakpoint might be set at the wrong address. Check the info breakpoints output.