11.4 Hardcore Kernel Debugging with KGDB
In the previous section, we managed to keep our virtual sheep, "ARM VExpress," alive and kicking. It boots happily in QEMU, spits out a pile of boot logs, and finally obediently gives you a shell prompt.
But that's not enough.
As a kernel developer, just watching it run is far from sufficient—you need to be able to pin it down, cut it open, and see exactly what's going on inside its head. In this section, we're going to turn this somewhat elusive virtual sheep into a transparent glass specimen, where every function call and every pointer jump is under our control.
This brings us to the real main course of this chapter: KGDB.
Waking the Beast: Making the Kernel Stop at Boot
Right now, the kernel is like a train that accelerates wildly the moment it leaves the station. Without taking some measures, it won't voluntarily stop and wait for you.
Our current goal is to make the kernel pause at the very earliest stage of boot, sitting there like a well-behaved student waiting for the teacher (GDB) to ask questions.
If you've flipped through the kernel documentation, you know that to achieve this on real hardware, it usually takes a two-step approach:
- Enable KGDB support when compiling the kernel and specify an I/O driver (like kgdboc), telling the kernel: "Talk to the outside world via serial port."
- Add kgdbwait to the boot parameters, telling the kernel: "Wait there as soon as you boot, don't move."
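But in QEMU's virtual world, all of this can be much simpler, or rather, more "hacker-ish." QEMU has a god's-eye view of the guest: its built-in GDB stub exposes the emulated CPUs and memory directly over a TCP socket, so it needs neither a slow physical medium like a serial port nor any debugging support inside the kernel itself.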
Running the Target System: Adding Some Tricks to QEMU
Open the terminal where you ran QEMU earlier; we need to add some spice to that long command.
QEMU provides two extremely convenient parameters (please make sure to remember them, you'll use them frequently):
- -S: Freeze CPU at startup. It freezes the CPU and prevents it from running. It's equivalent to pressing the pause button.
- -s: Shorthand for -gdb tcp::1234. It opens a listener on port 1234, waiting for GDB to connect.
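To make "waiting for GDB to connect" a little more concrete: what travels over port 1234 is GDB's Remote Serial Protocol, in which each command is framed as $payload#checksum, with the checksum being the sum of the payload bytes modulo 256 written as two hex digits. Below is a minimal sketch of that framing only. The helper names rsp_checksum and rsp_frame are made up for illustration, and escaping of special characters, acknowledgements, and the actual socket I/O are all left out.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of the payload bytes modulo 256, as the remote protocol defines it."""
    return sum(payload) % 256


def rsp_frame(payload: str) -> bytes:
    """Wrap a command payload as $<payload>#<two hex digits of checksum>.

    Illustrative only: payloads containing '$', '#', '}' or '*' would need
    escaping, which this sketch does not do.
    """
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{rsp_checksum(data):02x}".encode("ascii")


if __name__ == "__main__":
    # "?" asks the stub why the target halted; "g" requests the general registers.
    for command in ("?", "g"):
        print(rsp_frame(command).decode("ascii"))
```

Running it prints $?#3f and $g#67. You never have to type these by hand, since GDB does it for you, but it helps to know that the "connection" is nothing more mysterious than short text packets like these.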
Now, modify your QEMU startup command. Compared with the previous section, we add -S -s to the QEMU options and pass the nokaslr parameter to the kernel. This matters because KASLR (Kernel Address Space Layout Randomization) randomizes where the kernel is placed in memory, so the addresses recorded in the symbol file no longer match the running kernel and debugging feels like shooting at a moving target. We strip away this layer of defense first; on a kernel built without KASLR the parameter is harmless.
$ qemu-system-arm -m 512 \
-M vexpress-a9 -smp 4,sockets=2 \
-kernel <...>/seals_staging_vexpress/images/zImage \
-drive file=<...>/seals_staging_vexpress/images/rfs.img,if=sd,format=raw \
-append "console=ttyAMA0 rootfstype=ext4 root=/dev/mmcblk0 init=/sbin/init nokaslr" \
-nographic -no-reboot -audiodev id=none,driver=none \
-dtb <...>/seals_staging_vexpress/images/vexpress-v2p-ca9.dtb \
-S -s
Hit Enter.
You'll notice that this time there are no scrolling logs and no Freeing unused kernel memory; the screen is completely dead.
That's exactly right. The sheep has been frozen.
⚠️ Warning If you experience any weird behavior at this step on your machine, check whether there's already a QEMU process running in the background. A leftover instance hogs resources and, if it was started with -s, still holds port 1234, so the new one cannot open its GDB listener. Although pure software-emulated ARM (running on x86) rarely crashes, it's always better to be safe than sorry.
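One quick way to rule out a stale instance before launching QEMU is to check whether port 1234 can still be bound. The sketch below tries to bind the port itself rather than connect to it, so it does not disturb a debugging session that happens to be live. The function name port_is_free is hypothetical, and the check assumes the listener is reachable on the IPv4 loopback address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound, i.e. nobody else is holding it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something owns the port.
            return False
        return True


if __name__ == "__main__":
    GDB_PORT = 1234  # the port that QEMU's -s option listens on
    if port_is_free(GDB_PORT):
        print(f"port {GDB_PORT} is free; safe to start QEMU with -s")
    else:
        print(f"port {GDB_PORT} is taken; look for a leftover QEMU process")
```

If the port turns out to be taken, find and stop the old process before trying again.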
Connecting the Client: GDB Takes the Stage
Now, the stage is handed over to GDB on the host machine. There is an easily confused point here that I must emphasize:
We need to use the GDB from the cross-compilation toolchain, not the native gdb on your system!
Why? Because this is running ARM code. Using an x86 GDB to debug an ARM world is like a chicken talking to a duck.
Open a new terminal window (because the QEMU window is already occupied) and summon the debugger:
$ arm-none-linux-gnueabihf-gdb -q <...>/seals_staging_vexpress/linux-5.10.109/vmlinux
Reading symbols from <...>/seals_staging_vexpress/linux-5.10.109/vmlinux...
See that Reading symbols line? It matters a lot.
Note that what's being loaded here is vmlinux—that uncompressed, chubby kernel ELF file packed with debugging symbols, not the skinny zImage we used to boot. GDB is now ingesting all the function addresses and variable structures from the kernel into its brain.
Once the symbols are loaded, we have a complete map of this sheep. Now, we just need to connect the map to the actual terrain:
(gdb) target remote :1234
Remote debugging using :1234
0x60000000 in ?? ()
(gdb)
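What is that 0x60000000? That's where the Program Counter (PC) stopped: the start of RAM on this board, where the frozen CPU is waiting to execute its very first instruction. GDB shows ?? because the symbols in vmlinux are linked at the kernel's virtual addresses (the 0x80... range), while at this point the MMU is still off and the PC holds a physical address that matches none of them. Either way, it marks a successful connection.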
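The gap between the 0x60... address GDB reports here and the 0x80... addresses it prints once the kernel is running can be sketched as simple arithmetic. The two constants below are assumptions, not values read from this kernel's configuration: PHYS_OFFSET is taken from the PC value above, and PAGE_OFFSET is inferred from the 0x80... symbol addresses that appear later in this session. The function names are likewise made up for illustration.

```python
PAGE_OFFSET = 0x80000000  # assumed virtual start of the kernel's linear mapping
PHYS_OFFSET = 0x60000000  # assumed physical base of RAM on this board


def virt_to_phys(vaddr: int) -> int:
    """Translate a linear-map kernel virtual address to a physical address."""
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


def phys_to_virt(paddr: int) -> int:
    """Translate a physical RAM address to its linear-map virtual address."""
    return paddr - PHYS_OFFSET + PAGE_OFFSET


if __name__ == "__main__":
    register_netdev = 0x80754BC8  # breakpoint address shown later in this section
    print(hex(virt_to_phys(register_netdev)))
    print(hex(phys_to_virt(0x60000000)))
```

Under those assumptions the script prints 0x60754bc8 and 0x80000000. In other words, once the MMU is on, the same bytes of kernel code are reached through addresses that vmlinux knows about, and GDB starts showing function names instead of ??.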
The First Breakpoint: Pausing Like Magic
Now, you are God.
You can let this beast keep running, or you can make it stop at any time. Let's try the most classic example: setting a trap where the network driver initializes.
(gdb) b register_netdev
Breakpoint 2 at 0x80754bc8: file net/core/dev.c, line 10238.
Type c (continue) to let the kernel keep running.
The screen starts scrolling. But before long, it's like it hit an invisible wall—it stopped. GDB jumps out and tells you the breakpoint was triggered:
(gdb) c
Continuing.
Breakpoint 2, register_netdev (dev=0x818a0800) at net/core/dev.c:10238
10238 {
(gdb)
This is incredibly satisfying.
Type bt (backtrace), and you'll see a long call stack—from the low-level assembly startup code all the way up to the C language network subsystem, layered and clearly visible.
(gdb) bt
#0 register_netdev (dev=0x818a0800) at net/core/dev.c:10238
#1 0x8085b8a4 in smsc911x_probe (...) at drivers/net/ethernet/smsc/smsc911x.c:2466
#2 0x8047d1f8 in platform_drv_probe (...) at drivers/base/platform.c:645
...
(gdb)
This is the scene at the register_netdev function. You are now standing at the door of this function, scalpel in hand.
Practical Exercise: Seeing Through Data Structures
Since we've stopped, we might as well do something more exciting.
The parameter of register_netdev is a struct net_device * pointer. This structure is the core of the network subsystem, and it's absurdly large. Let's see what it looks like:
(gdb) p dev
$1 = (struct net_device *) 0x818a0800
Just an address. Not thrilling enough. Print out all the contents:
(gdb) p *dev
$2 = {name = "eth%d\000...", name_node = 0x0, ifalias = 0x0, mem_end = 0, ...}
If you see this blob of stuff crammed together in your terminal, you'll probably get a headache. GDB's default output format is indeed a bit anti-human.
But we can adjust it. Type this command, and your world will be much cleaner:
(gdb) set print pretty
(gdb) p *dev
$3 = {
name = "eth%d\000\000\000\000\000\000\000\000\000\000",
name_node = 0x0,
ifalias = 0x0,
mem_end = 0,
mem_start = 0,
base_addr = 0,
irq = 30,
state = 4,
...
}
(gdb)
Much better.
You can see this device's IRQ number is 30, and its name is still the template string eth%d (because registration isn't complete yet). This is the true face of the kernel at the moment of boot—no secrets whatsoever.
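The state field is worth a second look too, because it is a bitmask rather than a plain number. Here is a minimal sketch of decoding it, assuming the bit positions of enum netdev_state_t as I recall them from include/linux/netdevice.h in the 5.10 series; check them against your own tree before relying on this. The function name decode_link_state is made up for illustration.

```python
from typing import List

# Assumed bit positions (enum netdev_state_t, 5.10 series); verify in your tree.
LINK_STATE_BITS = {
    0: "__LINK_STATE_START",
    1: "__LINK_STATE_PRESENT",
    2: "__LINK_STATE_NOCARRIER",
    3: "__LINK_STATE_LINKWATCH_PENDING",
    4: "__LINK_STATE_DORMANT",
    5: "__LINK_STATE_TESTING",
}


def decode_link_state(state: int) -> List[str]:
    """Return the names of all known bits that are set in dev->state."""
    return [
        name
        for bit, name in sorted(LINK_STATE_BITS.items())
        if state & (1 << bit)
    ]


if __name__ == "__main__":
    print(decode_link_state(4))  # the value GDB printed for dev->state
```

With those bit positions, state = 4 decodes to ['__LINK_STATE_NOCARRIER'] alone: no carrier has been reported yet, and the "present" bit is not set either, which fits a device that is still on its way into registration.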
Setting Off Again: Let It Finish, Then Pull It Back
After inspecting the scene, we still need to let the system live. Continue by typing c.
The system will complete the rest of the boot process until you see that familiar / # prompt again.
Think the debugging is over? No.
As long as QEMU is still running and GDB is still connected, you can drag this sheep back at any time. Press Ctrl + C in the GDB window.
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
cpu_v7_do_idle () at arch/arm/mm/proc-v7.S:78
78 ret lr
(gdb)
Look, it stopped at cpu_v7_do_idle. This is where the ARM processor dozes off when it has nothing to do.
Take a peek at the stack with bt, and you'll see it's indeed hanging out in the idle process.
This is the power of source-level remote debugging. You aren't guessing why it's slow; you are watching it work.
Wrapping Up: Shutting Down Like a Gentleman
When you've had enough fun, don't brutally kill the QEMU process.
Although killing it directly works too, since we're simulating a proper Linux system now, let's try to play by the rules. Type poweroff in that serial terminal and watch it shut down gracefully:
/ # poweroff
[ ... ] System halted.
Alright, this "dissection" lesson was just a warm-up. We learned how to pin down the kernel, but the real challenge is often those kernel modules you write yourself that explode the moment they load.
In the next section, we'll tackle those mischievous modules.