11.2 Understanding How KGDB Works Conceptually
In the previous section, we set up the SEALS project, which gave us our "hardware" (albeit virtual) and a root filesystem. Now we need to bring the real weapon, the debugger, onto the kernel battlefield.
KGDB enables source-level kernel debugging. What does this mean? It means you no longer have to stare at a wall of assembly instructions or vague Oops reports. Instead, you can stop directly at a specific line of C code, inspect variables, and step through execution.
But you might immediately realize a problem, one that is also the core design challenge of KGDB.
The Pause and the Paradox: Who Debugs the Debugger?
When we use GDB to debug a regular user-space application, GDB runs on top of the OS and can pause that unfortunate process at any time. But what if what we want to debug is the kernel itself?
When the kernel hits a breakpoint and stops, the entire OS stops with it. The CPU is frozen in kernel mode at that exact moment. So, who runs GDB? Who handles network packets? Who responds to keyboard input?
This is a classic "performing surgery on yourself" problem: you cannot be the anesthetized patient on the table and the surgeon holding the scalpel at the same time.
To solve this paradox, GDB uses a client-server architecture, which requires the cooperation of two machines (or two isolated execution environments).
The Two Leads: Host and Target
KGDB's solution is to split GDB in half, placing each piece in a different world:
- Host: This is your everyday computer (or our x86_64 virtual machine). It runs the GDB client, a large piece of software that handles symbol resolution, source code display, and the TUI (text user interface). It is the "commander."
- Target: This is the machine we want to debug (or the QEMU-emulated ARM32 board). It runs the server side, which is KGDB: a lightweight GDB stub that is built into the kernel and resides permanently in kernel space. It is the "frontline scout."
Analogy (Part 1)
You can think of this process as the collaboration between a bomb disposal expert and a field assistant.
- The Host is the bomb disposal expert sitting in a safe truck, with blueprints, coffee, and a large screen (the GDB client).
- The Target is the field assistant wearing a bomb suit, standing right next to the bomb (the KGDB server).
- The bomb is the kernel that is about to crash.
- The expert can't go up and cut the wires because they aren't on site; the assistant must follow the expert's orders on which wire to cut or which voltmeter to read, and then report the results back to the expert.
But there is a crucial distinction here: a real bomb disposal assistant has free will, whereas the KGDB server is not even a separate thread. It is stub code built into the kernel that runs only when the kernel traps into it (for example, at a breakpoint). While KGDB is active, the rest of the kernel on the Target is halted. The stub sits in a loop polling the debug channel; when a command arrives from the Host (such as "read the value of variable x"), it performs the memory read and sends the result back. The kernel itself resumes only when the Host tells it to continue or step.
The Communication Line: Crossing the Isolation Barrier
Since they are separated, the two halves need to talk.
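The GDB client and server communicate using GDB's remote serial protocol. In mainline kernels, KGDB's transport is a serial console (the kgdboc driver). Under QEMU, GDB typically connects over TCP instead; port 1234 is the default used by QEMU's -s option.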
The process looks like this:
- You type step in GDB on the Host.
- The GDB client packages this command and sends it over the network to KGDB on the Target.
- KGDB on the Target receives the command, takes control of the CPU, and single-steps one instruction.
- KGDB captures the resulting register state, packages it, and sends it back to the Host.
- Your GDB screen updates, displaying the next line of source code.
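To make "packages this command" a little more concrete, here is a minimal Python sketch of how GDB's remote serial protocol frames a request: a `$`, the payload, a `#`, and a two-digit hex checksum (the sum of the payload bytes modulo 256). The function names are made up for illustration, the address in the memory-read example is arbitrary, and the sketch ignores escaping, acknowledgements, and the fact that GDB may choose a different packet (such as a `vCont` variant) for a step.

```python
def rsp_checksum(payload: str) -> int:
    """Sum of the payload bytes modulo 256, as used for packet checksums."""
    return sum(payload.encode("ascii")) % 256


def rsp_frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<two-digit hex checksum>."""
    return f"${payload}#{rsp_checksum(payload):02x}"


def rsp_unframe(packet: str) -> str:
    """Validate a framed packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    received = int(packet[-2:], 16)
    if rsp_checksum(payload) != received:
        raise ValueError("checksum mismatch")
    return payload


if __name__ == "__main__":
    print(rsp_frame("s"))            # single-step request
    print(rsp_frame("g"))            # read general registers
    print(rsp_frame("mc0008000,4"))  # read 4 bytes at an arbitrary address
```

Running GDB with `set debug remote 1` prints the real packets on the Host, which is a handy way to compare this sketch against an actual session.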
Analogy (Part 2: Revealing the Distance)
Returning to the bomb disposal analogy.
There is one aspect of this analogy that is "over-anthropomorphized": the assistant (KGDB) isn't actually "thinking" when the kernel is paused. It is more like puppet strings.
When the kernel is paused, the entire world (the Target) has hit the pause button. The only thing KGDB can do is respond to the Host's requests to read or write memory and registers. It's not like an assistant shouting into a walkie-talkie, "Sir, what's the voltage?"; it's more like you using remote desktop software to control a frozen computer—every mouse click (command) you make is a remote electrical signal that forces the CPU to twitch for an instant before going limp again.
This explains why the Target's screen usually "freezes" during debugging: while the kernel is halted, nothing gets scheduled, so the code responsible for drawing the display never gets a chance to run.
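The "puppet strings" behaviour can be sketched as a toy model in Python. This is not KGDB's code; `ToyTarget` and `serve_while_halted` are hypothetical names, and the fake memory contents and base address are invented. The only point is the shape of the loop: while the target is halted, it does nothing except answer one request at a time, and it stops answering the moment it is told to continue.

```python
from typing import Optional


class ToyTarget:
    """Stand-in for a halted target: a block of fake memory plus a run flag."""

    def __init__(self, memory: bytes, base: int):
        self.memory = memory
        self.base = base
        self.running = False  # False means "halted inside the debug stub"

    def handle(self, payload: str) -> Optional[str]:
        """Serve one request payload; return the reply, or None if no reply."""
        if payload.startswith("m"):      # m<addr>,<len> in hex: read memory
            addr_text, len_text = payload[1:].split(",")
            start = int(addr_text, 16) - self.base
            length = int(len_text, 16)
            if start < 0 or start + length > len(self.memory):
                return "E01"             # error reply for a bad address
            return self.memory[start:start + length].hex()
        if payload == "c":               # continue: leave the stub
            self.running = True
            return None
        return ""                        # empty reply: request not supported


def serve_while_halted(target: ToyTarget, requests: list) -> list:
    """Answer requests one by one until the target is resumed."""
    replies = []
    for payload in requests:
        if target.running:
            break  # the kernel is running again; nobody is listening
        reply = target.handle(payload)
        if reply is not None:
            replies.append(reply)
    return replies


if __name__ == "__main__":
    target = ToyTarget(bytes([0xDE, 0xAD, 0xBE, 0xEF]), base=0x1000)
    print(serve_while_halted(target, ["m1000,4", "m1002,2", "c", "m1000,4"]))
```

Note that the last request in the demo is never answered: once the continue command has been processed, the target is running and the stub is out of the picture until the next trap.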
Why Not JTAG? (A Brief Aside)
When it comes to embedded debugging, veterans might ask: why not use JTAG (like the BDI2000 hardware debugger)?
JTAG is indeed more hardcore. It operates directly at the chip level and can even be used before the kernel boots. Furthermore, a JTAG probe's GDB server is usually more stable than KGDB because it doesn't depend on the kernel's own robustness.
But JTAG is expensive and requires physical wiring. KGDB's advantage is that it is purely software-based. As long as your Linux kernel is still alive enough to trap into the debugger and drive its debug I/O channel, you can use KGDB. For tinkering in virtual environments like QEMU, or doing early kernel driver development, KGDB is that readily available Swiss Army knife.
Alright, the theory is clear: the Host gives the orders, and the Target obediently follows. Now we need to transplant this mechanism onto our specific hardware platform.
Setting the Stage: vmlinux vs. bzImage
Before jumping into practice, there is one last conceptual point to clear up.
When we compile the kernel under Linux, two key files are generated. You must distinguish between them, because loading the wrong one during debugging will cause things to break.
- vmlinux: This is the uncompressed kernel image. It is a massive ELF format file containing all symbol tables. This is meant for humans and GDB.
- bzImage / zImage: This is the compressed kernel image (located in the arch/<arch>/boot/ directory). This is meant for the Bootloader (U-Boot, GRUB). This is what actually gets loaded and executed at boot time.
Analogy (Part 3: Verification)
Returning to the blueprint analogy.
- bzImage is the machine that has been compressed, packaged, and prepared for shipping to the battlefield.
- vmlinux is the detailed engineering blueprint showing the exact position of every single screw.
When we debug with KGDB, GDB needs the blueprint. It has to know exactly which page and which line (memory address) the sys_open function is on in the blueprint. As for how the machine was compressed and packed into the BOOT partition, GDB doesn't care.
If you tell GDB to load bzImage, it's like handing an engineer a compressed ration pack and asking them to fix a machine: they can't understand the internal structure. So, always keep the uncompressed vmlinux at hand.
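A quick way to tell the two kinds of file apart is to look at the first four bytes: an ELF file such as vmlinux starts with the magic number `\x7fELF`, while a compressed boot image does not. The Python sketch below checks exactly that. The helper name `looks_like_elf` is made up for illustration, and the demo writes two throwaway files instead of touching real kernel images.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def looks_like_elf(path: str) -> bool:
    """Return True if the file begins with the 4-byte ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_vmlinux = os.path.join(tmp, "vmlinux")
        fake_zimage = os.path.join(tmp, "zImage")
        # Stand-in contents only: an ELF-style header and arbitrary bytes.
        with open(fake_vmlinux, "wb") as f:
            f.write(ELF_MAGIC + b"\x00" * 12)
        with open(fake_zimage, "wb") as f:
            f.write(b"\x00" * 16)
        print("vmlinux is ELF:", looks_like_elf(fake_vmlinux))
        print("zImage is ELF:", looks_like_elf(fake_zimage))
```

On a real build tree, the `file` utility gives the same answer with more detail. Passing the ELF check only means GDB can parse the file; whether it also contains debug information is a separate question.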
Having just vmlinux isn't enough. If this blueprint is an "abridged" version with key areas blacked out (no debug symbols), repairs are still impossible. We need to intentionally bake the symbol information into vmlinux when compiling the kernel.
That is the dirty work we will tackle in the next section: modifying the kernel configuration to enable CONFIG_DEBUG_INFO.