
11.7 Further Reading

The main text ended with the previous section.

If you've followed along with the hands-on exercises throughout this chapter, you now have a solid grasp of the core KGDB workflow: configuring the kernel, attaching GDB, setting breakpoints, and scripting. That is enough to handle the large majority of everyday kernel debugging scenarios.

But the world of kernel debugging is vast. Even stretching this chapter to its limits, we've barely scratched the surface. There are countless edge cases, advanced techniques, and hard-won lessons from the trenches that no textbook can cover.

At this point, the best teacher is the internet—not for searching error messages, but for seeing how others stumbled, and discovering the crazy things kernel developers have done to shoehorn a debugger into the kernel.

The list below contains links I believe deserve a permanent spot in your bookmarks bar. Some are serious, some are hardcore, and some... well, are just plain fun.


📚 Core Documentation and Kernel Guides

First, while official documentation can sometimes be dry, it is the single source of truth.

Kernel Doc: Using kgdb, kdb and the kernel debugger internals
This is the bible of KGDB. No matter what blogs say, you'll eventually need to come back here to verify parameters. It provides the clearest explanation of how to switch between KDB (the built-in minimalist debugger) and KGDB, and the details of how the underlying locks are implemented.

Merging kdb and kgdb (LWN, 2010)
LWN's article quality is consistently excellent. This 2010 article looks back at how the KDB and KGDB subsystems went from being separate rivals to merging into one. Understanding this history explains why current configurations have both CONFIG_KGDB and CONFIG_KDB, and how they share the underlying I/O driver.

Man page on kdb(8)
If you don't want to fire up two machines and prefer typing commands directly over a serial console, KDB is your only choice. This man page is your map when you get lost.


🛠️ Hardcore Practice and Embedded Platforms

There's a chasm between theory and practice, and the resources below are the bridges.

Using Serial kdb / kgdb to Debug the Linux Kernel - Douglas Anderson (Google, 2019)
If you think the basic serial port setup in this chapter was all there is to it, Douglas's talk will show you what a "complex environment" really looks like. He covers how to multiplex the Console and GDB over a single physical serial port. On real boards, serial ports are a scarce resource. Being able to watch kernel logs while debugging is the hallmark of a practical embedded engineer.

KGDB/KDB over serial with Raspberry Pi
The Raspberry Pi is the most accessible ARM experimentation platform. This tutorial has a bit of a Yocto flavor, but it thoroughly demonstrates how to wire things up on real hardware. When you jump from QEMU's virtual world to real hardware and find your serial output is garbled due to voltage mismatches, this article can save your life.

A KDB / KGDB SESSION ON THE POPULAR RASPBERRY PI
Though it's a 2013 article, many of the underlying mechanisms (like the early ARM boot process) haven't changed much. Seeing how veterans debugged kernels in an era without fancy tools is almost a spiritual experience.


🕵️‍♂️ Advanced Tricks and Toolbox

When you feel regular breakpoints aren't enough, here are some fun things to explore.

5 Easy Ways to Reduce Your Debugging Hours (Undo.io)
This article isn't just about KGDB; it's about debugging mindset. How do you cover more code paths with fewer breakpoints? How do you leverage watchpoints instead of blindly using print statements? Most of the techniques mentioned here are universal, useful whether you're debugging the kernel or user-space applications.

Debugging ARM kernels using fast interrupts (LWN)
This is advanced content even among advanced topics. Normal FIQ handling is already complex; if you want to debug within a FIQ context, this is your reference.


🎓 External Perspectives and GDB Tutorial Series

Step outside the kernel circle and see what GDB itself can do.

Red Hat Developer series on GDB
A three-part GDB tutorial series produced by Red Hat. It covers everything from getting started to the principles of debuginfo, and even printf-style debugging. Although it focuses mainly on user-space programs, GDB's command syntax and Python extension mechanisms are exactly the same in KGDB. After reading this, your understanding of GDB will elevate from "knowing how to use it" to "understanding it."

Using GDB in TUI mode (Official Manual) & Debug faster with gdb layouts (Video)
We mentioned TUI in Section 11.6. If you want to customize that window layout, or figure out why your TUI windows are flickering, look here. The video tutorial demonstrates highly efficient "keyboard-driven" operations that are incredibly satisfying to watch.


🐛 Common Issues and Firefighting Manual

Finally, when you've pressed every key and GDB still won't stop, go here for answers.

StackOverflow: KGDB remote debugging connection issue
Failing to connect is the first hurdle for KGDB beginners. This thread discusses the various arcane issues with USB-to-serial adapters.

Breakpoints not being hit... (QEMU)
If you set a breakpoint in QEMU and the program just flies right past it, it's highly likely an issue with CONFIG_STRICT_KERNEL_RWX or compiler optimizations. Someone in this Q&A has fallen into the exact same trap as you.


🎵 Bonus Easter Egg

The GDB Song (GNU)
When you're tired of coding, give this a listen. It's genuinely... full of nostalgia.

"I'm a hacker, and that's what I do..."

Don't tell anyone I recommended this.


A kernel debugger in Python: drgn (LWN)
This last one points to the future. drgn is a programmable debugger that you drive from Python. It doesn't rely on GDB and reads kernel memory directly. Although we've been focusing on KGDB, keeping an eye on this new technology will show you another possibility for system-level debugging.


(End of this section)


Chapter 11 Echoes

Remember the question we asked at the beginning of this chapter—when the kernel is flying high, how do we actually grab it?

In this chapter, we essentially built an invisible "umbilical cord."

One end connects to your host machine where you're typing code, and the other end connects to that lonely kernel running inside QEMU (or on a real ARM board). In the past, you could only throw one-way notes into it via printk and pray they would drift back in the logs.

Things are different now.

Through KGDB, you are no longer just an observer; you are a ghost in the kernel world. You can freeze time at will, lift the memory cover to peek at variables, and step through execution one instruction at a time. We learned how to understand the machine's layout through the massive symbol table that is vmlinux, and how to automate tedious module symbol loading with the kernel's Python-based GDB helpers such as lx-symbols.

More importantly, we built an intuition: Software doesn't lie; it only executes. When something feels "unscientific," it usually means the symbol table is wrong, or optimizations have hidden the variables. The process of solving these problems is the evolution from "black-box debugging" to "white-box dissection."

But this is only the beginning.

In the next chapter, the final part of this book, we will push these debugging capabilities to the limit. We will see how to use these tools to track down the most bizarre, hardest-to-reproduce system-level bugs, and witness how top-tier system maintainers can spot a missing semicolon in a screen full of errors at a single glance.

Are you ready? We are about to reach the end of the final leg.


(The book is about to conclude)


Exercises

Exercise 1: Understanding

Question: When using QEMU to emulate an ARM target system for kernel debugging, why can't we directly load the compressed kernel image zImage into GDB for source-level debugging? If we need to set a breakpoint early in the boot process (e.g., at the start_kernel function) to wait for the debugger to connect, how should the QEMU startup parameters be configured, and what is the purpose of the 'nokaslr' kernel boot parameter?

Answer and Analysis

Answer: The reason we cannot directly load zImage is that zImage is a compressed, self-extracting image that does not contain complete debugging symbol information. GDB requires an uncompressed ELF format file (vmlinux) with debugging symbols for source-level debugging.

QEMU startup parameter configuration: Use the -S and -s parameters. -S freezes the CPU at startup, and -s is shorthand for -gdb tcp::1234, which opens the GDB remote debugging port.

The purpose of the 'nokaslr' parameter is to disable Kernel Address Space Layout Randomization. If KASLR is enabled, the addresses of kernel code and data will randomize on every boot, causing the static symbol addresses in GDB to mismatch the actual runtime addresses. This makes it impossible to set breakpoints correctly or inspect variables.

Analysis: This question tests your understanding of the basic kernel debugging environment. Debugging the kernel, like debugging any program, requires a symbol table. zImage self-extracts at runtime, but GDB cannot parse symbols inside a compressed payload, so it must be used in conjunction with vmlinux. Regarding startup parameters, QEMU's -S pauses the machine, buying time for GDB to connect; nokaslr keeps the memory layout fixed, so the static addresses in vmlinux match those of the running kernel.
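The launch configuration described in this answer can be sketched as a small Python helper that assembles the QEMU command line. This is only an illustration under stated assumptions: the emulator binary (qemu-system-arm), the machine type (virt), the console argument, and the image path are placeholders not taken from this chapter's setup; only -S, -gdb tcp::1234 (the long form of -s), and nokaslr come from the answer itself.

```python
import shlex


def qemu_debug_argv(kernel_image, append="console=ttyAMA0", gdb_port=1234):
    """Build a QEMU argv that halts the CPU until a debugger attaches."""
    # nokaslr keeps runtime addresses equal to the static ones in vmlinux.
    kernel_args = f"{append} nokaslr".strip()
    return [
        "qemu-system-arm",            # assumed emulator binary
        "-M", "virt",                 # assumed machine type
        "-kernel", kernel_image,      # the bootable image, not vmlinux
        "-append", kernel_args,
        "-nographic",
        "-S",                         # freeze the CPU at startup
        "-gdb", f"tcp::{gdb_port}",   # "-s" is shorthand for "-gdb tcp::1234"
    ]


# Hypothetical image path, for illustration only.
argv = qemu_debug_argv("arch/arm/boot/zImage")
print(shlex.join(argv))
```

On the host side, the matching GDB session would typically load vmlinux, connect with target remote :1234, set a breakpoint on start_kernel, and then continue.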

Exercise 2: Application

Question: Suppose you are debugging a kernel module named my_led in a KGDB environment. The module is already loaded into the target kernel, but you find that GDB cannot hit the breakpoint at the my_led_init function, or the breakpoint is invalid. Using the sections directory that sysfs exposes under /sys/module, write the complete command steps to load the module's symbol table into GDB (assuming the .text section address is 0xbf012000).

Answer and Analysis

Answer: The steps are as follows:

  1. Find the section addresses: On the target system, read the address of each section from sysfs.
       cat /sys/module/my_led/sections/.text
     (Output: 0xbf012000)

  2. Load the symbols: In the host machine's GDB, run add-symbol-file and pass the addresses of the key sections in the same command so that GDB can correctly locate both code and data.
       add-symbol-file path/to/my_led.ko 0xbf012000 -s .data 0xbf01a000 -s .bss 0xbf01c000
     (Note: In actual use, replace the .data and .bss addresses with the values reported by cat on the corresponding sysfs files. If .data/.bss are not specified, you may not be able to inspect global variables.)

Analysis: This question tests the practical skill of debugging LKMs. Kernel modules are loaded dynamically, and GDB cannot automatically know their load locations. Sysfs exposes this information through the sections directory. The core command is add-symbol-file. A common mistake here is providing only the .text address (where the module's code is loaded). If the module uses global variables, failing to provide the .data and .bss section addresses leaves GDB resolving those variables to the wrong addresses, so inspections show incorrect values or fail with memory access errors.
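To make the shape of the command concrete, here is a minimal Python sketch that assembles an add-symbol-file line from a table of section addresses. The helper name add_symbol_file_cmd is hypothetical, and the .data and .bss addresses are the same placeholder values used in the answer above, not real measurements.

```python
def add_symbol_file_cmd(ko_path, sections):
    """Return a GDB add-symbol-file command for an already loaded module.

    sections maps section names (".text", ".data", ...) to load addresses.
    """
    if ".text" not in sections:
        raise ValueError("the .text address is required")
    # The .text address is positional; every other section uses "-s NAME ADDR".
    parts = ["add-symbol-file", ko_path, hex(sections[".text"])]
    for name in sorted(sections):
        if name != ".text":
            parts += ["-s", name, hex(sections[name])]
    return " ".join(parts)


cmd = add_symbol_file_cmd(
    "path/to/my_led.ko",
    {".text": 0xBF012000, ".data": 0xBF01A000, ".bss": 0xBF01C000},
)
print(cmd)
# add-symbol-file path/to/my_led.ko 0xbf012000 -s .bss 0xbf01c000 -s .data 0xbf01a000
```

The order of the -s options does not matter to GDB; the sketch sorts them only to keep the output deterministic.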

Exercise 3: Thinking

Question: When configuring a kernel to support KGDB, it is generally recommended to disable the CONFIG_STRICT_KERNEL_RWX option (kernel code segment read-only, data segment non-executable). Please analyze why traditional software breakpoints might fail if this option is not disabled. In this case, besides modifying the kernel configuration, what alternative GDB command can be used, and what is its underlying principle?

Answer and Analysis

Answer: Principle analysis: A software breakpoint works by overwriting the instruction at the target location with a breakpoint instruction (INT 3 on x86, or a dedicated breakpoint or undefined instruction on other architectures); when the CPU executes it, a trap hands control to the debugger. This requires modifying the code segment in memory. If CONFIG_STRICT_KERNEL_RWX is enabled, the kernel code segment is mapped read-only in the page tables, and the MMU enforces that permission. Any attempt to write to the code segment (including the debugger inserting a breakpoint instruction) can then fault, causing the breakpoint write to fail or, in the worst case, a system crash.

Alternative solution and principle: We can use hardware breakpoints. Use the hbreak (hardware breakpoint) command in GDB.

Principle: Hardware breakpoints use the CPU's debug registers to set up address monitoring (on x86, DR0-DR3 hold the monitored addresses and DR7 controls them). When execution reaches a monitored address, the hardware raises a debug exception on its own. This method does not require modifying target memory and is therefore unaffected by RWX permissions. However, the number of hardware breakpoints is very limited (x86, for example, provides only 4).

Analysis: This question tests your understanding of the underlying principles of debugging mechanisms (memory and CPU features). Software breakpoints rely on memory writes and fail when they encounter write protection. This points toward mechanisms that don't rely on memory modification, namely the CPU's hardware debug registers. This is essential knowledge for advanced debugging and also explains why hbreak matters more in embedded or security-hardened environments.
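The difference between the two mechanisms can be illustrated with a toy model in Python. This is not how the kernel or KGDB is implemented: the TextSegment class and its writable flag are hypothetical stand-ins for a code region and its page permissions, and the only real-world fact used is that INT 3 is the single byte 0xCC on x86.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class TextSegment:
    """Toy stand-in for a kernel code region and its write permission."""

    def __init__(self, code, writable):
        self.code = bytearray(code)
        self.writable = writable

    def insert_sw_breakpoint(self, offset):
        # A software breakpoint must overwrite an instruction byte in place.
        if not self.writable:
            raise PermissionError("text is read-only; use a hardware breakpoint")
        saved = self.code[offset]
        self.code[offset] = INT3
        return saved  # the debugger keeps this to restore the instruction

    def remove_sw_breakpoint(self, offset, saved):
        self.code[offset] = saved


code = bytes([0x55, 0x48, 0x89, 0xE5])  # push %rbp; mov %rsp,%rbp

relaxed = TextSegment(code, writable=True)
saved = relaxed.insert_sw_breakpoint(0)
patched_byte = relaxed.code[0]
relaxed.remove_sw_breakpoint(0, saved)

strict = TextSegment(code, writable=False)
try:
    strict.insert_sw_breakpoint(0)
    outcome = "patched"
except PermissionError:
    outcome = "refused"

print(hex(patched_byte), outcome)
```

In the read-only case the model refuses the patch, which is the situation where hbreak is the practical alternative, since it never touches the code bytes.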


Key Takeaways

Kernel debugging faces a problem that user-space debugging techniques cannot solve: when code runs at Ring 0 privilege, or a system crash prevents output buffers from flushing, traditional methods fail completely. The answer is KGDB, the kernel's built-in GDB stub, which splits the debugger across a client-server architecture: the Host runs a feature-rich GDB client as the commander, while the Target runs a lightweight KGDB server inside the kernel as the frontline scout. The two communicate over a network or serial port, enabling source-level kernel pausing and inspection.

The key to setting up a KGDB environment lies in correctly distinguishing between vmlinux and bzImage: the former is the uncompressed ELF image containing complete debugging symbols, which must be provided to GDB to read the symbol table; the latter is the compressed image loaded by the bootloader for system startup. To make the kernel support debugging, you must enable CONFIG_DEBUG_INFO (generate debugging symbols) and CONFIG_KGDB (build the GDB stub into the kernel) at compile time. It is also recommended to disable CONFIG_STRICT_KERNEL_RWX to prevent memory protection mechanisms from blocking software breakpoint writes, and to add the nokaslr boot parameter to disable address randomization, ensuring instruction addresses are fixed and predictable.
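A quick sanity check for "did I hand GDB the right file?" is to look at the first four bytes: an ELF file such as vmlinux starts with the magic bytes 0x7F 'E' 'L' 'F', while a compressed boot image does not. The sketch below assumes nothing about your build tree; it writes two stand-in files into a temporary directory so that it runs anywhere, and the helper name looks_like_elf is hypothetical.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def looks_like_elf(path):
    """Return True if the file starts with the 4-byte ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


# Stand-in files for illustration; real images are far larger.
with tempfile.TemporaryDirectory() as tmp:
    fake_vmlinux = os.path.join(tmp, "vmlinux")
    fake_boot_image = os.path.join(tmp, "bzImage")
    with open(fake_vmlinux, "wb") as f:
        f.write(ELF_MAGIC + bytes(12))   # ELF magic plus padding
    with open(fake_boot_image, "wb") as f:
        f.write(bytes(16))               # arbitrary non-ELF bytes
    vmlinux_ok = looks_like_elf(fake_vmlinux)
    boot_image_ok = looks_like_elf(fake_boot_image)

print(vmlinux_ok, boot_image_ok)
```

Passing this check only shows that the file is ELF; it does not prove that debugging symbols are present, which still depends on CONFIG_DEBUG_INFO.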

In practice, debugging dynamically loaded kernel modules is one of the biggest challenges because GDB cannot automatically predict where a module will be loaded in memory. You must read the files under /sys/module/<module_name>/sections/ to obtain the actual load addresses of the module's sections and use the add-symbol-file command to manually inform GDB of these addresses. To handle extreme cases where a module crashes immediately upon loading, you can pre-set a hardware breakpoint at the kernel's do_init_module function entry, or use a delayed execution mechanism to buy yourself a precious time window to connect the debugger and load the symbols.
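Collecting the section addresses by hand gets tedious, so here is a small Python sketch of the lookup. It assumes each file in the sections directory contains a single hexadecimal address, as in the exercise output earlier; the helper name read_module_sections is hypothetical, and the demo builds a fake sections directory in a temporary folder rather than touching the real sysfs (where reading these files usually requires root).

```python
import os
import tempfile


def read_module_sections(sections_dir, wanted=(".text", ".data", ".bss")):
    """Read section load addresses from a sysfs-style sections directory."""
    addrs = {}
    for name in wanted:
        path = os.path.join(sections_dir, name)
        if os.path.exists(path):       # not every module has every section
            with open(path) as f:
                addrs[name] = int(f.read().strip(), 16)
    return addrs


# Fake directory standing in for /sys/module/my_led/sections.
with tempfile.TemporaryDirectory() as tmp:
    for name, value in ((".text", "0xbf012000"), (".data", "0xbf01a000")):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(value + "\n")
    found = read_module_sections(tmp)

print({name: hex(addr) for name, addr in found.items()})
```

Sections that are missing are simply skipped, so the caller can decide whether a module without .bss, for example, is acceptable.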

KGDB provides watchpoint functionality that is more powerful than traditional breakpoints. By utilizing the CPU's hardware debug registers (such as DR0-DR3 on x86), it can pause execution the instant a specified variable is read or written. This "data breakpoint" is extremely effective for tracking bizarre memory corruption or "phantom" pointer modifications in the kernel, directly pinpointing the exact line of code that modified the variable. Additionally, by leveraging the Python GDB scripts included in the Linux kernel source code (enabled via CONFIG_GDB_SCRIPTS), we can directly execute commands like lx-dmesg or lx-lsmod within GDB to extract system state directly from memory without resuming kernel execution, greatly improving debugging efficiency.
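To see why the number and size of watchpoints are so constrained, it helps to look at how a single slot is described in DR7. The sketch below follows the x86 debug-register layout as documented in Intel's architecture manuals (enable bits in the low byte, a 2-bit access type and a 2-bit length per slot starting at bit 16). The function name dr7_bits is hypothetical, and this is only an encoding exercise: in a real session GDB and the kernel program these registers for you when you use watch, rwatch, awatch, or hbreak.

```python
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}  # bytes -> LEN field


def dr7_bits(slot, rw, length):
    """Return the DR7 bits that arm one of the four x86 debug slots."""
    if slot not in range(4):
        raise ValueError("x86 has four address slots, DR0 to DR3")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execute breakpoints must use length 1")
    # Local-enable bit L<slot>; the global-enable bit sits one position higher.
    enable = 1 << (slot * 2)
    rw_field = rw << (16 + slot * 4)
    len_field = LEN_ENCODING[length] << (18 + slot * 4)
    return enable | rw_field | len_field


# A 4-byte write watchpoint in slot 0, and an execute breakpoint in slot 1.
write_watch = dr7_bits(0, RW_WRITE, 4)
exec_break = dr7_bits(1, RW_EXECUTE, 1)
print(hex(write_watch), hex(exec_break))  # 0xd0001 0x4
```

With only four slots and a maximum length of 8 bytes each, watching a large structure in hardware is not possible; pick the specific field that is being corrupted.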