8.1 Lock Debugging Overview

There is a class of bugs whose favorite trick is invisibility.

When you just stare at the code, everything looks perfect; when you try to catch them with printk, they vanish because the logging itself perturbs the timing; even when you muster the courage to step through with GDB, they stop triggering because the timing has changed again. These are concurrency bugs, the dreaded "Heisenbugs" of kernel development.
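To make the phantom a little more concrete, here is a minimal user-space sketch (POSIX threads, not kernel code) of the kind of bug we're talking about. Everything in it, including the names racy_counter, locked_counter, worker, and run_counters, is invented for illustration. Several threads bump two counters: one with no protection at all, one under a mutex. The unprotected increment is a deliberate data race, so its final value can differ from run to run, and it tends to change again the moment you add logging inside the loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static long racy_counter;     /* updated with no synchronization (deliberate bug) */
static long locked_counter;   /* updated only while holding counter_lock */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < NITERS; i++) {
        /* Data race: the load, add, and store of different threads can
         * interleave, so some increments may be lost. Formally this is
         * undefined behavior in C11; it is here only as a demonstration. */
        racy_counter++;

        /* Critical section: only one thread at a time runs this update. */
        pthread_mutex_lock(&counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Resets both counters, runs NTHREADS workers, and waits for them all. */
static void run_counters(void)
{
    pthread_t tids[NTHREADS];

    racy_counter = 0;
    locked_counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);   /* demo only: real code should handle the error */
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
}
```

Call run_counters() from a main() of your own and print both counters (build with something like cc -pthread). The locked counter always ends at NTHREADS * NITERS; the racy one usually comes up short, by a different amount on different runs.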

In this chapter, we're going after these phantoms. But first, we need to agree on one thing: concurrent programming isn't about luck—it's about rigorous rules. What we need to do is drag what's hiding in the cracks of timing out into the open.

This is harder than it sounds. The difficulty lies in imposing a well-defined order on events that interleave invisibly and unpredictably across CPUs, interrupts, and threads.


8.1 Technical requirements

Environment setup and prerequisites

Before we begin this detective game, let's confirm what's in our toolbox. The environment requirements for this chapter are exactly the same as Chapter 1—you need a configured Linux development machine and the necessary cross-compilation toolchain. The code examples in this book will still be available in the GitHub repository (if you haven't cloned it yet, now is the time):

https://github.com/PacktPublishing/Linux-Kernel-Debugging

In addition, we'll frequently reference the final two chapters of my previous free eBook, Linux Kernel Programming – Part 2. Why? Because that book dedicates two full chapters to breaking down the principles of locking and debugging tools. If you haven't read it, or if terms like "critical section," "spinlock," and "mutex" feel unfamiliar, the road ahead might be a bit rough.

To make sure we're all on the same page, it is strongly recommended that you grab a copy of Linux Kernel Programming – Part 2. Don't worry, it's free—completely free.

The repository link is here:

https://github.com/PacktPublishing/Linux-Kernel-Programming-Part-2

About locks: what we assume you already know

This book's mission is "debugging," not "beginner tutorials." This means we won't spend time explaining what a mutex is, what a spinlock is, or how atomic operations work. If you're currently struggling with the question "why do we need locks?", then please pause and read the following chapters from LKP - Part 2—they contain all the answers you need:

  • Chapter 6, Kernel Synchronization – Part 1: This is the foundation of foundations. It explains in detail what a critical section is, why kernel space needs to care about concurrency, and how to correctly use the mutex and spinlock APIs. Pay special attention to the section on lock side effects. For example, spin_lock_irq() disables interrupts on the local CPU and spin_unlock_irq() unconditionally re-enables them; if you don't know this, you can end up with deadlocks or interrupt-state bugs that seem to come from nowhere.
  • Chapter 7, Kernel Synchronization – Part 2: This covers advanced techniques, including lock-free techniques and the use of Lockdep. Lockdep is the most powerful lock validation tool in the kernel; if you haven't heard of it, this chapter will open up a whole new world for you.
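The side effect mentioned under Chapter 6 is easier to see in a toy model. The sketch below is plain user-space C, not the kernel API: a single boolean stands in for "interrupts enabled on this CPU", the lock itself is left out, and every model_* and helper_* name is hypothetical. It contrasts the behavior of the spin_lock_irq()/spin_unlock_irq() pair with that of spin_lock_irqsave()/spin_unlock_irqrestore(). In the real kernel the latter pair takes the lock pointer and an unsigned long flags variable, so the signatures here are simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for "local interrupts enabled" on this CPU. */
static bool model_irqs_enabled = true;

/* Models spin_lock_irq(): disable local interrupts (lock itself omitted). */
static void model_lock_irq(void)
{
    model_irqs_enabled = false;
}

/* Models spin_unlock_irq(): unconditionally re-enable local interrupts. */
static void model_unlock_irq(void)
{
    assert(!model_irqs_enabled);   /* we expect to be inside the locked region */
    model_irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static bool model_lock_irqsave(void)
{
    bool flags = model_irqs_enabled;

    model_irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever the caller saved. */
static void model_unlock_irqrestore(bool flags)
{
    assert(!model_irqs_enabled);
    model_irqs_enabled = flags;
}

/* A helper that may be called with interrupts already disabled. */
static bool helper_using_irq_variant(void)
{
    model_lock_irq();
    /* ... critical section ... */
    model_unlock_irq();
    return model_irqs_enabled;   /* always true, whatever the caller had */
}

static bool helper_using_irqsave_variant(void)
{
    bool flags = model_lock_irqsave();
    /* ... critical section ... */
    model_unlock_irqrestore(flags);
    return model_irqs_enabled;   /* same state the caller entered with */
}
```

If a caller enters helper_using_irq_variant() with the flag already false, it comes back with the flag true: the helper has silently turned interrupts back on behind the caller's back. The irqsave variant leaves the caller's state alone, which is why it is the usual choice when you can't be sure of the calling context.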

This might sound like an ad, but it isn't: concurrency debugging in the kernel depends on a deep understanding of the underlying mechanisms. If you don't grasp the fundamental difference between a spinlock and a mutex (the former busy-waits and must not be held across a sleep, the latter may put the waiter to sleep), you won't be able to make sense of the KCSAN reports later on; if you don't understand how lockdep works, you won't know why it can detect deadlocks ahead of time.
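How can a tool flag a deadlock that hasn't happened yet? The rough idea is to remember the order in which locks have been taken and complain as soon as a new acquisition contradicts it. The sketch below is a deliberately tiny user-space model of that idea, not lockdep's actual implementation: locks are plain integer ids, and order, held, path_exists, track_acquire, and track_release are all names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8

/* order[a][b] is true once lock b has been taken while a was held (a -> b). */
static bool order[MAX_LOCKS][MAX_LOCKS];
static int  held[MAX_LOCKS];   /* stack of currently held lock ids */
static int  nheld;

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool path_exists(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (order[from][next] && !seen[next] && path_exists(next, to, seen))
            return true;
    return false;
}

/*
 * Note that lock 'id' is about to be taken. Returns false (and records
 * nothing) if this would contradict an ordering seen earlier, meaning a
 * deadlock is possible even though none has occurred on this run.
 */
static bool track_acquire(int id)
{
    assert(id >= 0 && id < MAX_LOCKS && nheld < MAX_LOCKS);

    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_LOCKS] = { false };

        /* Adding held[i] -> id closes a cycle if id already leads to held[i]. */
        if (path_exists(id, held[i], seen))
            return false;
    }
    for (int i = 0; i < nheld; i++)
        order[held[i]][id] = true;
    held[nheld++] = id;
    return true;
}

/* Toy model restriction: locks must be released in LIFO order. */
static void track_release(int id)
{
    assert(nheld > 0 && held[nheld - 1] == id);
    nheld--;
}
```

Take lock 0 then lock 1 once, release both, and later try lock 1 then lock 0: the second track_acquire() returns false even though only one thread ever ran and nothing ever blocked. That is the sense in which ordering violations can be caught ahead of time, from a single run that happened to exercise both paths.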

Enough talk, where's the book?

To reiterate, the LKP-2 eBook is completely free. You can download the PDF directly from GitHub, or grab a Kindle version on Amazon (also free).

The link is right here—don't say I didn't tell you:

https://github.com/PacktPublishing/Linux-Kernel-Programming/blob/master/Linux-Kernel-Programming-(Part-2)/Linux%20Kernel%20Programming%20Part%202%20-%20Char%20Device%20Drivers%20and%20Kernel%20Synchronization_eBook.pdf

Alright, tools collected, prerequisites covered. Now let's get down to business: hunting the bugs that hide in this invisible web of timing.