
Chapter 12: The Price of Concurrency — Kernel Synchronization Mechanisms Part 1

The previous two chapters covered scheduling, and everything looked orderly and well-organized. But that was because we had the luxury of focusing on just "one process" at a time.

Reality is harsh: on modern multi-core systems, dozens of hardware interrupts, kernel threads, and user processes are scrambling across CPUs simultaneously. If they happen to target the same piece of memory... well, that's exactly what this chapter is about — kernel synchronization.

This is not a "nice-to-have" advanced topic. It's the dividing line between "toy code" and "production-grade drivers." Once your code runs on multiple cores, or your device generates interrupts, all bets are off. Unless you truly know what you're doing, your data structures will rot right under your nose, and you won't notice until the system suddenly panics at 3 AM.

So buckle up. What follows might be a bit brain-bending, but it's a necessary rite of passage for becoming a serious systems programmer. We're going to cage the concurrency beast — or at least... learn how not to get bitten by it.


12.1 Critical Sections: When "Simultaneous" Happens

Before diving into code, we need to align on a core concept.

Concurrency itself is not a problem. Code, the instructions themselves, is read-only. Having ten thousand CPUs execute the same function simultaneously is not only safe — it's exactly what we want (that's the whole point of multi-core). The problem lies in "writes" — more precisely, multiple execution paths simultaneously modifying the same writable data.

If that sounds simple, it might be because you haven't seen true chaos yet. Let's break it down from the most basic concepts.

What is a Critical Section?

The textbook definition is dry, but in the kernel, this is a life-or-death concept. Let's break it into two conditions that must both be true:

  1. Potentially concurrent: This code path has the possibility of being executed by two or more execution flows at the same time.
  2. Operates on shared writable state: The code touches global variables, static variables, or some memory that everyone can see.

If both conditions are met, congratulations — you've found a critical section.

Critical sections must be forcibly serialized. This means that at any given time, only one thread can be "dancing" inside. If two threads are inside simultaneously, that's a bug, typically called a race condition or data race.

Wait, why can't I even "read"?

This is a classic intuitive trap. "I'm just reading the data, not changing it — surely that's fine?"

In a single-core, single-threaded world, reads are indeed safe. But in a multi-core concurrent environment, read operations can encounter dirty reads or torn reads. Imagine reading a 64-bit variable on a 32-bit architecture — this requires two instructions to complete. If, in the instant after reading the upper 32 bits, another thread modifies the lower 32 bits... what you get is a "Frankenstein" value — neither the old value nor the new one, but meaningless garbage.

Atomicity: The Art of Indivisibility

When discussing concurrency, we often hear about "atomic operations."

Atomicity means an operation either completes entirely or doesn't happen at all — there's no "half-done and then stopped" state. In kernel terminology, if a critical section is atomic, it must run to completion in one go without being interrupted (not even by interrupts).

In user-space programming (like Pthreads), we typically only care about "mutual exclusion" — as long as I keep you out, we're fine. But in the kernel, things are trickier:

  • Process context: Most of the time, you're running in process context (e.g., handling a system call). If the critical section might block (e.g., waiting for I/O), you can't demand atomicity; you need a lock that is allowed to sleep, i.e., a mutex.
  • Atomic context: If you're running in an interrupt handler, or holding a spinlock, you absolutely cannot sleep. Here, the critical section must not only be exclusive but also atomic — any call to schedule() would be catastrophic.

Here's a simple mental model:

  • If the code might sleep -> Mutex
  • If the code must never sleep (e.g., in an interrupt) -> Spinlock

That Classic i++ Trap

Almost every concurrency story starts with this example. But I still have to tell it, because it's the best case study of how modern compilers and hardware team up to trip you up.

static int i = 5;

void foo(void) {
    i++; /* Is this line safe? */
}

Your intuition tells you: "Isn't this just adding 1? A single instruction, right?"

Dead wrong.

Let's open the black box. Even though some instruction sets can increment memory in a single instruction (x86 can even make it atomic with a LOCK prefix), a plain C i++ gives the compiler no reason to emit that; conceptually it breaks down into three steps at the machine level:

  1. Load: Read i from memory into a register.
  2. Add: Add 1 in the register.
  3. Store: Write the new value back to memory.

These three steps are completely independent instructions in the CPU's eyes. This means that if a context switch occurs between step 1 and step 3, or if another core is also writing to i, your modification gets overwritten. That's a data race.

You can try this yourself on Compiler Explorer (godbolt.org). Without optimizations, i++ typically compiles to three separate instructions. With -O2, an x86 compiler may fold it into a single read-modify-write instruction on memory, but even that single instruction is not atomic across cores without a LOCK prefix, and on ARM and other load/store architectures it remains several instructions.

As kernel developers, our default assumption must be: i++ is not atomic. We must use locks.
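As a minimal sketch of what "use locks" means here (the lock name i_lock is hypothetical, and the spinlock API is covered in detail later in this chapter):

#include <linux/spinlock.h>

static int i = 5;
static DEFINE_SPINLOCK(i_lock); /* hypothetical lock protecting i */

void foo(void)
{
    spin_lock(&i_lock);
    i++; /* the load-add-store sequence is now serialized */
    spin_unlock(&i_lock);
}

The kernel also provides atomic_t for exactly this pattern; we'll get to it in the next chapter.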

An Intuitive Picture of Critical Sections

Let's make the abstract concepts concrete. Suppose your driver has a read method like this:

ssize_t my_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
{
    // t0: only touching local variables — safe, even with concurrency
    char tmp[100];

    // t1: starts touching globally shared data -> danger!
    my_global_data->len = count;
    my_global_data->buf = buf;

    // ... still touching shared data ...

    // t2: stops touching shared data -> danger cleared
    copy_to_user(buf, ...); // back to safe territory (assuming buf is the caller-supplied pointer)

    // t3: local variables again, safe
    return count;
}

Without locking, any time between t1 and t2, another thread (or an interrupt handler on the same CPU) could barge in and trash my_global_data. That's why critical section protection matters.

A small detail: You might notice that copy_to_user isn't included in the critical section above (because buf is a function parameter pointing at the caller's buffer, not shared state). But if copy_to_user needs to access protected shared data as well, it must also be within the lock's protection scope.
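Putting the two points together, here's a minimal sketch of the same read method with the critical section fenced off. It assumes my_global_data embeds a struct mutex named lock (the "lock lives inside the data it protects" practice shown later in this chapter), and that sleeping is acceptable because we're in process context:

ssize_t my_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
{
    /* t0: local variables need no locking */

    mutex_lock(&my_global_data->lock);   /* t1: enter the critical section */
    my_global_data->len = count;
    my_global_data->buf = buf;
    mutex_unlock(&my_global_data->lock); /* t2: leave the critical section */

    /* t2..t3: copy_to_user() and other work on non-shared data stays
     * outside the lock, keeping the hold time short */
    return count;
}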

Code Running Fine Doesn't Mean There's No Problem

This is the most frustrating thing about concurrency.

If your test cases aren't aggressive enough, or you get lucky, that extremely tiny "time window" might never get hit. The code runs rock-solid in the lab for three months, but once deployed to a customer's server (higher load, more cores, more frequent interrupts), it crashes every few days.

Concurrency bugs are non-reproducible, random, and intermittent. That's why we must strictly follow the rules when writing code, rather than relying on luck.


12.2 The Tools: Locks

Alright, now that we've identified the enemy (critical sections), we need weapons to deal with them.

The most intuitive weapon is the lock.

The Essence of Locks: Turning Parallel into Serial

You can think of a lock as a single-plank bridge.

  • Winner: The first thread to grab the lock. It earns the right to enter the critical section and continue executing. All other threads can only watch from the sidelines.
  • Losers: The threads that didn't get the lock. They must wait.

This "waiting" has two completely different strategies, which are also the key distinction between the two main types of locks in the Linux kernel:

  • Mutex: The losers go to sleep. The kernel kicks them off the CPU, and the scheduler runs other processes. Only when the winner releases the lock are the losers woken up to rejoin the battle.
  • Spinlock: The losers spin in place. They stare at the lock, constantly checking in a loop: "Is it free yet? Is it free yet?" They refuse to leave the CPU — they just wait right there.

Does that make the Spinlock sound stupid, just wasting CPU cycles? Don't worry — we'll get to why "sleeping" can sometimes be the worse choice.

Analogy callback

Remember the "single-plank bridge" metaphor?

  1. Building the mapping: A Mutex is like a toll booth at the bridgehead. If there's a car ahead (lock is occupied), you turn off your engine and sleep in the rest area until the toll collector wakes you up.
  2. Revealing the twist: A Spinlock is different. With a Spinlock, you're stuck in traffic on the bridge, but you don't get out of your car. You sit there staring at the taillights of the car in front, keeping your foot on the clutch and hand on the gear shift, even if you have to wait an hour. That's "spinning."
  3. Validating the analogy: Back to our code scenario. If you hold the lock for an extremely short time (like just assigning a variable), then having others "spin" and wait for you to finish is cheaper than putting them to sleep and waking them up (the overhead of context switches). In this case, Spinlock wins. But if your work takes a long time (like reading/writing disk), then Spinlock is pure CPU waste.

Deadlock: When Everyone is Waiting

Locks can backfire if used incorrectly. The most terrifying scenario is deadlock.

The most classic case is the AB-BA deadlock:

  • Thread A on CPU 0: Acquired Lock A, trying to acquire Lock B.
  • Thread B on CPU 1: Acquired Lock B, trying to acquire Lock A.

The result: A waits for B to release B, B waits for A to release A. They stare at each other until the end of time. The system hangs.

There's an even more subtle variant: self-deadlock. This can happen even with a single lock. If a thread holds a lock and then tries to acquire it again (recursive locking), the kernel's Spinlock implementation will typically deadlock: the thread ends up waiting for itself to release the lock, which will never happen.

How to avoid it?

  1. Lock ordering rule: This is the iron law. If everyone follows the "acquire A first, then B" order, deadlock is impossible. Many places in the kernel have comments explicitly defining lock ordering (a sketch follows this list).
  2. Keep it short: The shorter you hold a lock, the lower the probability of deadlock and the better the system performance. This is a "lock granularity" issue.
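Here is a minimal sketch of the ordering rule in practice (the lock names and both functions are hypothetical):

#include <linux/mutex.h>

/* Documented convention: lock_a is always taken before lock_b. */
static DEFINE_MUTEX(lock_a);
static DEFINE_MUTEX(lock_b);

void path_one(void)
{
    mutex_lock(&lock_a);
    mutex_lock(&lock_b);
    /* ... work on data protected by both locks ... */
    mutex_unlock(&lock_b); /* release in reverse order by convention */
    mutex_unlock(&lock_a);
}

void path_two(void)
{
    mutex_lock(&lock_a); /* NOT lock_b first -- that would invite AB-BA deadlock */
    mutex_lock(&lock_b);
    /* ... */
    mutex_unlock(&lock_b);
    mutex_unlock(&lock_a);
}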

12.3 Practical Toolbox: Mutex or Spinlock?

Theory aside, which one should you use when writing code? Here's a simple decision table.

Decision Tree: Which One?

  • Scenario A: I'm in process context (e.g., a system call), and my critical section might block (e.g., waiting for I/O, or allocating memory).

    • Answer: Mutex. Because you can sleep, and sleeping is better than hogging the CPU.
  • Scenario B: I'm in interrupt context, or any atomic context.

    • Answer: Spinlock. Because you can't sleep in atomic context! You must spin.
  • Scenario C: I'm in process context, my critical section is very short (e.g., just modifying a pointer or counter), and it absolutely will not block.

    • Answer: Spinlock. For such a short hold, spinning in place is cheaper than paying the context switch overhead of sleeping.

Tool 1: Mutex

A Mutex is a "sleeping lock." The good side is that it allows you to do potentially blocking operations while holding the lock. The bad side is the overhead when contention occurs (two context switches: one to put the loser to sleep, one to wake it up).

Initialization

Don't assume an uninitialized lock starts out unlocked. Locks must be explicitly initialized.

#include <linux/mutex.h>

// Static definition
static DEFINE_MUTEX(my_mutex);

// Dynamic initialization (typically in module_init)
struct mutex my_mutex;
mutex_init(&my_mutex);

Best practice: Put the lock inside the data structure it protects.

struct my_driver_data {
    int some_value;
    struct mutex lock; /* protects other members of this struct */
};

This way you won't confuse "which lock protects what."
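The embedded lock then gets initialized together with the structure itself, typically in probe or module init. A minimal sketch with hypothetical names:

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/mutex.h>

static struct my_driver_data *data;

static int __init my_driver_init(void)
{
    data = kzalloc(sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;

    mutex_init(&data->lock); /* must happen before anyone can take the lock */
    return 0;
}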

Usage API

The most basic usage:

// Lock (if lock is occupied, current process enters uninterruptible sleep)
mutex_lock(&my_mutex);

/* ... critical section code ... */

// Unlock
mutex_unlock(&my_mutex);

Warning — common pitfalls:

  • Recursive locking: Calling mutex_lock twice on the same mutex from the same thread typically results in deadlock or an error. Kernel Mutexes do not support recursion.
  • Who locks, who unlocks: Don't try to unlock a lock held by someone else, and don't try to release an uninitialized lock.

Interruptible Sleep Variant

Sometimes you want your process to be interruptible while waiting for a lock (e.g., user pressed Ctrl+C).

if (mutex_lock_interruptible(&my_mutex) != 0) {
    return -ERESTARTSYS; // interrupted by a signal
}
/* ... critical section ... */
mutex_unlock(&my_mutex);

There's also a non-blocking variant:

if (mutex_trylock(&my_mutex)) {
    /* Got the lock, do work */
    mutex_unlock(&my_mutex);
} else {
    /* Didn't get it, do something else, retry later */
}

Tool 2: Spinlock

A Spinlock is a "busy-waiting lock." It's a brute-force tool.

Initialization

Similar to Mutex:

#include <linux/spinlock.h>

// Static definition
static DEFINE_SPINLOCK(my_spinlock);

// Dynamic initialization
spinlock_t my_spinlock;
spin_lock_init(&my_spinlock);

Usage API: Basic Version

The simplest usage (only safe when no interrupt handler or bottom half can ever contend for the same lock):

spin_lock(&my_spinlock);

/* ... critical section: NEVER sleep! NEVER access user-space memory! ... */

spin_unlock(&my_spinlock);

Warning — the fatal sleep

If you call any function that might sleep while holding a Spinlock (like kmalloc(GFP_KERNEL), copy_to_user, or msleep), you'll trigger a famous kernel error:

BUG: scheduling while atomic

This doesn't just mean your system is on the brink of crashing: with debug options like CONFIG_DEBUG_ATOMIC_SLEEP enabled, the kernel also catches the offending sleep and thoroughly shames you in the logs.

Why does this happen? Because the Spinlock's design premise is "I'll release the lock momentarily, just wait." If you go to sleep, no one knows when you'll wake up, and other CPU cores spinning on this lock will wait until the end of time.
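A minimal sketch of the trap and two common ways out (item, my_list, and the list linkage are hypothetical; error handling is trimmed):

/* BUGGY: kmalloc(GFP_KERNEL) may sleep -> "scheduling while atomic" */
spin_lock(&my_spinlock);
item = kmalloc(sizeof(*item), GFP_KERNEL);
list_add(&item->node, &my_list);
spin_unlock(&my_spinlock);

/* Fix 1: do the blocking work before entering the critical section */
item = kmalloc(sizeof(*item), GFP_KERNEL);
spin_lock(&my_spinlock);
list_add(&item->node, &my_list);
spin_unlock(&my_spinlock);

/* Fix 2: if allocation must happen under the lock, use GFP_ATOMIC,
 * which never sleeps but fails more readily -- always check the result */
spin_lock(&my_spinlock);
item = kmalloc(sizeof(*item), GFP_ATOMIC);
if (item)
    list_add(&item->node, &my_list);
spin_unlock(&my_spinlock);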

Usage API: Dancing with Interrupts

Reality is often more complicated. If your process-context code is accessing shared data, and suddenly a hardware interrupt fires, and the interrupt handler also needs to access that same data... you've got trouble.

  • Scenario: Process holds Spinlock A. Interrupt fires, interrupt handler tries to acquire Spinlock A.
  • Result: The interrupt handler spins waiting for the process to release the lock, but the handler preempted that very process on the same CPU, so the process can't run again until the handler returns, and the handler never returns. Deadlock.

The solution: Disable interrupts on the local CPU when acquiring the lock.

unsigned long flags;

spin_lock_irqsave(&my_spinlock, flags); /* disable interrupts and save state */

/* ... critical section ... */

spin_unlock_irqrestore(&my_spinlock, flags); /* restore previous interrupt state */

This API is the safest. The irqsave suffix means "save the current interrupt mask to flags, then disable interrupts." On unlock, irqrestore restores the previous mask.

Note: If you're certain interrupts were enabled before entering the critical section, you can use spin_lock_irq() as a shortcut, but the irqsave version is always the safer choice with no noticeable performance penalty.
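Here's a minimal sketch of the full pattern, with a hypothetical device structure and handler. The process-context path uses the irqsave variant; the hardirq handler itself can use plain spin_lock, because on modern kernels the handler runs with local interrupts disabled, so nothing can preempt it on that CPU and re-take the lock:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

struct my_dev {
    spinlock_t lock;
    unsigned long events; /* shared between process context and the IRQ */
};

/* Process context (e.g. a read or ioctl path) */
static void consume_events(struct my_dev *dev)
{
    unsigned long flags;

    spin_lock_irqsave(&dev->lock, flags);
    dev->events = 0;
    spin_unlock_irqrestore(&dev->lock, flags);
}

/* Hardirq context */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
    struct my_dev *dev = dev_id;

    spin_lock(&dev->lock);
    dev->events++;
    spin_unlock(&dev->lock);
    return IRQ_HANDLED;
}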

There's also the case of conflicts with "bottom halves" — softirqs/Tasklets:

spin_lock_bh(&my_spinlock);
/* ... critical section ... */
spin_unlock_bh(&my_spinlock);

This disables bottom-half (softirq/tasklet) processing on the local CPU, preventing a softirq from preempting you and contending for the same lock.


12.4 Summary: A Practical Checklist

Now let's compress this chapter's knowledge into a checklist you can take to the field.

  1. Identify critical sections: Ask yourself — can this code be executed concurrently? Is it modifying global data? If yes, that's a critical section.
  2. Choose a lock:
    • Might sleep? -> Mutex.
    • In interrupt context, or critical section is very short and must not sleep? -> Spinlock.
  3. Initialize: DEFINE_MUTEX or DEFINE_SPINLOCK, or the _init functions.
  4. Lock/Unlock:
    • Mutex: mutex_lock / mutex_unlock.
    • Spinlock: spin_lock / spin_unlock (watch out for interrupts!).
    • Spinlock + interrupt risk: spin_lock_irqsave / spin_unlock_irqrestore.
  5. Avoid pitfalls:
    • Never sleep while holding a lock.
    • Don't lock recursively.
    • Don't do time-consuming work inside a lock (keep lock granularity fine).
    • Always release the lock on all paths — including error paths!

In the next chapter, we'll venture into darker territory: atomic variables, lock-free programming, and how to use kernel tools (like Lockdep) to catch those deeply hidden concurrency bugs. Until then, make sure the lock in your hand is held firmly enough.


Exercises

Exercise 1: Understanding

Question: In Linux kernel development, what are the two core conditions that must be met for a code section to be classified as a Critical Section? If a piece of code runs in process context and only reads shared data, can it go unprotected? Why?

Answer and Analysis

Answer: Two core conditions: 1) The code path may be concurrent (i.e., there exists the possibility of parallel execution); 2) The code operates on shared writable data. Even read-only access typically cannot go unprotected, as it may result in Dirty Reads or Torn Reads — reading incomplete or inconsistent data.

Analysis: According to the definition, a critical section must satisfy both "potentially concurrent" and "accesses shared writable data." Although read operations don't modify data, in a concurrent environment without synchronization mechanisms, reads may occur simultaneously with writes. For example, reading a 64-bit integer on a 32-bit system cannot be done atomically — you might read half of the new value and half of the old value (a torn read). Therefore, to guarantee data consistency, all accesses to shared data (both reads and writes) typically require mutual exclusion protection.

Exercise 2: Application

Question: You're writing a network device driver. Here's what you know:

  1. There's a core function netif_rx() for receiving packets.
  2. This function is called both in process context (via system calls) and in interrupt context (hardware interrupt handler, Hardirq).
  3. The function needs to access a globally shared linked list pkt_queue.

To ensure kernel stability, should you use a Mutex or a Spinlock to protect pkt_queue? Explain your reasoning.

Answer and Analysis

Answer: You should use a Spinlock.

Analysis: This is an application scenario question testing understanding of atomic context and lock mechanisms.

  1. Mutex (sleeping lock): When the lock is occupied, the acquiring thread goes to sleep. This is allowed in process context but strictly prohibited in interrupt context, because interrupt handlers cannot be scheduled to sleep.
  2. Spinlock (busy-waiting lock): When the lock is occupied, the waiting thread spins in a loop without sleeping.

Since the question explicitly states that netif_rx() is called in interrupt context, the code runs in atomic context and cannot use a Mutex that might cause sleeping. Therefore, a Spinlock must be used to protect the shared data.
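A minimal sketch of what this looks like in code (the packet type and queue names are hypothetical; since the same function can also be entered from process context, the irqsave variant keeps it safe in either case):

#include <linux/list.h>
#include <linux/spinlock.h>

struct my_packet {
    struct list_head list;
    /* ... payload ... */
};

static LIST_HEAD(pkt_queue);
static DEFINE_SPINLOCK(pkt_queue_lock);

void my_netif_rx(struct my_packet *pkt) /* callable from both contexts */
{
    unsigned long flags;

    spin_lock_irqsave(&pkt_queue_lock, flags);
    list_add_tail(&pkt->list, &pkt_queue);
    spin_unlock_irqrestore(&pkt_queue_lock, flags);
}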

Exercise 3: Thinking

Question: Suppose we have two kernel threads, Thread A and Thread B, and two resource locks, Lock1 and Lock2. Initially, both locks are released.

The execution sequence is as follows:

  1. Thread A acquires Lock1.
  2. A context switch occurs, Thread B starts executing.
  3. Thread B acquires Lock2.
  4. Thread B tries to acquire Lock1 (held by A, so B enters blocking/spinning wait).
  5. A context switch occurs, Thread A resumes execution.
  6. Thread A tries to acquire Lock2 (held by B, so A enters blocking/spinning wait).

What phenomenon has occurred? Based on the "Lock Ordering" principle from this chapter, how can this problem be avoided at the code level?

Answer and Analysis

Answer: This phenomenon is called Deadlock. To avoid this, follow the Lock Ordering principle: define a global lock hierarchy, specifying that multiple locks must always be acquired in the same order (e.g., always acquire Lock1 first, then Lock2 — never in reverse order).

Analysis: This is a classic deadlock scenario (Circular Wait condition). Thread A holds Lock1 and waits for Lock2, while Thread B holds Lock2 and waits for Lock1. Both wait for the other to release their resource, resulting in permanent blocking and inability to continue execution.

Thinking and Solution: In complex kernel modules, when multiple locks need to be acquired, it's easy to have inconsistent lock acquisition orders due to different code paths. The core of solving this problem lies in breaking the "circular wait" condition. Kernel developers typically establish strict coding conventions: all code paths accessing Lock1 and Lock2 must acquire Lock1 first, then Lock2. This way, when Thread A acquires Lock1, Thread B will be blocked when trying to acquire Lock1, preventing it from acquiring Lock2, thus eliminating the possibility of mutual waiting.


Key Takeaways

Concurrency safety in kernel development begins with precise identification of critical sections — code that simultaneously satisfies two conditions: "may be executed by multiple flows" and "operates on shared writable state." Since high-level language operations like i++ are typically decomposed by the compiler into "read-modify-write" non-atomic instructions, synchronization mechanisms must be used to forcibly serialize these regions, otherwise data overwrites or dirty reads — race conditions that are extremely difficult to reproduce — will easily occur.

Choosing the correct lock mechanism depends on the context and execution characteristics of the code. Mutexes are suitable for process context where the critical section might block (e.g., waiting for I/O or allocating memory), at the cost of context switch overhead. Spinlocks are designed for interrupt context or extremely short critical sections that must never block — they avoid scheduling overhead through CPU busy-waiting, but sleeping is strictly prohibited while holding one.

Spinlock usage comes with stringent rules, the biggest trap being calling functions that might sleep while holding the lock. This causes the system to throw a fatal "Scheduling while atomic" error, breaking the kernel scheduler. Additionally, if a spinlock held by a process is requested again by an interrupt handler on the same CPU, deadlock results — therefore spin_lock_irqsave must be used to disable local interrupts when acquiring the lock to ensure safety.

Lock design inherently introduces the risk of deadlock, most typically the "AB-BA" deadlock (two threads holding two locks in opposite order) or self-deadlock caused by recursive locking. The core iron law for prevention is to enforce a globally unified lock acquisition order (if multiple locks are needed, they must always be acquired in a fixed order), and to ensure lock holding time is as short as possible to reduce contention probability.

In real engineering practice, locks should be co-located with the data structures they protect, and a rigorous code review checklist should be established. Beyond normal execution paths, special attention must be paid to error handling paths, ensuring that corresponding unlock operations are called on every exit branch (including error returns), to avoid lock leaks from logic flaws that ultimately lead to system deadlock.