Chapter 6: The Cost of Exclusion: Critical Sections and Atomicity
The Narrative Core of This Chapter
Concurrency wasn't designed—it was a nuisance forcefully shoved into the system with the advent of multi-core CPUs and interrupts.
We will have to face the phantom of the "critical section" and learn to tame it with two tools that have completely different temperaments: the Mutex and the Spinlock. In this process, you'll realize the weight of the word "atomicity" and why kernel developers always have "locking" on the tip of their tongues.
6.0 Chapter Prelude: When Order Begins to Collapse
There is a class of bugs that, at first glance, seems completely random and patternless—sometimes the program runs perfectly fine, and sometimes it crashes inexplicably. A reboot seems to fix it, but a while later, it breaks again. At this point, an experienced engineer will usually sigh and say, "Looks like a concurrency issue."
Concurrency issues, or synchronization problems, are the most headache-inducing and fatal traps in operating system kernels.
Imagine you and another person are using the same notepad to keep accounts. Without an agreement, you both pick up a pen at the same time and write different numbers in the same row. What's the result? That row might turn into a messy scribble, or the person who wrote last overwrites the previous person's data. Either way, the ledger is wrong.
In the era of single-core CPUs, we could pretend this problem didn't exist—as long as we disabled interrupts or carefully yielded in our code, we seemed to get by peacefully. But now it's different. Today's chips are SMP (Symmetric Multiprocessing), and even single-board computers like the Raspberry Pi are multi-core. This means that at the exact same moment, there are genuinely multiple streams of code running in physical parallel on different CPU cores. Not to mention interrupt handlers that can butt in at any time.
The old "naive" approach no longer works. You can't rely on luck. You need a strict set of rules to draw boundaries: within this code, only I can enter; everyone else (including other CPU cores and interrupts) must wait outside. This is why "critical sections" and "mutual exclusion" were born.
If concurrency isn't handled properly, the consequence isn't just data corruption—it's complete system deadlock or crash.
The mission of this chapter is to figure out two things:
- Who is actually competing with whom? (Critical sections and data races)
- What weapons do we use to separate them? (Mutex vs. Spinlock)
6.1 Critical Sections, Exclusion, and Atomicity
The Hidden Peril of Concurrency in the Linux Kernel
Let's take a step back and think the problem through clearly.
Not all code needs protection. Only those code paths that access shared writable data are dangerous. We have a name for these dangerous code regions: Critical Sections.
A Data Race occurs when a critical section is executed concurrently without protection. Imagine two CPU cores executing balance = balance + 1 at the same time. This one line of C involves more than one step at the assembly level: it must read balance from memory, add 1, and write the result back. If both CPUs happen to read the old value (say, 100), each adds 1, and each writes back 101, then an account that should now hold 102 holds only 101. That one dollar simply evaporated.
To solve this problem, we need to introduce two core concepts:
- Mutual Exclusion: When I enter the restroom, I lock the door, and everyone else must wait outside. This is Mutual Exclusion.
- Atomicity: The operation is indivisible. It's either fully done or not done at all; there is no "half-done and interrupted by someone else" situation. This is Atomicity.
In the kernel, to implement these two concepts, we primarily rely on two types of locks: Mutexes and Spinlocks.
Mutex or Spinlock? A Guide for the Indecisive
This is where kernel newcomers get most confused: which one should I use?
Simply put, these two guys have completely different behavioral patterns:
- Mutex:
- Behavior: If the lock is taken by someone else, you go to sleep (block) and yield the CPU. When the lock owner releases the lock, the kernel wakes you up.
- Use case: The critical section is relatively long, or you're doing operations that might block (like waiting for I/O).
- Cost: Context switch overhead (sleeping and waking up takes time).
- Mortal enemy: Cannot be used in interrupt context, because interrupt handlers cannot sleep.
- Spinlock:
- Behavior: If the lock is taken, you spin in place (busy loop), constantly asking "Are we there yet? Are we there yet?" until you acquire the lock.
- Use case: The critical section is extremely short (like changing a pointer or incrementing a counter). Also, you must use it in contexts where sleeping is not allowed (like interrupt handling or atomic contexts).
- Cost: If you hold the lock for too long, other CPUs spin idly and waste cycles. Under heavy contention this degenerates into a state where every CPU is busy waiting for the lock and nobody is doing real work.
Summary in one sentence:
- Can sleep? Use Mutex.
- Cannot sleep (or the time is extremely short)? Use Spinlock.
Using Mutexes
The Mutex is the most commonly used lock in the kernel; after all, in most cases we can comfortably run in process context.
struct mutex and Initialization
The kernel uses the struct mutex data structure to represent a Mutex. There are usually two ways to define it:
1. Static definition (compile-time):
static DEFINE_MUTEX(my_mutex);
This line of code defines and initializes a lock named my_mutex.
2. Dynamic initialization (runtime):
If you allocate the lock at runtime (for example, inside a probe function), you need to explicitly initialize it:
struct mutex my_mutex;
mutex_init(&my_mutex);
Locking and Unlocking API
Once you have a lock, using it is very intuitive.
/* Acquire the lock (enter the critical section). */
/* If the lock is unavailable, the current process goes into an uninterruptible sleep. */
mutex_lock(&my_mutex);
/* --- critical section begins --- */
/* Access the shared data and make whatever changes you need. */
/* --- critical section ends --- */
/* Release the lock (leave the critical section). */
/* Must be called by the same task that took the lock. */
mutex_unlock(&my_mutex);
Unwritten Rules for Using Mutexes Correctly
There are a few pitfalls here—step into them and you get deadlock, so please be careful:
Deadlock Trap 1: Recursive Calls
You absolutely cannot lock again while already holding the lock. If you call mutex_lock(&my_mutex) a second time inside the critical section, you will deadlock: the kernel's Mutex does not support recursive locking, so the task ends up waiting forever for a lock that it itself holds. This will genuinely blow up.
Deadlock Trap 2: Out-of-Order Locking If you need to hold two locks simultaneously (say, Lock A and Lock B), you must enforce a system-wide consistent order. Everyone must take A first, then B. If one thread takes A and waits for B, while another takes B and waits for A, that's the classic AB-BA deadlock.
Lock Leaks: Don't Forget to Go Home
If you take a lock, you must release it. Don't return early or goto somewhere else from inside the critical section without unlocking; otherwise the lock leaks and everyone else waits forever. (This is one reason kernel code conventionally funnels error paths through a single unlock label.)
Performance Trap: Don't Linger Although a Mutex allows sleeping, holding a lock for a long time degrades system concurrency performance. Lock granularity should be as small as possible—grab it, do your work quickly, and release it immediately.
Using Spinlocks
Now let's switch channels and talk about that "hothead"—the Spinlock.
Spinlocks are typically used on SMP systems, or on single-core systems with kernel preemption enabled. Their core data structure is spinlock_t.
Basic Usage
Similar to Mutexes, there are both static and dynamic initialization methods:
/* Static (compile time) */
static DEFINE_SPINLOCK(my_spinlock);
/* Dynamic (runtime) */
spinlock_t my_spinlock;
spin_lock_init(&my_spinlock);
The most basic lock/unlock APIs are spin_lock and spin_unlock:
spin_lock(&my_spinlock);
/* --- critical section --- */
/* Note: never call anything that can sleep in here! */
/* --- critical section ends --- */
spin_unlock(&my_spinlock);
⚠️ Warning
Never sleep while holding a spinlock (e.g., calling kmalloc(GFP_KERNEL), msleep, etc.).
If you do this, you will trigger one of the kernel's most famous errors: "Scheduling while atomic".
This means you're trying to schedule in an atomic context; the kernel will print a BUG, and the system may panic or hang outright. This will genuinely send your blood pressure through the roof.
The Tricky Case of Handling Interrupts
This is probably the most dizzying part of spinlocks.
Suppose you're holding a spinlock in process context, accessing shared data. Suddenly, an interrupt arrives, and the interrupt handler runs on the same CPU. It also tries to access this same data, so it goes for the lock too.
Result: Deadlock. The interrupt handler will spin forever waiting for the lock to be released, but the lock holder (the process) was interrupted and has no chance to run, and thus no chance to release the lock. Infinite loop achieved.
To solve this problem, the kernel provides a set of APIs with the irq suffix. Their meaning goes beyond just locking—they also include enabling/disabling local CPU interrupts.
Approach 1: spin_lock_irq / spin_unlock_irq
spin_lock_irq(&my_spinlock); /* acquire the lock and disable local hardware interrupts */
/* --- critical section --- */
spin_unlock_irq(&my_spinlock); /* release the lock and re-enable interrupts */
This combination is simple and brute-force. But note that spin_unlock_irq unconditionally re-enables interrupts, so it is only safe when you know for certain that interrupts were enabled when you entered (or you genuinely don't care about restoring the previous state).
Approach 2: spin_lock_irqsave / spin_unlock_irqrestore — The Safest Way
In most cases, we should use this. It saves the current interrupt state into a flag variable and restores it upon unlocking.
unsigned long flags;
spin_lock_irqsave(&my_spinlock, flags);
/* Acquire the lock, disable interrupts, and save the previous interrupt state into flags. */
/* --- critical section --- */
spin_unlock_irqrestore(&my_spinlock, flags);
/* Release the lock and restore the previously saved interrupt state. */
Why is this the safest? Because if the surrounding code already runs with interrupts disabled, the _irq version would mistakenly re-enable them on unlock. The _irqsave variant restores the interrupt state exactly as it found it.
Approach 3: spin_lock_bh / spin_unlock_bh
If you're not worried about hardware interrupts but are concerned about bottom halves—softirqs or tasklets—competing, you can use this. It disables bottom half execution but allows hardware interrupts to preempt.
Spinlock Usage Summary — Cheat Sheet
Let's organize the pile of APIs above and match them to their scenarios:
| Scenario | Recommended API | Characteristics |
|---|---|---|
| Simple: process context only, no data shared with interrupts or bottom halves | spin_lock() / spin_unlock() | Lowest overhead; a pure spinlock. |
| Medium: process context shares data with interrupts, previous interrupt state doesn't matter | spin_lock_irq() / spin_unlock_irq() | Acquires the lock and disables local interrupts. Simple but somewhat brute-force. |
| Safest / complex: process context shares data with interrupts, state must be restored exactly | spin_lock_irqsave() / spin_unlock_irqrestore() | The most recommended general practice. Saves and restores the interrupt state, no side effects. |
| Fighting bottom halves: competing with softirqs / tasklets | spin_lock_bh() / spin_unlock_bh() | Disables bottom-half execution, allows hardware interrupts. |
Now looking back at the question from the prelude: who is competing? Multi-core and interrupts. What do we use to separate them? If sleeping is allowed, use a Mutex to yield the CPU; if sleeping is not allowed or the time is extremely short, you have to bite the bullet and stand guard with a Spinlock.
So, What Are Spinlocks Doing on a Uniprocessor System?
You might ask: Do spinlocks still make sense on a uniprocessor (UP) system?
That's a really good question.
Since there's only one CPU, a task that spins on a lock is occupying the only CPU there is, so the lock holder can never run to release it. Spinning would just wait forever.
The answer is: On UP systems, the "spinning" logic of spinlocks is optimized away. spin_lock is replaced with a no-op at compile time (or merely an increment to the preemption counter).
However! Note the irq related APIs.
Even on UP systems, spin_lock_irqsave still has meaning, because it disables interrupts. That is exactly what prevents the process-vs-interrupt deadlock on a uniprocessor: if the interrupt can never fire while you hold the lock, it can never contend for it.
So, there is an extremely important engineering principle here:
As a driver developer, don't worry about whether it's UP or SMP. Always write according to SMP logic, and always use standard APIs. The kernel internals will handle the details for you. On a uniprocessor, those useless spin loops will automatically disappear, leaving only the necessary interrupt-disabling logic.
Supplement: Regarding "Local Locks" in the 5.8 Kernel
That's not all. In the 5.8 kernel, the real-time Linux (PREEMPT_RT) project introduced a new term—"Local Locks." This might be too far out for newcomers; just knowing it exists is enough: its main purpose is to provide optimizations for hard real-time kernels, but on non-real-time kernels, it's very helpful for lock debugging (especially when paired with tools like lockdep). If you're writing code with extremely strict real-time requirements, you can look up articles about this feature on LWN.
Exercises
Exercise 1: Understanding
Question: Suppose your Linux driver has a global static variable static int safety_count;, and the driver's read method contains the line safety_count++;. If this driver runs on a multi-core system with CONFIG_SMP enabled and uses no locking mechanism, explain why this can lead to data inconsistency (dirty reads or lost updates).
Answer and Analysis
Answer: Because safety_count++ is not a single atomic instruction on most processor architectures; it typically compiles to three steps: read, modify, write. Without lock protection, two threads can read the same old value simultaneously, each increment it by one, and write it back, so one increment is lost; a reader can also observe data that is only half-updated.
Analysis: This tests your understanding of Critical Sections and Data Races. Although a single line of C code looks simple, the machine code generated by the compiler usually involves multiple instructions like load, inc, and store. In a multi-core environment, Thread A and Thread B might execute the load instruction simultaneously, get the same initial value, then each increment and write back, resulting in a final value that only increased by 1 instead of the expected 2. This is a classic data race caused by an unprotected critical section.
Exercise 2: Application
Question: You are writing a block device driver and need to allocate memory and copy large amounts of user data in the process context write method, while also protecting a frequently accessed global linked list. Should you choose a Mutex or a Spinlock? Why?
Answer and Analysis
Answer: You should choose a Mutex.
Analysis: This tests your understanding of the criteria for choosing between a Mutex and a Spinlock. The scenario involves "process context" and "allocating memory" and "copying large amounts of data"—all of which are potentially blocking operations. The Spinlock is designed for protecting short critical sections; while held, the thread is in a busy-wait state and cannot sleep. If you call a function that might sleep (like memory allocation) while holding a Spinlock, it will cause a kernel crash or hang. A Mutex allows the holder to sleep, making it suitable for scenarios that might take time and require blocking.
Exercise 3: Application
Question: In a character device driver, both the driver's read method (process context) and the bottom-half interrupt handler (softirq context) access the same shared queue. If you can only use locking mechanisms for synchronization, which lock API should you choose to protect the critical section in the read method? Please write out the specific lock function calls.
Answer and Analysis
Answer: Since the competing context is a bottom half (softirq), spin_lock_bh() / spin_unlock_bh() is the tailored choice; spin_lock_irqsave() / spin_unlock_irqrestore() is a safe general-purpose alternative, since disabling local interrupts also keeps local softirqs from running.
Analysis: This tests your synchronization strategy when data is shared between process context and softirq context. A softirq can preempt process-context execution on the same CPU. If the read method uses only a plain spin_lock(), it can be preempted by a local softirq while holding the lock; if that softirq then tries to take the same lock, a deadlock occurs, because the lock holder cannot run again to release it. You therefore need a variant that keeps local bottom halves out of the critical section: spin_lock_bh() disables bottom-half execution on the local CPU, and the spin_lock_irq* family achieves the same by disabling local interrupts entirely.
Exercise 4: Thinking
Question: Suppose a real-time system has three threads: high-priority thread H, medium-priority thread M, and low-priority thread L. Thread L holds a Mutex, thread H is waiting for that lock, and thread M is currently occupying the CPU running (not waiting for the lock). What happens in this situation? How does this differ from using an RT-mutex?
Answer and Analysis
Answer: Priority inversion will occur. Thread H is blocked waiting for L, but L cannot run to release the lock because of its low priority, and is instead preempted by M, which has a medium priority and doesn't need to wait for the lock. This causes H to be effectively blocked by M, behaving as if it had M's low priority. RT-mutex solves this problem through a priority inheritance mechanism: when H waits for the lock held by L, L's priority is temporarily boosted to H's level, allowing it to execute quickly and release the lock, avoiding being preempted by M for a long time.
Analysis: This tests your understanding of the classic concurrency problem "priority inversion" and its solutions. When using a regular Mutex, the scheduler only schedules based on static priority, causing a high-priority task to be indirectly subject to a medium-priority task. The priority inheritance protocol introduced by RT-mutex ensures that a low-priority task holding a lock can obtain sufficient CPU time to complete its work when necessary, thereby eliminating unbounded priority inversion delays.
Key Takeaways
Concurrency issues stem from multi-core physical parallelism and interrupt preemption, causing non-atomic operations on shared data (like "read-modify-write") to result in data races. The core risk lies in instruction sequences within critical sections being interfered with. Therefore, the system must introduce mutual exclusion mechanisms to draw boundaries, ensuring exclusivity when accessing shared writable state to prevent data corruption or system crashes.
The primary tools for handling concurrency are Mutexes and Spinlocks, and the choice between them depends on critical section length and context requirements. Mutexes are suitable for process contexts that allow sleeping and for longer operations, avoiding CPU waste through sleeping; Spinlocks are suitable for extremely short critical sections or contexts where sleeping is not allowed (like atomic contexts), maintaining protection of the critical section through CPU busy-waiting.
Using Mutexes requires strict adherence to rules: no recursive locking, no jumps or returns during lock holding that could lead to lock leaks, and lock holding time should be as short as possible. Additionally, when multiple locks are involved, a globally consistent locking order must be followed to prevent the classic AB-BA deadlock, because the kernel Mutex mechanism itself does not automatically detect or handle such deadlocks.
Using Spinlocks is more demanding than Mutexes, absolutely prohibiting any calls to functions that might cause sleep (like memory allocation or delays) within the critical section, otherwise it triggers a "Scheduling while atomic" error leading to a system crash. On uniprocessor systems, the "busy-wait" logic of spinning is optimized away, but its function of cooperating with interrupt control mechanisms (like spin_lock_irqsave) to prevent deadlocks is still preserved.
When interrupt handlers and process context race on shared data, you must use the interrupt-controlling APIs (spin_lock_irqsave/spin_unlock_irqrestore recommended). These not only acquire the lock but also safely mask local CPU hardware interrupts and save the previous state, preventing an interrupt handler from trying to acquire an already-held lock on the same CPU and deadlocking.
Remember the question from the beginning—who is actually competing with whom? The answer is multi-core and interrupts. What weapon do we use to separate them? The answer is: if sleeping is allowed, use a Mutex to yield the CPU; if sleeping is not allowed or the time is extremely short, you have to bite the bullet and stand guard with a Spinlock.