Chapter 7: The Art and Pitfalls of Lock-Free Programming
There is a class of problems that appear to be engineering challenges, but are actually limitations of understanding.
In this chapter, we tackle exactly such a problem: when two execution flows access the same memory simultaneously, what can we do besides bluntly forcing one to stop and wait (locking)?
If your intuitive answer is "nothing, you must use a lock," your intuition was mainstream 20 years ago, but today it is wrong. The performance bottlenecks of modern kernels often lie not in algorithmic complexity, but in "waiting." Every spinlock's idle loop and every mutex's context switch is a direct waste of CPU cycles.
The mission of this chapter is to build a "lock-free" intuition. We will start from the lowest-level atomic integers and work our way up to the kernel's internal lock debugging tools. You will find that many places where you thought a lock was mandatory can actually be solved with lighter-weight mechanisms; and many pieces of locking code you thought were "fine" are actually quietly sowing the seeds of deadlock.
Don't rush to look at those flashy lock-free algorithms—let's return to the most fundamental questions first.
7.1 Using atomic_t and refcount_t Interfaces
A Legacy of the Old Era: Why Not volatile?
If you are new to kernel concurrency, you might wonder: since we don't want to use locks, can we just use C's volatile keyword to modify variables?
The answer is no.
The volatile keyword is everywhere in kernel drivers, but its primary purpose is to tell the compiler "don't optimize this variable, because it might change inexplicably due to hardware or other threads." It does prevent the compiler from caching the variable in a register, but it does not guarantee atomicity, nor does it guarantee memory ordering.
Imagine a scenario: two CPUs simultaneously execute counter++. On x86, this is typically compiled into three instructions:
- Read counter from memory into a register.
- Add 1 in the register.
- Write the register back to memory.
If both CPUs read the old value at the same time (say, 5), increment it to 6, and write it back, the result is 6, not the expected 7.
volatile cannot solve this problem. What we need is a hardware-level "Read-Modify-Write" (RMW) instruction that guarantees these three steps are indivisible, acting as a single instruction.
This is exactly why atomic_t exists.
atomic_t: Not Just an Integer
You can think of atomic_t as a "thread-safe counter" in the kernel—but there is one catch in this analogy: a true counter only increments and decrements, whereas atomic_t is internally defined as a structure (wrapping an int counter) to ensure cross-platform compatibility and type safety, and it enforces memory alignment to avoid cross-cache-line issues.
Let's look at its true face (usually defined in <asm/atomic.h> or <linux/types.h>):
typedef struct {
int counter;
} atomic_t;
And its 64-bit sibling (for 64-bit systems):
typedef struct {
s64 counter;
} atomic64_t;
Definition and Initialization
Defining and initializing them requires dedicated macros. This isn't just a style choice—it ensures the internal structure is correctly initialized (for example, clearing debug bits on certain architectures):
// Statically define and initialize to 0
static atomic_t my_ref_cnt = ATOMIC_INIT(0);
// Set dynamically at runtime
atomic_t v;
atomic_set(&v, 4); // set v.counter to 4
Basic RMW Operations
The most commonly used operations are atomic addition and subtraction. Note that these basic forms return void; if you need the resulting value, use the _return variants (such as atomic_inc_return()), which return the new value after the operation:
atomic_add(1, &v); // v.counter += 1
atomic_sub(1, &v); // v.counter -= 1
atomic_inc(&v);    // v.counter++
atomic_dec(&v);    // v.counter--
// Variant with a return value (returns the new value after the operation)
int n = atomic_inc_return(&v); // ++v.counter, returns the result
If you want to read the current value, do not access the .counter field directly (that bypasses the atomic API). Instead, use:
int val = atomic_read(&v);
atomic_set(&v, 10); // set the value directly
Conditional Operations: Avoiding Race Conditions
This is where atomic_t is most powerful. The following operations are not only atomic but also combine condition checking and modification into one:
// If v.counter is 0 after decrementing, return true; otherwise false.
// The whole operation is atomic: nobody can sneak in between the
// decrement and the test.
if (atomic_dec_and_test(&v)) {
    // We were the last reference holder; safe to free the resource.
    kfree(obj);
}
// The mirror image:
if (atomic_inc_and_test(&v)) {
    // Exactly 0 after incrementing (the old value was -1,
    // which usually indicates an underflow).
}
// Subtract and test for zero
if (atomic_sub_and_test(2, &v)) {
    // Is the result 0 after subtracting 2?
}
// Is the result negative after adding the delta?
// (often used in semaphore implementations)
if (atomic_add_negative(1, &v)) {
    // result < 0
}
⚠️ Pitfall Warning
Never treat atomic_t as a universal lock.
- It only protects this single variable. If you have struct { atomic_t a; int b; }, atomic_t can only guarantee that operations on a are atomic. If you need a and b to be modified together while remaining consistent, you still need a spinlock or mutex.
- It cannot replace serialization. If you need to read A first, then modify B based on A, those two operations combined are not atomic, unless A and B live in the same atomic variable (or are packed into a larger structure, which would require other mechanisms).
refcount_t: A Safer Counter Than atomic_t
Returning to our "counter" analogy. Although atomic_t solves concurrency conflicts, it has a fatal flaw: it is far too lenient with integer overflow.
If an object's count is incremented aggressively enough in a multi-core environment (for example, through a leaked get on a hot path), the reference count can wrap around from INT_MAX to INT_MIN, causing the object to be freed prematurely while users still hold references: the classic Use-After-Free (UAF) vulnerability.
To solve this problem, the kernel introduced refcount_t (defined in <linux/refcount.h>). It is an enhanced version of atomic_t, designed specifically for reference counting.
Definition and Initialization
#include <linux/refcount.h>
static refcount_t my_refcnt = REFCOUNT_INIT(1);
Operation Interfaces
The interfaces are similar to atomic_t, but with clearer naming:
refcount_set(&my_refcnt, 1);
// Take a reference
if (refcount_inc_not_zero(&my_refcnt)) {
    // Successfully incremented, and the old value was non-zero
    // (meaning the object is still alive); we now hold a reference.
} else {
    // The object is in the middle of being destroyed; do not use it.
}
// Drop a reference
if (refcount_dec_and_test(&my_refcnt)) {
    // Reached 0; safe to free.
    kfree(obj);
}
⚠️ Key Difference: Saturation and Overflow Protection
The core of refcount_t lies in its overflow handling. When the count reaches REFCOUNT_SATURATED (a value close to UINT_MAX), it stops increasing and triggers a kernel warning. (In older kernels the strictness of these checks depended on CONFIG_REFCOUNT_FULL; since v5.5 they are always enabled.)
This seems perfect, but there is a trade-off:
Performance.
Operations on refcount_t are slightly slower than atomic_t because each one carries extra checking logic. Therefore, if you are not doing reference counting (say, you only need a statistics counter), atomic_t is sufficient; when managing object lifetimes, use refcount_t.
Pitfall Case Study: The Trap of atomic_dec_and_test
If you try to implement reference counting using atomic_t, it's easy to write something like this:
if (atomic_dec_and_test(&obj->refcnt)) {
kfree(obj);
}
This looks fine. But with a buggy call path (say, a duplicate put racing on multiple cores), refcnt can be driven below zero. Once it is negative, atomic_dec_and_test() never sees 0 again, so the cleanup path is skipped and the object leaks forever.
This is why the kernel now strongly recommends refcount_t. It detects this unnatural underflow, emits a warning, and saturates the counter, letting you discover the bug right away, rather than waiting months to be killed by the OOM Killer due to a memory leak.