8.3 Catching Concurrency Bugs with KCSAN
Since relying on the human brain to wrestle with the sheer complexity of LKMM is becoming impractical, we had better find a tool to help. Entering the stage for this section is the "security scanner" of kernel concurrency: KCSAN.
What is KCSAN
KCSAN (Kernel Concurrency Sanitizer) is a runtime framework for dynamically detecting data races. It was merged into the mainline kernel in version 5.8 in August 2020. Currently, its primary battleground is the x86_64 architecture, with ARM64 support arriving much later (stabilized only in kernel 5.17 in March 2022).
If you haven't read the definition of "data race" in the previous section, we strongly recommend going back and reviewing it now. Otherwise, the following discussion will lack a proper foundation.
How KCSAN Works (The Extremely Simplified Version)
KCSAN has only one core mission: discover and report data races.
To achieve this, it makes a default assumption: all aligned writes no larger than the processor word size are atomic (regardless of whether you marked them with WRITE_ONCE). Under this assumption, KCSAN only needs to watch for one specific pattern: an unmarked plain read racing against any write to the same address.
Sounds a bit loose? Yes. If those write operations are also plain C writes (no locks, no markers), this is absolutely a bug under the strict LKMM definition, but KCSAN in its default configuration might let it slide. This is a compromise made to reduce the false positive rate (we will dive into this later in the configuration section).
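To make the default behavior concrete, here is a minimal userspace sketch of the decision described above. It is a simplified model under stated assumptions, not KCSAN's actual code: `struct access`, `treated_as_atomic()` and `would_report()` are hypothetical names, and the model ignores sampling and value-change filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory access (not a KCSAN type). */
struct access {
	uintptr_t addr;   /* start address */
	size_t size;      /* number of bytes accessed */
	bool is_write;
	bool is_marked;   /* READ_ONCE(), WRITE_ONCE(), atomic_*() */
};

/* Marked accesses count as atomic; with the default assumption enabled,
 * so do aligned plain writes no larger than the machine word. */
static bool treated_as_atomic(const struct access *a, bool assume_plain_writes_atomic)
{
	if (a->is_marked)
		return true;
	return assume_plain_writes_atomic && a->is_write &&
	       a->size <= sizeof(long) && a->addr % a->size == 0;
}

static bool overlaps(const struct access *a, const struct access *b)
{
	return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* Simplified model: would two concurrent accesses be reported? */
static bool would_report(const struct access *a, const struct access *b,
			 bool assume_plain_writes_atomic)
{
	assert(a->size > 0 && b->size > 0);
	if (!overlaps(a, b))
		return false;
	if (!a->is_write && !b->is_write)
		return false;   /* two reads never race */
	/* At least one side must be a non-atomic access. */
	return !(treated_as_atomic(a, assume_plain_writes_atomic) &&
		 treated_as_atomic(b, assume_plain_writes_atomic));
}
```

In this model, a plain read against a plain write is reported in both modes, while two aligned plain writes are reported only when the assumption is switched off.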
Syzbot: The Tireless Robot
KCSAN shines when it is driven by automation. Paired with Syzbot (the syzkaller robot), it can continuously scan the mainline kernel.
Syzbot specializes in fuzzing; it feeds the kernel all sorts of bizarre system call sequences, trying to force hidden bugs out into the open. KCSAN, meanwhile, stands guard in the background. Once Syzbot's operations trigger a concurrent access conflict, KCSAN logs it.
This work has been running since October 2019 and is still going strong today. You can see its track record here:
https://syzkaller.appspot.com/upstream?manager=ci2-upstream-kcsan-gce
The Key Technology Here: Soft Watchpoints
The most frustrating part of concurrency bugs is their Heisenbug nature—when you try to debug them, they disappear. To catch these bugs that depend on subtle timing, KCSAN must artificially introduce a bit of "perturbation" in the code execution path.
It works through compiler instrumentation and a soft watchpoint mechanism:
- Setting the point: KCSAN sets a soft watchpoint on a memory address.
- Intentional delay: When code accesses this address, KCSAN intentionally pauses it for a brief moment (this time is configurable; it defaults to 80 microseconds in task context and 20 microseconds in interrupt context).
- Closing the net: During this delay, if another thread or interrupt also accesses this address, that second access runs into the watchpoint and the conflict is detected.
KCSAN checks the nature of these two accesses. If they meet the conditions for a data race (e.g., one is an unmarked read and the other is a write), it reports immediately.
It tells you what happened, whether the data changed (old value vs. new value), and most importantly, it provides call stack traces for both sides, letting you see exactly how these two ghosts collided.
Only unmarked accesses take the bait: if your access is marked with READ_ONCE(), WRITE_ONCE(), or the atomic_* macros, KCSAN will not set a watchpoint there. It assumes that since you marked it, you know what you are doing.
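The watchpoint steps above can be sketched as a small userspace model. This is only an illustration under assumptions: the slot table, its size, and the function names are hypothetical, the model is single-threaded, and the deliberate delay itself is not simulated (the caller simply performs the "other" access while the watchpoint is armed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 8   /* arbitrary size for this sketch */

struct watchpoint {
	bool in_use;
	uintptr_t addr;
	size_t size;
	bool is_write;
};

static struct watchpoint slots[NUM_SLOTS];

/* Step 1: arm a watchpoint for an access; returns the slot index or -1. */
static int watchpoint_arm(uintptr_t addr, size_t size, bool is_write)
{
	assert(size > 0);
	for (int i = 0; i < NUM_SLOTS; i++) {
		if (!slots[i].in_use) {
			slots[i] = (struct watchpoint){ true, addr, size, is_write };
			return i;
		}
	}
	return -1;   /* table full: this access goes unwatched */
}

/* Step 3: an access from another context checks the table.
 * A conflict needs overlapping bytes and at least one write. */
static bool access_hits_watchpoint(uintptr_t addr, size_t size, bool is_write)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		const struct watchpoint *w = &slots[i];
		bool overlap = w->in_use &&
			       addr < w->addr + w->size && w->addr < addr + size;

		if (overlap && (is_write || w->is_write))
			return true;
	}
	return false;
}

/* After the delay: remove the watchpoint again. */
static void watchpoint_disarm(int slot)
{
	if (slot >= 0 && slot < NUM_SLOTS)
		slots[slot].in_use = false;
}
```

A plain read arms a watchpoint, a write to the same address during the window counts as a hit, and once the watchpoint is disarmed the same write passes unnoticed.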
Configuring Your Kernel to Enable KCSAN
Talk is cheap. To get KCSAN up and running, you need to reconfigure and recompile the kernel.
Prerequisites Checklist
This isn't something you can just toggle on; it has several hard requirements:
- Architecture: Currently primarily supports x86_64. ARM64 support is newer (kernel 5.17+).
- Kernel version: x86_64 requires at least 5.8 (August 2020); ARM64 requires at least 5.17.
- Compiler: GCC or Clang must be version 11 or higher. The kernel config option CONFIG_HAVE_KCSAN_COMPILER is responsible for checking this.
- Debug switch: You must enable CONFIG_DEBUG_KERNEL=y. Note that this merely makes the debug menu visible; it does not automatically select specific tools for you.
- Mutual exclusivity: KASAN and KCSAN are fundamentally incompatible. You can only choose one; they cannot be enabled simultaneously. The reason is simple: both perform massive amounts of instrumentation, and combining them will blow up.
- KCOV conflict: If compiling with Clang, KCSAN and KCOV (code coverage tool) also cannot be enabled at the same time.
When you select CONFIG_KCSAN=y, it automatically selects CONFIG_STACKTRACE, because reporting bugs requires printing detailed call stacks.
Enabling KCSAN
In make menuconfig, you can find it via the following path:
Kernel hacking -> Generic Kernel Debugging Instruments -> KCSAN: dynamic data race detector
If you don't see it in the menu at all, don't curse just yet—check whether all the dependency conditions above are met. The easiest way is to set this up in a fairly recent x86_64 Ubuntu virtual machine (e.g., 21.10).
After entering the KCSAN submenu, you will see a bunch of parameters. The defaults are generally conservative and suitable for daily use. Table 8.1 lists some key parameters (for the sake of brevity, we won't paste a screenshot here, but will describe the core logic in text):
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC
- Default: y
- Effect: Assumes aligned plain writes are atomic. This is the source of much confusion for beginners, and we will specifically address it when we step into the pitfalls later.
CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY
- Default: y
- Effect: Only reports when the race actually causes the data to change. This filters out some harmless, theoretical races.
CONFIG_KCSAN_SKIP_WATCH
- Default: 4000
- Effect: This is the ultimate knob affecting performance. It means that out of every 4000 memory accesses, KCSAN will only sample one. The smaller you set this, the more accurately it catches bugs, but the more severe the system stuttering becomes.
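The sampling behind CONFIG_KCSAN_SKIP_WATCH can be pictured as a countdown. The following is a minimal userspace sketch, assuming a single counter and a fixed skip value; the names `struct sampler`, `should_watch()` and `count_watched()` are hypothetical, and the real implementation keeps its counter per CPU and can randomize the interval.

```c
#include <assert.h>
#include <stdbool.h>

#define SKIP_WATCH 4000   /* mirrors the default of CONFIG_KCSAN_SKIP_WATCH */

struct sampler {
	long skip_count;   /* accesses left until the next watchpoint */
};

static void sampler_init(struct sampler *s, long skip)
{
	assert(skip > 0);
	s->skip_count = skip;
}

/* Called once per instrumented access; true means "set a watchpoint". */
static bool should_watch(struct sampler *s, long skip)
{
	if (--s->skip_count > 0)
		return false;
	s->skip_count = skip;   /* reset the countdown */
	return true;
}

/* Counts how many of `accesses` accesses would be sampled. */
static long count_watched(long accesses, long skip)
{
	struct sampler s;
	long watched = 0;

	sampler_init(&s, skip);
	for (long i = 0; i < accesses; i++)
		if (should_watch(&s, skip))
			watched++;
	return watched;
}
```

With the default of 4000, 40000 accesses yield 10 watchpoints in this model; lowering the value to 1000 yields 40, which is where the extra overhead comes from.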
Hands-on: Catching a Simple Data Race
Enough theory. Let's get our hands dirty.
We wrote a simple kernel module ch8/kcsan_datarace to demonstrate this mechanism. The code initializes two workqueues, running do_the_work1 and do_the_work2 respectively. If we enable race_2plain_w=y via a module parameter, these two functions will frantically read and write the same global variable gctx->data, with absolutely no lock protection.
This is textbook data race code:
// ch8/kcsan_datarace.c
static void do_the_work1(struct work_struct *work1)
{
	int i;
	u64 bogus = 32000;

	PRINT_CTX();
	if (race_2plain_w) {
		pr_info("data race: 2 plain writes:\n");
		for (i = 0; i < iter1; i++)
			gctx->data = (u64)bogus + i; /* unprotected plain write */
	}
}

static void do_the_work2(struct work_struct *work2)
{
	int i;
	u64 bogus = 98000;

	PRINT_CTX();
	if (race_2plain_w) {
		pr_info("data race: 2 plain writes:\n");
		for (i = 0; i < iter2; i++)
			gctx->data = (u64)gctx->y + i; /* unprotected plain write */
	}
}
The First Pitfall: Why Isn't It Reporting?
You insert the module and stare at the logs—KCSAN makes not a peep.
This is counterintuitive. Two threads are frantically writing to the same address without locks, and it doesn't report? Is KCSAN just slacking off?
No. Remember that CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC config option from earlier? It defaults to y.
Under this default assumption, KCSAN considers aligned plain writes to be atomic. Since they are "atomic," a write-write collision, while not conforming to the strict LKMM definition, is not considered a mandatory error in KCSAN's "relaxed mode" (especially if CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY is also on and the data doesn't look corrupted).
Fixing the Configuration to Reproduce the Bug
Now, let's play the strict engineer. Go back to the kernel source directory, run make menuconfig, find KCSAN_ASSUME_PLAIN_WRITES_ATOMIC, and turn it off (set to n).
This means: KCSAN, stop making assumptions. Any unmarked concurrent write that collides is a bug.
Recompile the kernel and reboot.
Insert our test module again. This time, the console immediately explodes with a report (Figure 8.5 is a screenshot of it):
BUG: KCSAN: data-race in do_the_work1 / do_the_work2
write to 0xffff9fc3cc9e3238 of 8 bytes by task kworker/0:1 on cpu 0:
do_the_work1+0x...
process_one_work+0x...
...
write to 0xffff9fc3cc9e3238 of 8 bytes by task kworker/1:2 on cpu 1:
do_the_work2+0x...
process_one_work+0x...
...
Reported by Kernel Concurrency Sanitizer on:
...
How to Read This Report
KCSAN's report is actually quite structured. Let's break it down:
- First line (red header): BUG: KCSAN: data-race in func_x / func_y
  This tells you directly: func_x and func_y are clashing. In our example, these are do_the_work1 and do_the_work2, but it will often show the underlying wrapper functions (like process_one_work, because workqueues dispatch tasks through this function).
- Access info: write to 0xffff... of 8 bytes by task <PID> on cpu <ID>
  This line tells you:
  - Whether it's a read or a write (here it is write).
  - Whether it's marked (if it were read/write (marked), it would mean something like READ_ONCE was used; here it is unmarked).
  - The kernel virtual address (confirm if it's your global variable).
  - Who did it (which task, which CPU).
- Call stack: Immediately following the access info is the complete stack trace. This is what you need the most; follow the stack upwards to locate the exact line of code.
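Because the report layout is this regular, it is easy to pick apart in a script or small tool. Below is a minimal userspace sketch that extracts the fields described above from the header line and from an access line. The struct and function names are hypothetical, and the sketch assumes the dmesg timestamp prefix has already been stripped from each line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 64

struct kcsan_access_info {
	bool is_write;
	bool is_marked;
	unsigned long long addr;
	unsigned int size;
	char task[NAME_MAX_LEN];
	int cpu;
};

/* Parses "BUG: KCSAN: data-race in <func_x> / <func_y>". */
static bool parse_race_header(const char *line, char func_x[NAME_MAX_LEN],
			      char func_y[NAME_MAX_LEN])
{
	assert(line && func_x && func_y);
	return sscanf(line, "BUG: KCSAN: data-race in %63s / %63s",
		      func_x, func_y) == 2;
}

/* Parses "<kind> to 0x<addr> of <n> bytes by task <task> on cpu <id>:". */
static bool parse_access_line(const char *line, struct kcsan_access_info *out)
{
	assert(line && out);
	const char *rest = strstr(line, " to 0x");
	if (!rest)
		return false;

	/* The access kind ("write", "read (marked)", ...) precedes " to 0x". */
	const char *w = strstr(line, "write");
	const char *m = strstr(line, "(marked)");
	out->is_write = w && w < rest;
	out->is_marked = m && m < rest;

	return sscanf(rest, " to 0x%llx of %u bytes by task %63s on cpu %d",
		      &out->addr, &out->size, out->task, &out->cpu) == 4;
}
```

Feeding it the two lines from the report above gives the function pair do_the_work1 / do_the_work2 and an unmarked 8-byte write on CPU 0.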
The Runtime Statistics Game
Since KCSAN is based on statistical sampling, catching a bug involves a bit of luck.
To verify this, the original author wrote a simple Shell script tester.sh that tests how many runs are roughly needed to trigger a report by varying the loop count.
The conclusion is intuitive: the more loops and the denser the accesses, the higher the probability of triggering a KCSAN watchpoint. However, due to the CONFIG_KCSAN_REPORT_ONCE_IN_MS limit (default 3000ms, meaning it only reports once every 3 seconds), you might find that even after many runs, only the first occurrence is logged.
Additionally, the kernel comes with a more professional test module CONFIG_KCSAN_TEST. If enabled, it compiles a kcsan-test.ko. This module uses the KUnit and Torture frameworks to intentionally create all sorts of tricky concurrent scenarios to stress the system. A single run takes about 7 minutes; if you run it without a panic, it means your KCSAN environment is basically stable.
Runtime Control: The debugfs Interface
Besides compile-time configuration, you can also interact with KCSAN at runtime via debugfs (root privileges required, of course):
File path: /sys/kernel/debug/kcsan
- echo "on" > .../kcsan: Enable KCSAN (it is on by default).
- echo "off" > .../kcsan: Disable KCSAN. If you feel the current performance overhead is too high, you can temporarily turn it off.
- cat .../kcsan: View current statistics (e.g., how many accesses have been checked, how many races have been caught).
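The same toggle can be driven from a small C helper, for example inside a test harness. This is a minimal sketch, assuming the debugfs file accepts the strings "on" and "off" as described above; `kcsan_set_enabled()` is a hypothetical helper, and the path is a parameter so the function can be tried against an ordinary file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define KCSAN_DEBUGFS_PATH "/sys/kernel/debug/kcsan"

/* Writes "on" or "off" to the given control file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int kcsan_set_enabled(const char *path, bool enable)
{
	assert(path);
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;   /* typically: not root, or debugfs not mounted */

	int rc = (fputs(enable ? "on" : "off", f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

On a KCSAN-enabled kernel, a root process would call `kcsan_set_enabled(KCSAN_DEBUGFS_PATH, false)` before a performance-sensitive phase and pass `true` afterwards.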
The Right Attitude Toward Reports: Don't Rush to Judgment
This is the most important piece of advice.
When you see KCSAN report a data race, do not reflexively slap a READ_ONCE() or WRITE_ONCE() on it, thinking "as long as the warning goes away, we're good."
Why Should You Hold Back?
This approach masks the problem rather than solving it.
KCSAN's design philosophy is: accesses to shared variables should not race in the first place. If you access shared data without a lock, that is a logic bug. If you simply mark the access as READ_ONCE, you are just telling KCSAN "shut up, I know this is a race," but the risk of data inconsistency remains. The correct approach should be:
- Add lock protection.
- Or use atomic operations.
- Or use lock-free techniques.
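As an illustration of the first option, here is a userspace analogue of the two racing workers with the shared data placed under a lock. It is a sketch under assumptions, not the module's real fix: pthreads and `pthread_mutex_t` stand in for kernel work items and a kernel lock such as a spinlock, and `struct ctx`, `worker()` and `run_two_workers()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct ctx {
	pthread_mutex_t lock;   /* protects data and writes */
	uint64_t data;
	uint64_t writes;
};

static struct ctx gctx = { .lock = PTHREAD_MUTEX_INITIALIZER };

struct worker_args {
	uint64_t base;
	int iters;
};

static void *worker(void *arg)
{
	const struct worker_args *w = arg;

	for (int i = 0; i < w->iters; i++) {
		pthread_mutex_lock(&gctx.lock);
		gctx.data = w->base + (uint64_t)i;   /* now a protected write */
		gctx.writes++;
		pthread_mutex_unlock(&gctx.lock);
	}
	return NULL;
}

/* Runs two workers concurrently and returns the total number of writes. */
static uint64_t run_two_workers(int iters)
{
	pthread_t t1, t2;
	struct worker_args a1 = { 32000, iters }, a2 = { 98000, iters };
	int rc;

	gctx.writes = 0;
	rc = pthread_create(&t1, NULL, worker, &a1);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, worker, &a2);
	assert(rc == 0);
	(void)rc;
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return gctx.writes;
}
```

Build with `-pthread`. Because every update happens with the lock held, the write count is exact no matter how the two threads interleave, which is the property the unprotected version cannot guarantee.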
When Can You Use data_race()?
Of course, there are exceptions to everything. If you are indeed writing statistics code, diagnostics code, or some logic that intentionally sacrifices consistency for performance (e.g., reading an unprotected statistics counter where it doesn't matter if you get 100 or 101), then you can use the data_race() macro.
It explicitly tells KCSAN (and anyone reading the code): I know there is a race here, but it is benign, so leave it alone.
// Example: code from the kernel's fork.c
/* A guard against fork bombs; being slightly inaccurate (racy) here is fine */
if (data_race(nr_threads >= max_threads))
	goto bad_fork_cleanup_count;
Additionally, if you feel a specific function is too hot and the instrumentation is causing a performance drop, you can mark the entire function with the __no_kcsan attribute, though this is usually done to isolate known false positives or third-party code.
KCSAN's Unique Skill: Advisory Lock Detection
Traditional lock debugging tools (like Lockdep) mainly detect issues like deadlocks and lock dependency errors. But they have a blind spot: kernel locks are advisory locks.
If a code path forgets to take the lock before touching a shared variable, the access still succeeds (nothing in the kernel ties a memory location to its lock), but it causes a data race. Lockdep is of little help here because it focuses on whether your lock acquisition order is correct.
KCSAN is extremely powerful in this regard. Through the ASSERT_EXCLUSIVE*() macros, you can assert that no other context writes to (or accesses at all) a variable while a piece of code works on it. If a concurrent access shows up because someone skipped the lock, KCSAN will still catch you. This is why we say KCSAN can discover deep-seated concurrency issues through compiler instrumentation without relying on lock semantics.
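A typical use is the single-writer pattern: one context updates a variable, many may read it. The sketch below shows where the assertion goes. It is written for userspace, so `ASSERT_EXCLUSIVE_WRITER()`, `READ_ONCE()` and `WRITE_ONCE()` are defined here as simplified stand-ins (the assertion is a no-op); in a KCSAN-enabled kernel the real macros come from the kernel headers. The counter and function names are hypothetical.

```c
#include <assert.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
#define ASSERT_EXCLUSIVE_WRITER(var) ((void)(var))   /* no-op here */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static long shared_counter;

/* By design, only one context is supposed to call this. */
static void counter_update(long delta)
{
	/* States the design rule: no other writer may run concurrently. */
	ASSERT_EXCLUSIVE_WRITER(shared_counter);
	WRITE_ONCE(shared_counter, READ_ONCE(shared_counter) + delta);
}

/* Readers may run concurrently with the single writer. */
static long counter_read(void)
{
	return READ_ONCE(shared_counter);
}

/* Tiny usage check: one writer, then a read. */
static void single_writer_demo(void)
{
	counter_update(5);
	counter_update(-2);
	assert(counter_read() == 3);
}
```

The point of the pattern is that all accesses are marked, so an ordinary data-race check would stay silent; the assertion is what lets the tool flag a second writer that was never supposed to exist.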
Chapter Summary: From Tools to Practice
At this point, our understanding of KCSAN should be quite comprehensive. It is not just an error-reporting tool, but a mentor that helps us understand kernel concurrency semantics.
Now that we have mastered how to discover concurrency bugs, the next section will move into real-world practice. We will analyze several real kernel bugs caused by lock deficiencies. We will see how these lurking crises were introduced in the code of core developers, and how they were ultimately fixed.