5.6 Catching Undefined Behavior with the UBSAN Kernel Checker

In the previous section, we discussed how KASAN is the "heavy artillery" of the memory error world, but it's clearly not a silver bullet. Some incredibly sneaky bugs—like the ones we saw earlier—can perfectly evade KASAN's radar. So, what other weapons do we have at our disposal?

If you've written C, you've surely experienced this: you write buggy code with logical flaws, but it somehow—somehow—runs. Sometimes it even runs smoothly, until one late night at 2 AM, you add a completely innocuous line of code in production, and the whole system blows up.

This is the most chilling aspect of C: undefined behavior.

What is Undefined Behavior?

There is a no-man's-land in the C standard. Compilers typically only handle "correct" cases. When your code crosses the line (array overflows, signed integer overflows, or division by zero, for example), the compiler often chooses to "look the other way." It assumes such things won't happen and generates highly optimized machine code based on that assumption. This does improve performance, but the price is safety: once the assumption is violated, nothing guarantees what the generated code will do.

Worse still, these bugs are highly insidious. You might find that a piece of code runs fine with optimizations disabled but crashes when optimizations are enabled. Or, as we saw in the "Stale frames" section, accessing stale stack frame data might actually succeed, and the data read out might even look correct. All of this is unpredictable, which is why we call it undefined behavior.
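To make the signed-overflow case concrete, here is a minimal user-space sketch (not kernel code, and not taken from the book's repository). Rather than performing `a + b` and checking afterwards, which would already be undefined when the sum overflows, it asks the compiler to detect the overflow. The helper name `add_checked` is made up for illustration; `__builtin_add_overflow` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The sketch assumes an int of at least 32 bits, as on x86_64 Linux. */
static_assert(sizeof(int) >= 4, "example assumes int is at least 32 bits");

/*
 * Hypothetical helper: adds a and b without ever executing an
 * overflowing signed addition. Returns true and stores the sum in
 * *sum when it fits in an int; returns false and leaves *sum
 * untouched when the mathematical result would not fit.
 */
bool add_checked(int a, int b, int *sum)
{
    int tmp;

    /* The builtin returns true when the result overflowed. */
    if (__builtin_add_overflow(a, b, &tmp))
        return false;

    *sum = tmp;
    return true;
}
```

Calling `add_checked(INT_MAX, 1, &s)` returns false instead of silently relying on wraparound, which is exactly the class of mistake an overflow sanitizer is meant to flag at run time.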

Introducing UBSAN

To combat these phantoms, the kernel community introduced Undefined Behavior Sanitizer (UBSAN).

Like KASAN, UBSAN works by using compile-time instrumentation. When you fully enable UBSAN, kernel code is compiled with the -fsanitize=undefined option. This means the compiler inserts check code at critical locations—once a violation is triggered, the system reports it immediately.

UBSAN primarily covers two major categories of issues:

Arithmetic-related UB:
- Integer overflow/underflow
- Division by zero
- Out-of-bounds shift operations

Memory-related UB:
- Out-of-bounds access to static arrays
- Null pointer dereference
- Misaligned memory access
- Object size mismatch
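Several items in these lists boil down to simple preconditions on operands. The following user-space sketch spells out three of them as plain predicates. The function names are hypothetical and this is not how UBSAN is implemented; it only shows, under the assumption of 8-bit bytes and 32-bit shifts, what "defined" means for a shift, a signed division, and an aligned access.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static_assert(CHAR_BIT == 8, "example assumes 8-bit bytes");

/* Shifting a 32-bit value is defined only for counts 0..31. */
bool shift_is_defined(int count)
{
    return count >= 0 && count < 32;
}

/*
 * Signed integer division is undefined for a zero divisor and for
 * INT_MIN / -1 (the quotient does not fit in an int).
 */
bool div_is_defined(int num, int den)
{
    if (den == 0)
        return false;
    if (num == INT_MIN && den == -1)
        return false;
    return true;
}

/* True when p is a multiple of align; align must be a power of two. */
bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A sanitized build effectively evaluates conditions of this kind at the point of use and reports when one of them fails.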

You might notice that some of this functionality overlaps with KASAN. That's true. UBSAN increases kernel code size and slows down execution (typically by a factor of 2 to 3), but it can catch many things that KASAN misses, which makes it indispensable during development and unit testing. In fact, if you can afford the larger kernel image and the extra CPU overhead (usually acceptable on modern servers, less so on very small embedded systems), enabling UBSAN in production is also feasible.

Configuring the Kernel to Enable UBSAN

Let's open make menuconfig and find the UBSAN menu location via the following path:

Kernel hacking
  -> Generic Kernel Debugging Instruments
      -> Undefined behaviour sanity checker

The interface looks roughly like Figure 5.10 (image omitted here, but you can picture that familiar blue configuration window).

To follow along with me, you need to ensure the following core configuration options are enabled:

CONFIG_UBSAN: The main switch.
CONFIG_UBSAN_BOUNDS: This is very important; it handles bounds checking for static array indices.
CONFIG_UBSAN_MISC: Catches other miscellaneous UB.
CONFIG_UBSAN_SANITIZE_ALL: Enables full kernel scanning.
CONFIG_TEST_UBSAN=m: Compiles the test code as a module (in lib/test_ubsan.c) for easy verification.

You can check lib/Kconfig.ubsan for the specific meanings of these options, but I suggest you don't get bogged down in details right now—just enable the ones above and move on.

UBSAN's Impact on the Compilation Process

Once CONFIG_UBSAN=y is set, build the kernel with the V=1 parameter to see exactly what the compiler is doing. You'll notice a long string of -fsanitize=... options added to GCC's command line.

For example, like this:

make V=1
[...]
gcc -Wp,-MMD,[...] -fsanitize=bounds \
-fsanitize=shift -fsanitize=integer-divide-by-zero \
-fsanitize=unreachable -fsanitize=signed-integer-overflow \
-fsanitize=object-size -fsanitize=bool -fsanitize=enum [...]

This is the moment instrumentation happens. The compiler is weaving a safety net into your code.
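Conceptually, a bounds-instrumented store behaves as if a comparison against the declared array size had been written in front of it. The user-space sketch below imitates that by hand. Everything in it is an assumption made for illustration: `report_oob` is a made-up stand-in for a sanitizer runtime hook (it only counts), and unlike a real sanitized build, the sketch drops the offending store so that the example itself stays well-defined.

```c
#include <assert.h>
#include <stddef.h>

#define ARRSZ 10
static_assert(ARRSZ > 0, "the demo array must have at least one element");

static char demo_arr[ARRSZ];
static unsigned int oob_reports; /* violations seen so far */

/* Hypothetical stand-in for a sanitizer runtime hook; it only counts. */
static void report_oob(size_t idx)
{
    (void)idx; /* a real handler would print the index and source location */
    oob_reports++;
}

/*
 * Source as written:        demo_arr[idx] = val;
 * Instrumented equivalent:  compare idx against the declared bound first.
 */
void store_instrumented(size_t idx, char val)
{
    if (idx >= ARRSZ) {
        report_oob(idx);
        return; /* this sketch drops the bad store to stay well-defined */
    }
    demo_arr[idx] = val;
}

/* Accessors so that callers can observe the effects. */
char demo_read(size_t idx)
{
    return idx < ARRSZ ? demo_arr[idx] : '\0';
}

unsigned int oob_report_count(void)
{
    return oob_reports;
}
```

The extra compare-and-branch on every checked access is where the code-size and run-time overhead of instrumentation comes from.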

Hunting UB with UBSAN

UBSAN's strongest trick is detecting out-of-bounds access to static arrays.

Let's look at test case #4.4. We define a few global arrays in the code:

static char global_arr1[10], global_arr2[10], global_arr3[10];

Why define three instead of one?

There's an obscure pitfall here. At the time of writing this book (at least with GCC 9.3), the compiler had a bug when setting up "red zones" for global data.

The left red zone of the first global variable in a module might be set up incorrectly. This leads to a very bizarre phenomenon: if you perform a "left out-of-bounds" (underflow) access on this variable, the detection tools might miss it! To work around this bug, we defined three arrays, and in the test code, we specifically pass the pointer to the second one (global_arr2). This way, KASAN and UBSAN can work properly.

(By the way, the order of global variables in a module depends on the linker, which is not under your control.)

It's worth mentioning that this issue does not exist on Clang 11+.

I personally ran into this bug. Later, I reported it along with the issue that the kernel's test_kasan module didn't cover this test case. KCSAN maintainer Marco Elver quickly followed up and added the test case (November 17, 2021). Meanwhile, our technical reviewer Chi-Thanh Hoang also discovered that this was essentially due to GCC missing the left red zone, and added this information to the kernel Bugzilla. Hopefully, the GCC community will fix this stubborn issue soon.

Alright, back to the code. Let's look at one of the test cases—right out-of-bounds access to global memory. Our test code performs read and write operations on global_arr2, and of course, we intentionally write it incorrectly:

int global_mem_oob_right(int mode, char *p)
{
    volatile char w, x, y, z;
    volatile char local_arr[20];
    char *volatile ptr = p + ARRSZ + 3; // OOB right
    [...]
    } else if (mode == WRITE) {
        *(volatile char *)ptr = 'x';  // invalid: OOB right write
        p[ARRSZ - 3] = 'w'; // valid: within bounds
        p[ARRSZ + 3] = 'x'; // invalid: OOB right write
        local_arr[ARRAY_SIZE(local_arr) - 5] = 'y'; // valid
        local_arr[ARRAY_SIZE(local_arr) + 5] = 'z'; // invalid: OOB right write
    } [...]

It's invoked through the debugfs interface:

[...] else if (!strncmp(udata, "4.4", 4))
        global_mem_oob_left(WRITE, global_arr2);

When this code runs and triggers the illegal access, UBSAN spits out an error report like this in the kernel log:

array-index-out-of-bounds in <C-source-pathname.c>:<line#>
index <index> is out of range for type '<var-type> [<size>]'
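When triaging a long kernel log, it can help to pull the index and the array size out of the second line of such a report. The following is a small user-space sketch of that idea. It assumes the line follows the template shown above; the struct, the function name, and the sample line in the usage note are invented for illustration and do not come from the kernel or the book's repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one report line. */
struct oob_report {
    long index;           /* the offending index */
    char elem_type[32];   /* element type, e.g. "char" */
    unsigned long size;   /* declared number of elements */
};

/* The %31 width in the sscanf format below must match this buffer. */
static_assert(sizeof(((struct oob_report *)0)->elem_type) == 32,
              "elem_type must hold 31 characters plus the terminator");

/*
 * Parses a line of the form
 *   index <index> is out of range for type '<var-type> [<size>]'
 * Any prefix before "index " (such as a log timestamp) is skipped.
 * Returns true only when all three fields were found.
 */
bool parse_oob_line(const char *line, struct oob_report *out)
{
    const char *start = strstr(line, "index ");
    size_t len;

    if (!start)
        return false;

    if (sscanf(start, "index %ld is out of range for type '%31[^[][%lu]'",
               &out->index, out->elem_type, &out->size) != 3)
        return false;

    /* The type name is followed by a space before '['; trim it. */
    len = strlen(out->elem_type);
    while (len > 0 && out->elem_type[len - 1] == ' ')
        out->elem_type[--len] = '\0';

    return true;
}
```

Fed a made-up line such as `index 13 is out of range for type 'char [10]'`, the function fills in index 13, element type `char`, and size 10; a line that does not match the template yields false.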

Looking at Figure 5.11 (screenshot omitted), the window on the right shows the kernel log. Ignore the large block of KASAN output above; let's focus on the UBSAN section below. It precisely points out the problem on line 194—attempting to write data outside the valid range of a local array!

By the way, as the code changes, the line numbers you see might differ, which is perfectly normal.

Immediately after, test case #4.3 recklessly attempts an underflow read on a local stack variable. The result? UBSAN catches it beautifully once again!

Looking at the output in Figure 5.12, UBSAN once again slaps the source filename and line number right in your face.

By now, you've probably noticed: UBSAN is extremely good at handling errors with static array indices—whether you go out of bounds on the left or the right.

However, it has a clear blind spot: pure pointer arithmetic. If you cause trouble entirely through pointer arithmetic, UBSAN might turn a blind eye. And this is exactly where KASAN excels—KASAN treats all pointer-based illegal access equally.
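The difference comes down to what the compiler can see at the point of access. The user-space sketch below shows two stores that look almost identical; the names are hypothetical. In the first, the expression indexes an object whose type is `char [10]`, so a bound is available to check against. In the second, only a `char *` is visible, and the size of the object it came from is not part of that type, so an index-based check might have nothing to compare with.

```c
#include <assert.h>
#include <stddef.h>

#define ARRSZ 10

static char demo_arr[ARRSZ];

/* The array type carries its length; a pointer type does not. */
static_assert(sizeof(demo_arr) == ARRSZ, "char [ARRSZ] occupies ARRSZ bytes");

/* Indexing the array object itself: the bound ARRSZ is known here. */
void write_by_index(size_t idx, char val)
{
    demo_arr[idx] = val;
}

/* Pure pointer arithmetic: nothing in 'char *' says how big the object is. */
void write_by_pointer(char *p, size_t off, char val)
{
    *(p + off) = val;
}

/* Hands out the base address, as the test code does with global_arr2. */
char *demo_base(void)
{
    return demo_arr;
}
```

With in-range arguments, both functions behave the same. With an offset of `ARRSZ + 3`, the first form is the kind of access an array-bounds check targets, while the second relies on a tool that tracks the memory itself, which is the role KASAN plays.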

Furthermore, just like KASAN, UBSAN cannot catch all memory defects.

To prove this point, we run that troublesome kernel module ch5/kmembugs_test again. The result is rather disheartening: even with UBSAN enabled, those three classic problems, Uninitialized Memory Read (UMR), Use-After-Return (UAR), and memory leaks, still go uncaught!

The screenshot in Figure 5.13 records this "disaster." The log is completely clean, as if nothing happened at all.

What does this tell us? It tells us that the tools are complementary. You need KASAN, you need UBSAN, and you need the other tools we'll discuss in the next chapter.

Also, don't forget that UBSAN is also excellent at catching arithmetic UB—integer overflows, underflows, and division by zero are all major hotspots for security vulnerabilities. Since the theme of this chapter is memory defects, we won't dive deep into arithmetic issues here. If you're curious, you can explore lib/test_ubsan.c in the kernel source code. I highly recommend giving it a run.

Experiment Results Summary

Alright, it's time to tally up the results. Table 5.4 summarizes the results of running various test cases with UBSAN enabled.

(Overview of Table 5.4 content: Shows the detection status of different test cases under UBSAN, including line number references, compiler version GCC 9.3.0, and kernel configuration 5.10.60-prod01.)

A few details in the table are worth mentioning:

[1] Test case numbers: Please refer to the ch5/kmembugs_test/kmembugs_test.c source code, as well as the debugfs_kmembugs.c, load_testmod, and run_tests scripts in the same directory.
[2] Compiler environment: Ubuntu Linux on x86_64, GCC 9.3.0. (We have a dedicated section for the Clang 13 part later on).
[3] Test kernel: Custom production kernel 5.10.60-prod01, with CONFIG_UBSAN=y and CONFIG_UBSAN_SANITIZE_ALL=y enabled.
Test cases 4.1 through 4.4 are effective on both global static memory (compile-time allocated) and stack local memory. That's why their numbers appear repeatedly.

Detailed Table Notes (Must Read)

Here are detailed explanations for the [U1] and [U2] markers in the table—there are quite a few pitfalls and edge cases hidden within:

[U1] UBSAN catches and reports out-of-bounds access on global static memory: The output format is as follows:
```
array-index-out-of-bounds in <path>:<line>
index <idx> is out of range for type 'char [10]'
```
[U2] Object size mismatch: In certain situations, UBSAN will also incidentally report an object-size-mismatch error:
```
object-size-mismatch in <path>:<line>
store to address <addr> with insufficient space for an object of type 'char [10]'
```
In the above scenario, UBSAN will list the violation details in full, including the process context and the kernel stack trace.

Here's an interesting twist: If you turn KASAN off (I specifically recompiled a CONFIG_KASAN=n kernel for this) and only enable UBSAN, things will be a bit different. Under this configuration, you might receive a segmentation fault directly, although the kernel log will still clearly point to the source of the bug (by checking where the instruction pointer register RIP points).

Note: Don't forget to check Table 6.4 in the next chapter. It's a comprehensive comparison table that puts the detection results of all tools together—well worth a look.

Now, you're equipped with both KASAN and UBSAN, so you have a lot more firepower than before. But I suggest you pause here to digest this information, read the detailed notes in the "Catching memory defects in the kernel – comparisons and notes (Part 1)" section that follows, and try running these test cases on your own machine.

However, there's still one unresolved mystery: we mentioned earlier that some out-of-bounds defects can only be caught when compiled with Clang 11 or later. This is a key point.

So, next, let's step into the world of Clang and see how to use this modern compiler to build our kernel and modules.
