Chapter 1: Crossing the Boundary: The First Step from User Space to the Kernel

1.1 Writing a Simple misc Character Device Driver

There is a class of problems that appear to be about programming, but are actually about worldview.

This chapter tackles exactly such a problem. Up to now, the code you've written has run in user space: backed by the OS, shielded by virtual memory, where a crash simply terminates the process without taking down the machine. Device drivers are different: they run in kernel space. There is no "safety net" here; a stray pointer write can corrupt memory that the whole system depends on, and a single mistake can bring down the entire machine.

Sounds terrifying? Yes. But this fear often causes us to overlook another fact: the kernel isn't actually that mysterious.

Put simply, the kernel is just a massive program running at a privileged level with full hardware access. A device driver is merely a "plugin" within this massive program. Its core task is straightforward: establish a channel—allowing ordinary user-space programs that shouldn't touch hardware directly to safely send data to, or receive data from, hardware through the kernel.

Why doesn't the old approach work? In the past (or in very old textbooks), writing a driver often meant manually registering a character device, dealing with hardcoded major device numbers, and even directly modifying nodes under the /dev directory. This approach was not only tedious but also prone to conflicts. If every driver author just picked a random major number, the system would eventually descend into chaos from "number collisions." Now we have a better mechanism—the misc class, which solves this problem.

In this chapter, we will write a minimal "encryption" driver from scratch. It doesn't control any real hardware, but that doesn't matter—what matters is that it will fully demonstrate how the VFS (Virtual File System), kernel space, and device drivers collaborate.

The first thing we need to do is bring a piece of kernel code to life.


Understanding the Basics: How Devices Are Recognized by the Kernel

In the eyes of Linux, the world's devices are roughly divided into two categories:

  1. Character Devices: Like keyboards, mice, sensors, and GPUs. Their defining trait is "streaming"—you can only read or write sequentially and cannot randomly seek (unless the device itself supports it). They typically cannot be mounted as a filesystem.
  2. Block Devices: Like hard drives, Flash storage, and SD cards. They are "block-oriented," storing data in blocks and supporting random access. Because they look like disks, they can be mounted onto the file tree.

The misc driver we are going to write belongs to the Character Device category.

But here's the catch: how does the kernel know which device you are?

In traditional character devices, the kernel identifies a device using a 32-bit integer. This integer is split in two:

  • Major Number (upper 12 bits): This is the "family name." It tells the kernel, "I am managed by this driver."
  • Minor Number (lower 20 bits): This is the "given name." It tells the specific driver, "I am device number X in this family."

Think about the output of ls -l /dev/sda1; you'll see numbers like 8, 1—these are the major and minor numbers.
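The 12/20-bit split described above can be sketched in user-space C. The demo_ names below are hypothetical stand-ins chosen for this example; inside the kernel the corresponding helpers are the MKDEV(), MAJOR() and MINOR() macros.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel's device-number helpers. */
#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1u << DEMO_MINORBITS) - 1)

/* Pack a major/minor pair into one 32-bit device number. */
static inline uint32_t demo_mkdev(uint32_t major, uint32_t minor)
{
    return (major << DEMO_MINORBITS) | (minor & DEMO_MINORMASK);
}

/* Upper 12 bits: which driver "family" owns the device. */
static inline uint32_t demo_major(uint32_t dev)
{
    return dev >> DEMO_MINORBITS;
}

/* Lower 20 bits: which device within that family. */
static inline uint32_t demo_minor(uint32_t dev)
{
    return dev & DEMO_MINORMASK;
}
```

For /dev/sda1, demo_mkdev(8, 1) packs the pair into a single integer, and demo_major() and demo_minor() recover 8 and 1 from it.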

But managing this number manually is too annoying—you have to request an unused major number and create the node yourself under /dev. So, the Linux kernel provides a handy shortcut: the misc class.

⚠️ An Analogy We Will Have to Revise

You can think of the misc class as a "public parking lot" inside the kernel. All regular cars (devices) park in the same lot (major number 10). The security guard (kernel) doesn't identify cars by their license plates (major numbers), but rather by the parking ticket (minor number) you hold to distinguish which car is yours.

But the "parking lot" analogy falls short in one respect: what matters for a misc device is not how many spaces are left, but which number you hold. Each misc device is identified by a unique minor number (traditionally a number between 0 and 255, so the supply is large but not unlimited). As long as your minor number doesn't conflict with one already in use, registration succeeds. This is more like a phone switchboard with numbered extensions, where everyone dials the same main number (10) and reaches a specific room via the extension (minor number).

……

(When we get to the code registration later)

Now look back at that "phone switchboard": when you call misc_register(), you're essentially telling the switchboard operator, "I want to connect to extension 210." As long as 210 isn't taken, your driver's call goes through. If someone already occupies that extension, misc_register will return -EBUSY, just like hearing a busy signal on a phone call.
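The switchboard behaviour can be modelled in a few lines of user-space C. This is a toy model, not the kernel's implementation: demo_misc_register and its 256-entry table are assumptions made for illustration. Unlike the real misc_register(), which returns 0 on success and stores the chosen minor in the miscdevice structure, the toy returns the claimed minor directly.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_MINORS        256  /* toy table size, matching the 0-255 range */
#define DEMO_DYNAMIC_MINOR 255  /* "pick any free extension for me" */

static bool demo_taken[DEMO_MINORS];

/* Returns the minor that was claimed, or a negative errno value. */
static int demo_misc_register(int minor)
{
    if (minor == DEMO_DYNAMIC_MINOR) {
        /* Search for any free extension below the "dynamic" marker. */
        for (int i = DEMO_MINORS - 2; i >= 0; i--) {
            if (!demo_taken[i]) {
                demo_taken[i] = true;
                return i;
            }
        }
        return -EBUSY;          /* every extension is occupied */
    }

    if (minor < 0 || minor >= DEMO_MINORS)
        return -EINVAL;
    if (demo_taken[minor])
        return -EBUSY;          /* the "busy signal" */

    demo_taken[minor] = true;
    return minor;
}
```

Asking for extension 210 twice succeeds the first time and yields -EBUSY the second time, while a dynamic request is handed whichever extension is still free.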

The Modern Linux Device Model (LDM)

In older versions of Linux, drivers and devices were loosely coupled. But in the modern kernel, everything is an object—this is the Linux Device Model (LDM).

LDM maintains a massive tree inside the kernel (much like the tree view in Windows Device Manager). To expose this tree to user space, the kernel mounts the structure under the /sys directory—this is sysfs.

In this model, every device sits on a bus. Even peripherals integrated inside an SoC, which aren't plugged into any physical slot, are attached to a virtual bus called the platform bus.

  • Bus Driver: The "loader" responsible for scanning goods on the bus. It discovers devices (enumeration) and matches them with their corresponding drivers.
  • probe Method: When the loader says, "Hey, I found a device, can your driver handle it?", this is the callback function the kernel invokes. This is the true initialization site.
  • remove Method: The "farewell message" triggered when a device is unplugged or a driver is unloaded, used to clean up and release memory.

Although our misc driver is simple, it will still use the LDM mechanism to show its face to /sys and /dev.


Establishing the Connection: The Triangular Relationship Between Processes, Drivers, and the Kernel

Before writing any code, we must first address a conceptual question: when a user program calls write(), what actually happens?

You might think: "Isn't it just writing data?"

Actually, there's a subtle twist here—there's a "moat" in between.

  1. User Space Initiates: Your program calls write(fd, "hello", 5). This happens in user space, which is "on the other side of the river."
  2. VFS Intercepts: The kernel's VFS (Virtual File System) layer intercepts this call first. VFS doesn't know what specific hardware you're writing to; it only knows "someone wants to write something to this file descriptor."
  3. Lookup Table: VFS uses the file descriptor to find the process's open-file object (struct file). When the file was opened, the kernel stored in that object a pointer to a table taken from the device's inode. This table is the file_operations structure.
  4. Dispatching to the Driver: The file_operations structure stores a bunch of function pointers. If the write pointer has been assigned by your driver, VFS calls it and your code runs.

⚠️ Key Structure: file_operations

This structure is the driver's "menu of capabilities." You tell the kernel: "If you want to read from me, call function A; if you want to write to me, call function B." If you don't assign them, the menu is empty, and the corresponding system call fails: read() or write() typically fails with EINVAL, and ioctl() with ENOTTY.
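The dispatch idea can be sketched in user space. Everything prefixed with demo_ is a hypothetical, heavily reduced stand-in; the real struct file_operations has many more members and different signatures.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical, heavily reduced stand-in for struct file_operations. */
struct demo_file_operations {
    ssize_t (*read)(char *buf, size_t len);
    ssize_t (*write)(const char *buf, size_t len);
};

/* What a VFS-like layer does: call the driver's hook if one was provided. */
static ssize_t demo_vfs_write(const struct demo_file_operations *fops,
                              const char *buf, size_t len)
{
    if (!fops->write)
        return -EINVAL;          /* empty menu slot: nothing to call */
    return fops->write(buf, len);
}

/* A driver-side handler that pretends to consume everything it is given. */
static ssize_t demo_secret_write(const char *buf, size_t len)
{
    (void)buf;
    return (ssize_t)len;
}

/* A driver that fills in only the write slot; read is left NULL on purpose. */
static const struct demo_file_operations demo_secret_fops = {
    .write = demo_secret_write,
};

/* A driver with an empty menu. */
static const struct demo_file_operations demo_empty_fops = {
    .read = NULL,
    .write = NULL,
};
```

Calling demo_vfs_write() with demo_secret_fops reaches the driver's handler; with demo_empty_fops the call is rejected before any driver code runs.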


Let's Write the Code: A Driver with a Secret

Let's verify something first: if this step succeeds, what do you expect to see? Run through it in your head before touching the keyboard. We expect to see a load message from our driver in the kernel log and our device file under /dev.

1. Basic Skeleton: Header Files and Initialization Macros

All kernel modules have two fixed entry points: init (on load) and exit (on unload).

We need the module_init and module_exit macros to tell the kernel: "jump here."

2. Implementing the "Ferry": copy_to_user and copy_from_user

This is where beginners most often run aground.

You might think: "Isn't it just copying memory? Why not just use memcpy and be done with it?"

Absolutely not.

Remember: user space and kernel space are separate worlds. A pointer passed in from user space is just a number chosen by the calling process. It may be unmapped, point to memory that is currently paged out, or be deliberately aimed at kernel memory. Calling memcpy on it directly can fault inside the kernel and produce an oops or a panic. Or worse, the address might happen to be valid, in which case a malicious program can trick the driver into reading or overwriting kernel memory, which is a security vulnerability.

You must use the kernel-provided "ferry" APIs:

  • copy_to_user(): Moves data from kernel space to user space safely.
  • copy_from_user(): Moves data from user space to kernel space safely.

These functions check that the user-space range really lies in user space, and they handle any page fault that occurs during the copy. If there's a problem with the pointer, they won't crash the kernel; instead, they return the number of bytes that could not be copied, and 0 on complete success.
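The return convention is easy to get backwards, so here is a user-space model of it. demo_copy_from_user is a hypothetical stand-in: it treats only the first valid bytes of the source as accessible, whereas the real function discovers inaccessible memory through page faults.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Hypothetical model of copy_from_user(): only the first `valid` bytes of
 * `src` count as accessible. Returns the number of bytes NOT copied.
 */
static size_t demo_copy_from_user(void *dst, const void *src,
                                  size_t n, size_t valid)
{
    size_t ok = n < valid ? n : valid;

    memcpy(dst, src, ok);
    return n - ok;               /* 0 means complete success */
}

/* The usual driver idiom: any shortfall is reported as -EFAULT. */
static ssize_t demo_fetch(void *dst, const void *src, size_t n, size_t valid)
{
    if (demo_copy_from_user(dst, src, n, valid))
        return -EFAULT;
    return (ssize_t)n;
}
```

Note that a non-zero return value signals failure, which is the opposite of functions that return the number of bytes they managed to copy.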

3. Defining Our "Secret": Driver Private Data

Suppose we want to build a simple "encryption" driver. It has a secret string that only someone who knows the password can read. This requires the driver to be able to "remember" things.

⚠️ Warning: Don't keep per-device state in global variables! If you define a global variable in your driver to store the secret, two device instances will clash with each other. The correct approach is to allocate a piece of private data for each device.

We define a structure to hold our secret:

/* drivers/misc/secret_example.c (excerpt) */

#define MY_SECRET_MAX 64

struct secret_device {
	char secret[MY_SECRET_MAX]; /* buffer holding the secret string */
	// ...
};
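One common way to tie private data to a device is to embed the generic device structure inside the private one and step back from the member to its owner. The sketch below shows that idea in user space; the demo_ types are hypothetical stand-ins, and demo_container_of mimics the kernel's container_of() macro.

```c
#include <stddef.h>

#define DEMO_SECRET_MAX 64

/* Hypothetical stand-in for struct miscdevice. */
struct demo_miscdevice {
    int minor;
    const char *name;
};

/* Per-device state with the generic device structure embedded inside it. */
struct demo_secret_device {
    char secret[DEMO_SECRET_MAX];
    struct demo_miscdevice misc;
};

/* Same idea as the kernel's container_of(): step back from member to owner. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Given the generic part, recover the private structure that contains it. */
static struct demo_secret_device *demo_to_secret(struct demo_miscdevice *m)
{
    return demo_container_of(m, struct demo_secret_device, misc);
}
```

Because each instance carries its own secret buffer, two devices never share state, which is exactly what a global variable cannot guarantee.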

4. Claiming Your Number: Writing the init Code

Now let's implement the module's initialization function. There are a few steps here that must be done in order.

Goal: Register a misc device and allocate our private data.

Why: Leverage the kernel's "resource management" mechanism to ensure that if registration fails, resources are automatically rolled back; if registration succeeds, our structure is bound to the device.

Location: static int __init secret_init(void)

Code / Command Block:

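/* Headers needed by this file (normally placed at the top) */
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <linux/uaccess.h>

static const struct file_operations secret_fops; /* defined later in this file */
static struct miscdevice secret_misc_device;

static int __init secret_init(void)
{
	struct secret_device *my_dev;
	int ret;

	/* 1. Fill in the miscdevice structure */
	secret_misc_device.minor = MISC_DYNAMIC_MINOR; /* let the kernel pick a free minor */
	secret_misc_device.name = "secret";            /* the node will be /dev/secret */
	secret_misc_device.fops = &secret_fops;        /* hook up our operations table */

	/* 2. Register the device; on success this sets secret_misc_device.this_device */
	ret = misc_register(&secret_misc_device);
	if (ret) {
		pr_err("Failed to register misc device\n");
		return ret;
	}

	/*
	 * 3. Allocate our private data structure with devm_kzalloc.
	 * This is a "managed" allocation: the memory is tied to the device
	 * and freed automatically when the device goes away, so we never
	 * call kfree() by hand and cannot leak it.
	 * devm_kzalloc() needs a valid struct device pointer, which only
	 * exists after misc_register() has succeeded.
	 */
	my_dev = devm_kzalloc(secret_misc_device.this_device, sizeof(*my_dev), GFP_KERNEL);
	if (!my_dev) {
		misc_deregister(&secret_misc_device); /* roll back the registration */
		return -ENOMEM;
	}

	/* Device initialization: hard-code our secret */
	scnprintf(my_dev->secret, MY_SECRET_MAX, "Linux is the best OS ever!");

	/*
	 * Attach the private data to the device so later callbacks can find it.
	 * (Note: this is a simplification; usually the miscdevice itself is
	 * embedded in a larger structure that is set up before registration.)
	 */
	dev_set_drvdata(secret_misc_device.this_device, my_dev);

	pr_info("Secret driver loaded with major 10, dynamic minor %d\n", secret_misc_device.minor);
	return 0;
}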

Expected Output: If you compile this code into a module and insmod it, you'll see a successful registration log in dmesg, and a new secret file will appear under the /dev directory.

How to Verify Success:

ls -l /dev/secret
# You should see something like: crw------- 1 root root 10, 58 ...
# 10 is the major number; 58 is the dynamically assigned minor number (yours may differ)
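Once the node exists, a small user-space helper can exercise it. This is a minimal sketch, assuming the driver provides a read handler; demo_read_device is a hypothetical name, and only standard POSIX calls are used.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from a device node; returns bytes read or -errno. */
static ssize_t demo_read_device(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int saved;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;           /* e.g. -ENOENT if the node is missing */

    n = read(fd, buf, len);      /* for /dev/secret this reaches the driver */
    saved = errno;               /* close() may overwrite errno */
    close(fd);

    return n < 0 ? -saved : n;
}
```

Calling demo_read_device("/dev/secret", buf, sizeof(buf)) as root should return whatever the driver's read handler hands back; a negative return value tells you which step failed.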

This Is Where You Can Accidentally Escalate Privileges

We've now established the channel. But sometimes, channels leak.

Imagine what would happen if we implemented the write function without checking the length of data passed from user space, and just copied it directly into a kernel buffer.

Why (demonstrating buggy code): To show how this turns into a security vulnerability.

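/* This is a BUGGY write implementation */
static ssize_t secret_write(struct file *file, const char __user *buf,
			    size_t len, loff_t *ppos)
{
	/*
	 * For misc devices, file->private_data points to the miscdevice.
	 * This assumes init attached our data with dev_set_drvdata().
	 */
	struct miscdevice *misc = file->private_data;
	struct secret_device *dev = dev_get_drvdata(misc->this_device);

	/* The disaster is here: we never check whether len exceeds MY_SECRET_MAX */
	if (copy_from_user(dev->secret, buf, len))
		return -EFAULT;

	return len;
}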

Suppose you pass 1000 bytes of data while dev->secret is only 64 bytes. copy_from_user will ruthlessly overwrite the memory beyond the secret array. What's behind it? Since the structure lives on the kernel heap, it could be other critical kernel data structures, possibly ones that contain function pointers.

This is the legendary "buffer overflow."

On certain kernel versions or architectures, if an attacker can use such an overflow to overwrite a function pointer (or, when the buffer lives on the kernel stack, a return address) and redirect execution to carefully crafted code, they have achieved privilege escalation: going from an ordinary user to root.

If you think this design is terrifying, your intuition is spot on. That's why in real driver development, boundary checking is a matter of life and death.
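A bounds-checked counterpart of the buggy write can be sketched in user space, with memcpy standing in for copy_from_user(). The demo_ names are hypothetical, and rejecting oversized input with -EINVAL is one possible policy chosen for this example; truncating the input is another.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define DEMO_SECRET_MAX 64

/* Hypothetical stand-in for the driver's private data. */
struct demo_wr_dev {
    char secret[DEMO_SECRET_MAX];
};

/* Stores len bytes as the new secret; returns len or a negative errno value. */
static ssize_t demo_checked_write(struct demo_wr_dev *dev,
                                  const char *buf, size_t len)
{
    /* Check the length BEFORE copying; keep one byte for the terminator. */
    if (len >= DEMO_SECRET_MAX)
        return -EINVAL;

    memcpy(dev->secret, buf, len);
    dev->secret[len] = '\0';
    return (ssize_t)len;
}
```

With this check in place, a 1000-byte request is refused before a single byte is copied, so nothing beyond the 64-byte array can be touched.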


Digging Deeper: Beyond Overflow, There's Over-Reading

What if we have a buggy read function?

The kernel has a tool called KASAN (Kernel Address SANitizer) that is specifically designed to catch out-of-bounds and use-after-free memory accesses.

Suppose we accidentally leak kernel memory that lies beyond our buffer to user space in read:

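/* This is a BUGGY read implementation */
static ssize_t secret_read(struct file *file, char __user *buf,
			   size_t len, loff_t *ppos)
{
	/* Assumes init attached our data with dev_set_drvdata() */
	struct miscdevice *misc = file->private_data;
	struct secret_device *dev = dev_get_drvdata(misc->this_device);

	/* Suppose there is a logic error here and we also read out the memory after the dev structure, */
	/* or we never cleared the memory that was allocated for dev */

	if (copy_to_user(buf, dev->secret, len))
		return -EFAULT;

	return len;
}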

If you have the KASAN kernel option enabled and a read runs past the end of the allocation, KASAN prints a detailed report to the kernel log telling you that you accessed out-of-bounds memory (and the system panics if it is configured to panic on such reports).

Don't think this is making a mountain out of a molehill. Leaking kernel pointers (KASLR leak) is the first step for an attacker to bypass kernel protections. Even reading just 1 extra byte could become the crack that brings down the entire system.
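The fix for the read side is to clamp the request to the data that actually exists. The user-space sketch below shows that clamping logic with memcpy standing in for copy_to_user(); demo_read_from_buffer is a hypothetical name. In the kernel, the helper simple_read_from_buffer() combines this kind of clamping with copy_to_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Bounded read: never hands out more than `available` bytes and advances
 * the file position. Returns bytes copied, 0 at end of data, or -EINVAL.
 */
static ssize_t demo_read_from_buffer(char *to, size_t count, long long *ppos,
                                     const char *from, size_t available)
{
    long long pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((size_t)pos >= available || count == 0)
        return 0;                          /* nothing left to read */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;   /* clamp to what really exists */

    memcpy(to, from + pos, count);
    *ppos = pos + (long long)count;
    return (ssize_t)count;
}
```

However large the caller's len is, the function copies at most the bytes between the current position and the end of the buffer, so no neighbouring memory can leak.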


By this point, the mechanism should be clear—or so you think. What we've built is not just a readable and writable file, but a safe zone demarcated in the dangerous wilderness of the kernel.

Remember the contrast from the beginning of the chapter: why can't driver code be written as casually as a user-space program? Now you should be able to answer: because any careless memory access can become a backdoor to root. Every copy_from_user call and boundary check we write is essentially reinforcing that door.

What this chapter is truly doing is laying the foundation for understanding user-kernel interaction. On the surface, we're configuring a misc device, but in reality, we're learning why copy_to_user must exist, and why kernel data such as struct cred (process credentials) must stay out of reach of user-controlled writes.

In the next chapter, we'll push this mechanism into a new scenario—where you'll discover that the intuition built today will come in handy in an unexpected way: when a real hardware interrupt occurs, this simple read-write model will become more complex, and more fascinating.