1.2 Technical Preparation and Environment
Before we actually start writing code, we need to ensure our environment is ready and mentally prepare for the "beast" we are about to face—the Linux kernel driver model.
This chapter assumes you have read the preface and installed all necessary packages in an Ubuntu 18.04 LTS (or newer) virtual machine. If you haven't done this yet, we strongly recommend completing that step first.
For the best experience, be sure to clone this book's GitHub repository (Linux-Kernel-Programming-Part-2) and be ready to get your hands dirty.
Imagine standing at the gates of the kernel. You hold the key (the source code) in your hand, but without a map (documentation) and the right shoes (development environment), you'll get lost or twist your ankle after just a few steps. Our task in this section is to get our gear in order and then try to pry the door open just enough to peek inside.
1.3 Writing a Simple misc Character Device Driver
This section will be your first hands-on exercise. We will first lay down some background knowledge about device files, device numbers, and the kernel's device model. Then, by writing a skeleton character driver of the misc class, you will see with your own eyes how the kernel "magically" translates user requests into function calls within the driver.
Understanding Device Basics
In the Unix/Linux philosophy, everything is a file. You've probably heard this enough times to make your ears ring, but what does it actually mean in driver development?
A Device Driver is the bridge connecting the OS and hardware. It can be compiled directly into the kernel image or dynamically loaded as a Loadable Kernel Module (LKM). Regardless of the form, it runs in kernel space at the most privileged CPU level (Ring 0 on x86).
For user-space programs to talk to hardware, they must pass through this gate. To maintain the "everything is a file" design, the kernel abstracts devices into a special type of file—a device file or device node. They typically reside in the /dev directory.
To distinguish among thousands of devices, the kernel issues two "ID cards" to each one:
- Type: Whether it is a character device or a block device.
- Device Number: A 32-bit number divided into a Major Number and a Minor Number.
You can imagine this as a massive tree map:
- The roots are the device types.
- The branches are the major numbers (representing device categories, like SCSI hard drives, keyboards, or GPUs).
- The leaves are the minor numbers (representing specific instances, like the third partition on the second hard drive).
Character Devices vs. Block Devices
The difference between the two often confuses people, but a practical test is simple: can the device hold a filesystem that you mount?
- Block Device: Supports random access and can host a filesystem, which is then mounted. Typically storage devices (hard drives, USB flash drives).
- Character Device: Cannot be mounted; data flows in and out sequentially like a stream. Aside from storage and network devices, most devices you encounter are character devices.
Analogy: Water Pipe vs. Storage Boxes You can think of a character device as a water pipe. Water (data) can only flow in one end and out the other. You can't jump to the middle to scoop out a cupful, nor do you need to "mount" the pipe to a wall to use it. A block device, on the other hand, is like a row of storage boxes. You can freely open the 5th box to grab something, and you can mount this row of boxes on the wall (mount a filesystem) to manage them uniformly.
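From user space, the type of a device node can be checked with stat(). The following is a minimal sketch, assuming a Linux system; the function name node_type is made up for illustration and is not part of the book's code.

```c
#include <sys/stat.h>

/*
 * Classify a filesystem node (hypothetical helper).
 * Returns 'c' for a character device, 'b' for a block device,
 * '-' for any other file type, and '?' if stat() fails.
 */
char node_type(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return '?';
    if (S_ISCHR(st.st_mode))
        return 'c';
    if (S_ISBLK(st.st_mode))
        return 'b';
    return '-';
}
```

Calling node_type("/dev/null") on a typical Linux system should report a character device; this is the same information that the leading c or b in ls -l output conveys.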
Back to our map. Starting from Linux 2.6, device numbers were packed into a 32-bit dev_t type:
- Upper 12 bits: Major number (up to 4096).
- Lower 20 bits: Minor number (up to 1 million per major number).
This means: theoretically, both the character device tree and the block device tree can each hold 4096 major categories, with 1 million specific devices under each category. This has been sufficient for a very long time.
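The 12/20 bit split can be sketched in plain user-space C. The helpers below mirror what the kernel's MKDEV(), MAJOR(), and MINOR() macros do; the sketch_ names are our own, chosen to avoid clashing with the real macros.

```c
#include <stdint.h>

#define SKETCH_MINORBITS 20u
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)

/* Pack a major and a minor number into one 32-bit value. */
uint32_t sketch_mkdev(uint32_t major, uint32_t minor)
{
    return (major << SKETCH_MINORBITS) | (minor & SKETCH_MINORMASK);
}

/* Upper 12 bits: the major number. */
uint32_t sketch_major(uint32_t dev)
{
    return dev >> SKETCH_MINORBITS;
}

/* Lower 20 bits: the minor number. */
uint32_t sketch_minor(uint32_t dev)
{
    return dev & SKETCH_MINORMASK;
}
```

For example, major 10 with minor 56 packs to 0x00A00038, and unpacking that value gives back 10 and 56.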
What About the misc Class?
A problem had long plagued kernel developers: major number resources were running out.
To solve this, the kernel decided to consolidate a bunch of "miscellaneous" devices—mice, sensors, touchpads—into a special miscellaneous class, known as the misc class.
- Type: Character device.
- Major Number: Fixed at 10.
- Minor Number: Within this class, the minor number becomes a "secondary major number" used to distinguish specific misc devices.
This is why we choose to start with a misc driver: we don't need to apply for a dedicated major number. The major number is already fixed at 10, and the kernel can allocate a free minor number for us automatically, saving us a lot of administrative overhead.
A Quick Look at the Linux Device Model (LDM)
Before diving into the code, we need to look up at the macro architecture. The modern Linux kernel (2.6+) features a unified Linux Device Model (LDM).
This is a rather "brilliant" design. The LDM maintains a massive tree inside the kernel, linking all buses, devices, and drivers in the system together. This tree is exposed to user space via sysfs (mounted at /sys).
Analogy: A Tree-Structured Family Business You can think of the LDM as an organizational chart of a family business (/sys).
- Buses are the various departments.
- Devices are the employees within those departments.
- Drivers are the specific job descriptions for the employees.
When a new employee (device) joins the company, the department manager (bus driver) finds a matching position based on the job posting (driver) and assigns the employee to it. This process is called Probe.
Core Principle: Every device must be attached to a bus.
- USB devices are attached to the USB bus.
- PCI devices are attached to the PCI bus.
- What about peripherals integrated into an SoC that don't have a physical bus? The kernel invented a virtual bus—the Platform Bus.
When a driver registers with a bus, if it finds a matching device, the kernel calls the driver's probe() method (to initialize resources); conversely, when the device is removed or the module is unloaded, it calls the remove() method (to clean up resources).
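The match-then-probe flow can be modeled with a toy bus in ordinary C. This is only a sketch of the idea: every toy_ name is hypothetical, matching is done by comparing names, and the real kernel uses struct bus_type, struct device, and struct device_driver with far richer matching rules.

```c
#include <stddef.h>
#include <string.h>

struct toy_device {
    const char *name;
    int bound;                  /* 1 once a driver has claimed the device */
};

struct toy_driver {
    const char *name;           /* matched against toy_device.name */
    int (*probe)(struct toy_device *dev);
    void (*remove)(struct toy_device *dev);
};

/* Bus side: probe every unbound device whose name matches the driver. */
int toy_bus_register_driver(struct toy_driver *drv,
                            struct toy_device *devs, size_t ndevs)
{
    int bound = 0;

    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].bound || strcmp(devs[i].name, drv->name) != 0)
            continue;
        if (drv->probe(&devs[i]) == 0) {
            devs[i].bound = 1;
            bound++;
        }
    }
    return bound;               /* number of devices now bound */
}

/* Bus side: call remove() for every device bound to this driver. */
void toy_bus_unregister_driver(struct toy_driver *drv,
                               struct toy_device *devs, size_t ndevs)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].bound && strcmp(devs[i].name, drv->name) == 0) {
            drv->remove(&devs[i]);
            devs[i].bound = 0;
        }
    }
}

/* Driver side: trivial callbacks that only count their invocations. */
int toy_probe_calls, toy_remove_calls;

int toy_probe(struct toy_device *dev)
{
    (void)dev;
    toy_probe_calls++;
    return 0;                   /* 0 means "I accept this device" */
}

void toy_remove(struct toy_device *dev)
{
    (void)dev;
    toy_remove_calls++;
}
```

Registering a driver named "sensor" against a device list containing two "sensor" entries and one "uart" entry calls toy_probe() twice and leaves the "uart" device unbound.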
Back to our misc driver:
To keep things simple, the misc driver we write does not need to explicitly register with any bus, nor does it need to implement probe/remove. It registers directly with the misc framework, much like a sole proprietorship that doesn't need to be affiliated with a large department.
Writing the misc Driver Code — Part 1: The Skeleton
Alright, we've looked at the map; now it's time to pave the road. We are going to write the simplest possible skeleton driver.
In the driver's initialization code, we need to register our device with the kernel. For misc devices, the API we use is misc_register(). It takes only one parameter: a pointer to a miscdevice structure, where we describe the various attributes of the device.
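// ch1/miscdrv/miscdrv.c
#define pr_fmt(fmt) "%s:%s(): " fmt, KBUILD_MODNAME, __func__
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/miscdevice.h>
#include <linux/fs.h> /* struct file_operations, struct file */

/* llkd_misc_fops is defined in Part 2; it must appear above this structure */
static struct miscdevice llkd_miscdev = {
    .minor = MISC_DYNAMIC_MINOR, /* let the kernel pick a free minor number */
    .name = "llkd_miscdrv",      /* device name; after registration the kernel creates /dev/llkd_miscdrv */
    .mode = 0666,                /* node permissions: readable and writable by all users */
    .fops = &llkd_misc_fops,     /* hooks into the functions that implement the driver */
};

static int __init miscdrv_init(void)
{
    int ret;
    struct device *dev;

    ret = misc_register(&llkd_miscdev);
    if (ret != 0) {
        pr_notice("misc device registration failed, aborting\n");
        return ret;
    }
    /* fetch the device pointer, used for log output */
    dev = llkd_miscdev.this_device;
    pr_info("LLKD misc driver (major # 10) registered, minor# = %d, "
            "dev node is /dev/%s\n", llkd_miscdev.minor, llkd_miscdev.name);
    dev_info(dev, "sample dev_info(): minor# = %d\n", llkd_miscdev.minor);
    return 0; /* success */
}

/* Cleanup counterpart to miscdrv_init(); the license string is our assumption */
static void __exit miscdrv_exit(void)
{
    misc_deregister(&llkd_miscdev);
}
module_init(miscdrv_init);
module_exit(miscdrv_exit);
MODULE_LICENSE("GPL");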
Code Breakdown
- MISC_DYNAMIC_MINOR: This is a macro. We are telling the kernel, "Pick an unused minor number for me." Upon successful registration, the kernel fills the allocated number back into llkd_miscdev.minor.
- .name: The name is important. The misc framework uses this name to automatically create a device node with the same name under /dev. This saves us the trouble of manually running the mknod command.
- .mode: 0666 means anyone can read and write to this device. This is a big no-no in production, but it saves a lot of permission-related headaches during the debugging phase.
- .fops: This is the most critical piece. It connects the device node to specific C functions. We will dive into this in the next section.
After compiling and inserting this module, you should be able to see llkd_miscdrv under /dev, and the kernel log will display the minor number assigned to it (e.g., 56).
Understanding the Connection Between Processes, Drivers, and the Kernel
Now the driver's "body" (the structure) is registered, but where is its "soul"?
In a Unix/Linux system, when a user-space process issues a system call (like read or write) on a file (including device files), the kernel's VFS (Virtual File System) layer intercepts the call.
How does the VFS know which driver function to call?
The answer lies in the file_operations structure pointed to by .fops.
You can think of file_operations as a function pointer table (or a pure virtual function interface in C++). Each entry in the table corresponds to a system call: open, read, write, llseek, mmap, and so on.
When we execute:
int fd = open("/dev/llkd_miscdrv", O_RDWR);
read(fd, buf, 100);
The following flow happens inside the kernel (pseudocode):
/* Logic inside the kernel's VFS layer */
struct file *filp = ...; // the open file object
if (filp->f_op->read)
    filp->f_op->read(...); // call the read function the driver registered
This is the essence of the connection: at registration time, we fill the VFS slots with our driver's function addresses; at call time, the VFS simply jumps to execute them.
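The dispatch idea can be tried out in user space with an ordinary table of function pointers. The sketch below is a simplified model, not the kernel's actual VFS code: all toy_ names are invented, and the -EINVAL returned for a missing read handler imitates what the VFS does in that case.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

/* A cut-down stand-in for struct file_operations. */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *f, char *buf, size_t count);
    ssize_t (*write)(struct toy_file *f, const char *buf, size_t count);
};

/* A cut-down stand-in for struct file. */
struct toy_file {
    const struct toy_file_operations *f_op;
};

/* "VFS" side: jump through the table, or fail if the slot is empty. */
ssize_t toy_vfs_read(struct toy_file *f, char *buf, size_t count)
{
    if (!f->f_op || !f->f_op->read)
        return -EINVAL;
    return f->f_op->read(f, buf, count);
}

/* "Driver" side: hand out a fixed message, including its trailing NUL. */
ssize_t toy_drv_read(struct toy_file *f, char *buf, size_t count)
{
    static const char msg[] = "hello";
    size_t n = count < sizeof(msg) ? count : sizeof(msg);

    (void)f;
    memcpy(buf, msg, n);
    return (ssize_t)n;
}

/* "Registration": fill the read slot, leave write empty. */
const struct toy_file_operations toy_fops = {
    .read = toy_drv_read,
};
```

A toy_file whose f_op points at toy_fops returns the message through toy_vfs_read(), while a file with an all-NULL table gets -EINVAL without any driver code running.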
Handling Unsupported Methods
If a driver doesn't support a certain operation, we can usually just leave it unimplemented, leaving the corresponding pointer as NULL. In this case, the VFS returns a default error to user space (for example, EINVAL for a read or write that has no handler).
⚠️ Warning: There's a gotcha
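llseek is the exception. If you don't set it, what user space sees depends on the kernel version: older kernels fall back to a default seek that moves the file position and returns a non-negative offset, causing user space to mistakenly think the operation succeeded.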
The correct approach is:
- Explicitly assign .llseek to no_llseek.
- Call nonseekable_open() in your open method.
This way, when user space calls lseek, it will receive a clear -ESPIPE (Illegal seek) error.
Writing the misc Driver Code — Part 2: Functionality Implementation
With the above understanding, we can now look at the specific function implementations.
First, we define the file_operations structure instance:
static const struct file_operations llkd_misc_fops = {
    .open = open_miscdrv,
    .read = read_miscdrv,
    .write = write_miscdrv,
    .release = close_miscdrv,
    .llseek = no_llseek, /* explicitly declare that seeking is not supported */
};
Then, we implement the open method. In this method, we can perform permission checks, initialize resources, or—like we're doing here—just print some debug information.
static int open_miscdrv(struct inode *inode, struct file *filp)
{
    /* kzalloc() and kfree() need <linux/slab.h> */
    char *buf = kzalloc(PATH_MAX, GFP_KERNEL);

    if (unlikely(!buf))
        return -ENOMEM;
    PRINT_CTX(); // print the current process context (helper macro from the book's repository)
    pr_info(" opening \"%s\" now; wrt open file: f_flags = 0x%x\n",
            file_path(filp, buf, PATH_MAX), filp->f_flags);
    kfree(buf);
    return nonseekable_open(inode, filp);
}
Here we use the file_path() API to get the pathname of the device node. Note that it requires a kernel buffer (buf); remember to kfree it when done.
Next is the read method. This is the simplest implementation:
static ssize_t read_miscdrv(struct file *filp, char __user *ubuf, size_t count, loff_t *off)
{
    pr_info("to read %zu bytes\n", count); /* count is a size_t, hence %zu */
    return count; /* pretend that count bytes were read */
}
At this point, if you test it with dd if=/dev/llkd_miscdrv of=readtest bs=4k count=1, dd will succeed, and you'll get a 4 KB file that is typically filled with zeros. Why? Because our driver doesn't actually copy any data into ubuf; it just returns success (count). dd therefore writes out whatever its user-space buffer already held, which for a freshly allocated buffer is normally zeros.
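The effect is easy to reproduce in a user-space model. In the sketch below, stub_read imitates the skeleton read method (it reports success and copies nothing), and buffer_untouched checks that a pre-filled buffer comes back unchanged; both names are invented for this illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Model of the skeleton read method: claims success, copies nothing. */
ssize_t stub_read(char *ubuf, size_t count)
{
    (void)ubuf;
    return (ssize_t)count;
}

/*
 * Pre-fill a buffer with 'fill', "read" into it, and return 1 if every
 * byte still holds 'fill' afterwards (0 otherwise).
 */
int buffer_untouched(unsigned char fill, size_t count)
{
    char buf[64];

    if (count > sizeof(buf))
        count = sizeof(buf);
    memset(buf, fill, sizeof(buf));
    if (stub_read(buf, count) != (ssize_t)count)
        return 0;
    for (size_t i = 0; i < count; i++)
        if ((unsigned char)buf[i] != fill)
            return 0;
    return 1;
}
```

Whatever pattern the caller's buffer held before the call is what it holds afterwards, which is why the contents of the file dd writes depend entirely on dd's own buffer.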
Real Data Transfer: User Space and Kernel Space
The previous driver was "fake." A real driver needs to transfer hardware data (or kernel data) to the user.
This brings up a core issue: kernel space and user space memory are isolated. You cannot simply use memcpy() to copy data between the two. Not only is this insecure, but it will also straight-up crash the system on certain architectures.
The kernel provides two dedicated primary APIs:
- copy_to_user(void __user *to, const void *from, unsigned long n): Copy from kernel to user.
- copy_from_user(void *to, const void __user *from, unsigned long n): Copy from user to kernel.
These functions check whether the user-space address is valid and writable. They might trigger page faults, causing the process to sleep, so you absolutely must not use them in interrupt context or while holding a spinlock.
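One detail that often trips people up is the return convention: these functions return the number of bytes that could not be copied, so 0 means success. The following user-space model illustrates just that convention. It is a sketch under loud assumptions: sim_user_mem stands in for the process's valid memory, sim_copy_to_user and sim_read are invented names, and the model copies all or nothing, whereas the real function may copy part of the data before failing.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the memory a process may legally access. */
char sim_user_mem[64];

/*
 * Model of copy_to_user(): returns the number of bytes NOT copied.
 * 0 means the whole copy succeeded.
 */
unsigned long sim_copy_to_user(char *to, const void *from, unsigned long n)
{
    uintptr_t lo = (uintptr_t)sim_user_mem;
    uintptr_t hi = lo + sizeof(sim_user_mem);
    uintptr_t dst = (uintptr_t)to;

    if (dst < lo || dst > hi || n > hi - dst)
        return n;               /* destination not valid: nothing copied */
    memcpy(to, from, n);
    return 0;
}

/* Driver-style caller: any non-zero result is turned into -EFAULT. */
long sim_read(char *ubuf, unsigned long count)
{
    static const char kbuf[] = "kernel data";

    if (count > sizeof(kbuf))
        count = sizeof(kbuf);
    if (sim_copy_to_user(ubuf, kbuf, count))
        return -EFAULT;
    return (long)count;
}
```

A read into sim_user_mem succeeds and returns the byte count, while a read into any buffer outside that region returns -EFAULT, which is the pattern the driver code in this chapter follows.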
Using copy_to_user
static ssize_t read_method(struct file *filp, char __user *ubuf, size_t count, loff_t *off)
{
    char *kbuf = kzalloc(...); // kernel buffer; it must hold at least count bytes

    if (!kbuf)
        return -ENOMEM;
    /* assume the data has already been read from the hardware into kbuf */
    /* copy it to user space */
    if (copy_to_user(ubuf, kbuf, count)) {
        pr_warn("copy_to_user() failed\n");
        kfree(kbuf);
        return -EFAULT;
    }
    kfree(kbuf);
    return count; /* number of bytes successfully read */
}
If copy_to_user returns non-zero, it means the copy didn't fully succeed (usually due to an invalid user-space address), and returning -EFAULT is the standard practice.
Advanced Exercise: A misc Driver with a "Secret"
Now, we are going to write a more complete driver: ch1/miscdrv_rdwr.
This driver will store a "secret string" in the kernel. Users can read it to retrieve the secret, or write it to update the secret.
To achieve this, we need to define a driver context structure (Private Data) to store global state.
/* Driver context structure */
struct drv_ctx {
    struct device *dev;
    int tx, rx, err;
    char oursecret[128]; /* where the secret is stored */
};
static struct drv_ctx *ctx;
Initialization
In the init function, we allocate memory and initialize it:
ctx = devm_kzalloc(dev, sizeof(struct drv_ctx), GFP_KERNEL);
if (unlikely(!ctx))
    return -ENOMEM;
ctx->dev = dev;
/* initialize the secret; bound the copy by the destination size */
strscpy(ctx->oursecret, "initmsg", sizeof(ctx->oursecret));
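Note the use of devm_kzalloc here. This is the resource-managed version of kzalloc: the allocation is tied to the lifetime of the device passed as its first argument. When that device goes away, which for our driver happens when the module is unloaded and the misc device is deregistered, the kernel automatically frees this memory, so we don't need to manually call kfree. This greatly reduces the risk of memory leaks.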
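The idea behind managed allocation can be sketched in user space: each allocation is recorded on a list that belongs to the device, and releasing the device walks the list. This is only a model with invented toy_ names; the kernel's devres implementation is considerably more involved.

```c
#include <stdlib.h>

/* One tracked allocation. */
struct toy_devres {
    struct toy_devres *next;
    void *mem;
};

/* A "device" that owns a list of tracked allocations. */
struct toy_dev {
    struct toy_devres *res;
};

/* Allocate zeroed memory and record it against the device. */
void *toy_devm_zalloc(struct toy_dev *dev, size_t size)
{
    struct toy_devres *node = malloc(sizeof(*node));

    if (!node)
        return NULL;
    node->mem = calloc(1, size);
    if (!node->mem) {
        free(node);
        return NULL;
    }
    node->next = dev->res;
    dev->res = node;
    return node->mem;
}

/* Free everything the device owns; returns how many allocations were released. */
int toy_dev_release(struct toy_dev *dev)
{
    int freed = 0;

    while (dev->res) {
        struct toy_devres *node = dev->res;

        dev->res = node->next;
        free(node->mem);
        free(node);
        freed++;
    }
    return freed;
}
```

The caller never frees individual buffers. A single toy_dev_release() at teardown covers every allocation made along the way, including those made before an error path bailed out early.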
Read Method Implementation
static ssize_t read_miscdrv_rdwr(struct file *filp, char __user *ubuf, size_t count, loff_t *off)
{
    int secret_len = strlen(ctx->oursecret);
    struct device *dev = ctx->dev;

    if (count < secret_len)
        return -EINVAL; /* user buffer too small */
    /* the core operation: send the secret to the user */
    if (copy_to_user(ubuf, ctx->oursecret, secret_len)) {
        dev_warn(dev, "copy_to_user() failed\n");
        return -EFAULT;
    }
    /* update the statistics */
    ctx->tx += secret_len;
    dev_info(dev, " %d bytes read, returning... (stats: tx=%d rx=%d)\n",
             secret_len, ctx->tx, ctx->rx);
    return secret_len;
}
Write Method Implementation
static ssize_t write_miscdrv_rdwr(struct file *filp, const char __user *ubuf, size_t count, loff_t *off)
{
    void *kbuf = NULL;
    struct device *dev = ctx->dev;

    /* MAXBYTES: driver-defined upper bound on a single write */
    if (unlikely(count > MAXBYTES))
        return -EFBIG; /* too much data */
    /* allocate a zeroed temporary kernel buffer; the extra byte keeps it NUL-terminated */
    kbuf = kvzalloc(count + 1, GFP_KERNEL);
    if (unlikely(!kbuf))
        return -ENOMEM;
    /* fetch the data from user space */
    if (copy_from_user(kbuf, ubuf, count)) {
        dev_warn(dev, "copy_from_user() failed\n");
        kvfree(kbuf);
        return -EFAULT;
    }
    /* update the secret; bounding by the destination size prevents an overflow */
    strscpy(ctx->oursecret, kbuf, sizeof(ctx->oursecret));
    /* update the statistics */
    ctx->rx += count;
    dev_info(dev, " %zu bytes written, returning... (stats: tx=%d rx=%d)\n",
             count, ctx->tx, ctx->rx);
    kvfree(kbuf);
    return count;
}
Now, you can compile this driver, write a user-space program to read and write, and you will see the "secret" being passed back and forth between the kernel and user space.
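A user-space test program needs little more than open(), write(), and read(). The two helpers below are a minimal sketch; their names are ours, and the path you pass in is an assumption that depends on the .name your driver registered. Each helper opens the node afresh, so it never needs to seek.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg (without its terminating NUL) to the node at path.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t secret_write(const char *path, const char *msg)
{
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, msg, strlen(msg));
    close(fd);
    return n;
}

/*
 * Read up to len - 1 bytes from the node at path into buf and
 * NUL-terminate the result. Returns the byte count, or -1 on error.
 */
ssize_t secret_read(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int fd;

    if (len == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

When pointing these at the driver, pass a read buffer comfortably larger than the secret (256 bytes, say), because the read method shown earlier rejects requests smaller than the secret's length with EINVAL.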
Security Issues: When a Driver Becomes a Nightmare
Here we reach a very serious turning point. You might think the logic in the above code is simple, but even a single line of incorrect code can blow the kernel's security doors wide open.
Remember the question posed in the introduction—"why can't we just write kernel code casually?"—because you are in Ring 0.
Let's see what happens if we intentionally get a pointer wrong. We'll write a "bad" driver, bad_miscdrv.
Scenario 1: The Read Pitfall
Suppose that in read, we accidentally write the wrong destination address, or the user passes in a malicious address.
/* What NOT to do */
new_dest = ubuf + (512*1024); /* points to an illegal location */
copy_to_user(new_dest, ctx->oursecret, secret_len);
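copy_to_user validates the destination with access_ok() and handles faults on the user address. If new_dest does not point to writable memory mapped in the calling process, the copy fails: copy_to_user returns the number of bytes it could not copy, a driver that checks this result returns -EFAULT, and user space receives a "Bad address" error. The read call fails, but the kernel stays intact and this does not immediately lead to privilege escalation. (KASAN, the Kernel Address Sanitizer, targets a different class of bug: invalid accesses to kernel memory.) If the bogus address happens to be mapped and writable, the data silently lands in the wrong part of the process's own memory, which is still a bug.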
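The "Bad address" error can be provoked from user space with any file whose kernel-side read copies data out to the caller. The sketch below uses a pipe and a page mapped with PROT_NONE as the deliberately invalid buffer; the function name is ours, and the feature-test macro is there only to make MAP_ANONYMOUS visible under strict compiler modes.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS with strict -std= modes */
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Ask the kernel to read() into memory the process cannot access.
 * Returns the errno from read(), 0 if read() unexpectedly succeeded,
 * or -1 if the setup (mmap, pipe, write) failed.
 */
int read_errno_on_bad_buffer(void)
{
    int fds[2];
    int err = 0;
    void *bad;

    /* A page with no access rights: any copy into it must fault. */
    bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED)
        return -1;
    if (pipe(fds) != 0) {
        munmap(bad, 4096);
        return -1;
    }
    if (write(fds[1], "data", 4) != 4)
        err = -1;
    else if (read(fds[0], bad, 4) < 0)
        err = errno;            /* the kernel-side copy failed */
    close(fds[0]);
    close(fds[1]);
    munmap(bad, 4096);
    return err;
}
```

On Linux the function returns EFAULT: the kernel detects the bad destination while copying and fails the system call instead of crashing.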
Scenario 2: The Write Pitfall — Privilege Escalation
This is where the real terror lies. The destination address of copy_from_user is a kernel buffer. If we can control this destination address, we can write data to any location in kernel memory.
A Linux process's permission information is stored in a struct cred that its task_struct points to. If the uid there is 0, the process is Root.
Imagine if the driver's write method had a vulnerability like this:
/* Suppose a logic bug in the driver lets us control new_dest */
new_dest = &current->cred->uid; /* extremely dangerous! */
count = 4; /* uid is a 32-bit integer */
copy_from_user(new_dest, ubuf, count);
If user space sends 4 bytes of zeros, this code will overwrite the current process's UID with 0.
Consequences:
- The write system call returns.
- The user process checks its own UID: it has become 0 (Root).
- The process spawns a Root Shell.
This is the most classic principle of kernel privilege escalation. Although this is just a demonstration, countless historical CVEs have been caused by similar pointer errors and missing boundary checks.
Remember that question from the beginning: why can't driver development be as casual as writing Python? Now you should be able to answer: because any casual memory access can turn into a backdoor to Root. Every copy_from_user call and boundary check we write is essentially reinforcing that door.
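The difference a boundary check makes can be shown safely in user space. In this sketch, struct toy_task is a made-up stand-in that places a uid right after a data buffer; it is not how task_struct or struct cred are actually laid out. One write helper trusts a caller-supplied offset, the other validates it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a data buffer followed by a credential field. */
struct toy_task {
    char secret[16];
    unsigned int uid;
};

/* BAD: the destination offset comes from the caller and is never checked. */
void toy_write_unchecked(struct toy_task *t, size_t dest_off,
                         const void *src, size_t n)
{
    memcpy((char *)t + dest_off, src, n);
}

/* Checked: accept only writes that fall entirely inside t->secret. */
int toy_write_checked(struct toy_task *t, size_t dest_off,
                      const void *src, size_t n)
{
    if (dest_off > sizeof(t->secret) || n > sizeof(t->secret) - dest_off)
        return -1;
    memcpy(t->secret + dest_off, src, n);
    return 0;
}
```

Writing four zero bytes at offsetof(struct toy_task, uid) through the unchecked helper turns a uid of 1000 into 0, while the checked helper rejects the same request and leaves uid alone.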
Chapter Echoes
What this chapter is really doing is establishing the underlying cognitive framework of "user-space and kernel-space interaction." On the surface, we are configuring a misc device, but in reality, we are understanding why copy_to_user must exist, and why struct cred (process credentials) is the last line of defense for kernel security.
We didn't just write code; we deconstructed the data flow behind it: from a user-space open() call, piercing through the VFS layer, and ultimately landing on the driver module's C function pointers. It's like looking under the hood of a car to see exactly how the spark plugs fire.
In the next chapter, we will push this mechanism into a new scenario—where you will find that the intuition built today will come in handy in unexpected ways: when a real hardware interrupt occurs, this simple read/write model will become more complex, and even more fascinating.
Exercises
Exercise 1: Understanding
Question: In the Linux Device Model, if you view the device node miscdrv under the /dev directory and its attributes show as crw-rw-rw- 1 root root 10, 56, ..., what do these three numbers represent based on this chapter's content? How are they established during system boot or driver loading?
Answer and Analysis
Answer: 10 represents the major number, 56 represents the minor number, and 1 represents the hard link count (what is shown here is not the device number, but the node's file attributes).
Analysis: In Linux device drivers, 10 is the device's major number, identifying the device type (in this case, a misc character device). 56 is the minor number, dynamically allocated by the kernel or specified by the driver, used to distinguish different device instances under the same major number. 1 is the filesystem-level hard link count, not a device identifier.
As mentioned in this chapter, misc class devices share major number 10. When the driver calls misc_register() and passes in MISC_DYNAMIC_MINOR, the kernel automatically allocates an available minor number (56 in this case). The major number is statically allocated to the misc class by the kernel, used to index to the correct driver handler functions at the VFS layer.
Exercise 2: Application
Question: Suppose you are writing a character device driver and need to safely transfer 100 bytes of data from a kernel buffer kbuf to user space in the read method. Which kernel API should you use to ensure kernel security and properly handle potential page faults? What would be the consequences of using memcpy directly?
Answer and Analysis
Answer: You should use copy_to_user. Using memcpy directly would lead to kernel security breaches or system instability (because the user-space pointer might be invalid or trigger page faults, which the kernel cannot handle directly in this manner).
Analysis: As explained in the chapter, memory access between kernel space and user space is strictly restricted. copy_to_user is specifically designed to copy data from kernel space to user space. It not only performs the copy but also validates the user-space address. If the pointer is invalid, it returns the number of bytes that were not copied instead of causing a system crash.
Using memcpy directly bypasses these security checks. If the address provided by user space is invalid (e.g., mapped as read-only memory or an unmapped address), the kernel will trigger an exception, typically resulting in an Oops and possibly a full kernel panic.
Exercise 3: Application
Question: When designing a misc driver's file_operations, if we do not intend to support llseek functionality, simply setting the .llseek member to NULL is not enough. Why? What is the correct approach?
Answer and Analysis
Answer: Because, depending on the kernel version, the VFS layer's default handling of llseek may report success (returning a non-negative offset) instead of returning an error. The correct approach is to assign .llseek to no_llseek and call nonseekable_open in the open method.
Analysis: As described in the 'Handling Unsupported Methods' section, if you simply leave llseek as NULL, the default VFS handling may update the file position pointer and return a value that looks like success, which would mislead user-space programs.
To explicitly inform user space that the device does not support seeking, you must explicitly set .llseek to the kernel-provided no_llseek function. Additionally, best practice dictates calling nonseekable_open(inode, filp) in the driver's open callback to thoroughly mark the file as non-seekable.
Exercise 4: Thinking
Question: This chapter introduced devm_kzalloc (resource-managed allocation) for driver memory allocation. Please consider and analyze: when allocating memory in a driver's probe method (or init function), which approach is more prone to resource leaks when handling error paths like "device initialization failure" or "driver unloading"—the traditional kmalloc + kfree manual management approach, or using devm_kzalloc? Why?
Answer and Analysis
Answer: The traditional manual management approach (kmalloc/kfree) is more prone to resource leaks.
Analysis: In driver development, the initialization process often involves multiple steps (allocating memory, registering the device, requesting an IRQ, etc.). If a step fails midway, the traditional approach requires carefully rolling back and freeing all previously allocated resources. It is easy to miss certain kfree calls, leading to memory leaks.
In contrast, devm_kzalloc binds the memory lifecycle to the device itself. When the device is removed from the kernel or the driver is unloaded, the kernel automatically frees all resources allocated via the devm API. This eliminates the need to write tedious error-handling rollback code, significantly reducing the risk of resource leaks in complex initialization logic. This reflects the philosophy of modern Linux kernel design: leveraging object lifecycle management to simplify driver development.
Key Takeaways
There is a strict boundary between user space and kernel space. Kernel drivers run at the highest privilege level, and any minor memory access error can lead to system crashes or severe security vulnerabilities. Therefore, we cannot be as casual as when writing user-space programs. Driver development is essentially about building a controlled channel that allows ordinary programs to safely interact with hardware through the kernel. This transition from a "protected" to a "bare-metal" environment is the primary mindset developers must establish.
Linux integrates devices into the system through the "everything is a file" abstraction, with the VFS (Virtual File System) layer acting as the dispatcher between user requests and driver code. When a user program calls read or write, the request does not go directly to the hardware; instead, it is intercepted by the VFS, which looks up the corresponding file_operations structure. This structure stores the function pointers (like .open, .read) registered by the driver developer, and the VFS jumps to execute the specific driver logic through these pointers.
To avoid the tedium and conflicts of manually managing major numbers, modern driver development typically adopts the misc (miscellaneous) class mechanism. This is a scheme that unifies all non-standard devices under major number 10. The kernel uses the minor number to distinguish specific misc devices. Developers only need to call misc_register() and fill in the device name and operation interface, and the kernel will automatically create the corresponding device node under /dev.
Because the page tables for kernel space and user space are isolated, driver code must absolutely never use memcpy to exchange data directly. Instead, it must use the two dedicated APIs: copy_to_user and copy_from_user. These functions not only handle data transfer but also strictly validate the legitimacy of user-space pointers, preventing the kernel from panicking due to illegal address accesses and ensuring a safe error return rather than a crash when pointers are invalid.
Security is the lifeline of driver development, and lax boundary checks are often the source of privilege escalation. If data length validation is missing during write operations, an attacker could exploit a buffer overflow to overwrite critical data structures in kernel memory (such as the UID in struct cred), elevating a normal process's privileges to Root. Correctly using managed interfaces like devm_kzalloc along with strict parameter validation are key measures for reinforcing this kernel door.