Day 5–6 · 内核核心数据结构¶

预计时长：2 小时 / 天，共 4 小时
类型：理论为主 + 代码阅读

做什么¶

彻底搞懂内核代码无处不在的四个数据结构：侵入式链表 list_head、哈希链表 hlist_head、红黑树 rbtree、引用计数 kref。重点是理解设计哲学，不只是 API 用法。不懂这些，读任何内核驱动都会卡住。

要了解什么¶

1. `list_head`：侵入式链表¶

普通链表 vs 侵入式链表的区别：

// 普通链表（传统做法）：链表节点包含数据
struct node {
    int data;
    struct node *next;
    struct node *prev;
};

// 侵入式链表（Linux 做法）：数据结构包含链表节点
struct my_device {
    char name[32];
    int irq;
    struct list_head list;  // ← 链表节点嵌入数据结构中
};

优势：一个链表实现可以被所有数据类型复用，无需为每种数据写一套链表。list_head 自身不包含数据，数据通过 container_of 找回。

核心 API（定义在 include/linux/list.h）：

// 初始化
LIST_HEAD(my_list);                          // 静态初始化链表头
INIT_LIST_HEAD(&dev->list);                  // 动态初始化节点

// 增删
list_add(&new->list, &head);                 // 头插（LIFO）
list_add_tail(&new->list, &head);            // 尾插（FIFO）
list_del(&entry->list);                      // 删除

// 遍历（重要！）
list_for_each_entry(pos, head, member) { }   // 正向遍历
list_for_each_entry_safe(pos, tmp, head, member) { } // 遍历中可删除

2. `container_of`：最重要的宏¶

// 已知 list_head *node，求包含它的 struct my_device *
struct my_device *dev = container_of(node, struct my_device, list);

原理（需要完全理解）：

#define container_of(ptr, type, member) ({              \
    void *__mptr = (void *)(ptr);                       \
    ((type *)(__mptr - offsetof(type, member)));        \
})

offsetof(type, member) = member 字段在结构体中的字节偏移量
ptr（list_head 指针）减去这个偏移量，就得到结构体起始地址
类型转换为 type *

这是指针算术，不是魔法。list_for_each_entry 内部就是在反复调用 container_of。

3. `hlist_head`：哈希链表¶

用于哈希表。普通链表头节点是双向的（两个指针），但哈希桶有成千上万个，如果每个桶都用 list_head（16 字节），内存浪费大。hlist_head 只有一个指针（8 字节），节省哈希桶的内存。

struct hlist_head bucket[1024];  // 哈希桶数组
struct my_item {
    int key;
    struct hlist_node node;      // 用 hlist_node 而非 list_head
};

// 查找
struct my_item *item;
hlist_for_each_entry(item, &bucket[hash], node) {
    if (item->key == target_key) { /* found */ }
}

4. `rbtree`：红黑树¶

内核用红黑树实现需要有序且快速查找的场景：

调度器的 CFS（按 vruntime 排序所有可运行任务）
内存管理的 VMA（按起始地址排序进程的虚拟内存区域）
定时器管理

关键点：内核的 rbtree 不内置 key 比较，需要你自己写插入逻辑，自己比较 key。这是刻意的设计——让你控制比较方式。

// 定义
struct my_node {
    struct rb_node rb;     // 嵌入 rb_node
    int key;
};
struct rb_root my_tree = RB_ROOT;

// 查找
struct rb_node *node = my_tree.rb_node;
while (node) {
    struct my_node *data = container_of(node, struct my_node, rb);
    if (key < data->key)       node = node->rb_left;
    else if (key > data->key)  node = node->rb_right;
    else return data;          // found
}

5. `kref`：引用计数¶

struct my_device {
    struct kref refcount;
    // ...
};

// 初始化（初始引用计数 = 1）
kref_init(&dev->refcount);

// 增加引用
kref_get(&dev->refcount);

// 减少引用（降到 0 时调用 release 函数）
kref_put(&dev->refcount, my_device_release);

// release 函数
static void my_device_release(struct kref *ref) {
    struct my_device *dev = container_of(ref, struct my_device, refcount);
    kfree(dev);  // 安全释放
}

核心思想：谁持有对象的引用，谁就调 kref_get；用完后调 kref_put，最后一个放手的调用者负责释放内存。防止 use-after-free。

练习¶

练习 1：手写 `container_of` 并验证¶

// 创建文件 ~/labs/container_of_demo.c（用户态程序，gcc 编译验证逻辑）
#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head {
    struct list_head *next, *prev;
};

struct my_device {
    char name[32];
    int irq;
    struct list_head list;
};

int main() {
    struct my_device dev = { .name = "my_dev", .irq = 42 };
    struct list_head *lp = &dev.list;

    // 从 list_head 指针还原 my_device 指针
    struct my_device *recovered = container_of(lp, struct my_device, list);

    printf("name: %s, irq: %d\n", recovered->name, recovered->irq);
    printf("pointer match: %s\n", (recovered == &dev) ? "YES" : "NO");
    printf("offsetof: %zu bytes\n", offsetof(struct my_device, list));
    return 0;
}

gcc -o container_of_demo ~/labs/container_of_demo.c && ./container_of_demo

[ ] 修改结构体，在 list 之前加更多字段，观察 offsetof 的变化
[ ] 手算 offsetof(struct my_device, list) 的值，与程序输出对比

延伸：PenguinLab 项目中有完整的用户态侵入式链表实现和测试套件，见 example/kernel_base_ds/。完成练习 5 后可以对照阅读。

练习 2：在内核源码中找 `list_head` 的真实使用¶

cd third_party/linux

# 找一个用 list_head 的驱动
grep -r "list_for_each_entry" drivers/gpio/ | head -5

# 打开其中一个文件，找到 list_head 在哪个结构体里
# 跟踪它被 list_add / list_del 的地方

[ ] 找到 drivers/ 下任意一个用 list_for_each_entry_safe 的案例，解释为什么在那里不用普通的 list_for_each_entry

练习 3：阅读调度器中的 rbtree¶

# CFS 调度器用 rbtree 管理任务，按 vruntime 排序
less third_party/linux/kernel/sched/fair.c
# 搜索 rb_node，找到 enqueue_task_fair 函数
# 看 __enqueue_entity 如何做 rbtree 插入

[ ] 找到 CFS 调度器选取下一个运行任务的函数，理解它为什么取 rbtree 的最左节点

练习 4：阅读 `include/linux/list.h` 全文¶

wc -l third_party/linux/include/linux/list.h
# 大约 1200 行，建议分两次读完

[ ] 找到 list_for_each_entry_safe 的实现，解释 tmp 参数的作用
[ ] 找到 list_splice 和 list_splice_tail，理解两者的区别

练习 5：动手实现用户态侵入式链表（PenguinLab 实战）¶

PenguinLab 项目提供了 example/kernel_base_ds/ 目录，包含一个完整的用户态侵入式链表实现和测试套件。命名使用 penguin_ 前缀（避免与系统头文件冲突），实现原理与内核 list_head 完全一致。

5.1 阅读代码

# 阅读简化版内核链表头文件
cat example/kernel_base_ds/penguin_list.h

# 对比真实内核实现（粗略对比，看 API 命名和参数）
diff <(grep -E "static inline|define" example/kernel_base_ds/penguin_list.h | head -30) \
     <(grep -E "static inline|define" third_party/linux/include/linux/list.h | head -30)

5.2 运行测试套件

项目包含 12 个测试用例，覆盖链表的所有核心操作：

cd example/kernel_base_ds
mkdir -p build && cd build
cmake ..
make
./penguin_example

测试用例清单（全部在 penguin_example.c 中）：

测试	验证内容
`test_init_and_empty`	链表初始化、判空
`test_add_head`	头插法（栈语义）
`test_add_tail`	尾插法（队列语义）
`test_singular`	单节点判断
`test_first_last_entry`	list_first_entry / list_last_entry
`test_del`	中间节点删除
`test_del_init`	删除并重新初始化
`test_for_each_prev`	反向遍历
`test_safe_delete_in_loop`	遍历时安全删除
`test_splice`	链表合并
`test_splice_init`	合并并重新初始化源链表
`test_splice_empty`	合并空链表（边界条件）

5.3 对照内核源码理解差异

[ ] 阅读 penguin_list.h 中的 penguin_list_add，对比 third_party/linux/include/linux/list.h 中的 list_add，看内核版本多了什么（提示：WRITE_ONCE 等内存屏障机制）
[ ] container_of 在 penguin_list.h 中叫 PENGUIN_CONTAINER_OF，展开方式相同，但内核版本用了 __builtin_types_compatible_p 做类型检查
[ ] 在 penguin_example.c 中添加你自己的测试用例（例如：遍历删除所有节点、splice 后继续操作原链表）

延伸阅读¶

资料	具体位置	说明
《Linux 内核设计与实现》Robert Love	第 6 章	内核数据结构，覆盖 list / rbtree / radix tree
Linux Kernel Development Robert Love	Ch.6 "Kernel Data Structures"	同上，英文原版
内核源码	`include/linux/list.h`	最权威的参考，注释详细
内核源码	`include/linux/rbtree.h` + `lib/rbtree.c`	rbtree 实现
LWN.net	https://lwn.net/Articles/336224/	"Intrusive linked lists"，解释设计哲学
《深入理解 Linux 内核》Bovet & Cesati	第 3 章进程部分	list_head 在进程管理中的使用

Day 5–6 · 内核核心数据结构¶

做什么¶

要了解什么¶

1. list_head：侵入式链表¶

2. container_of：最重要的宏¶

3. hlist_head：哈希链表¶

4. rbtree：红黑树¶

5. kref：引用计数¶

练习¶

练习 1：手写 container_of 并验证¶

练习 2：在内核源码中找 list_head 的真实使用¶