6. Kernel Mechanism Essentials — Processes and Threads
In the last chapter, we got our feet wet with kernel modules. By now, you should be able to write some simple code inside the kernel. But that's just the tip of the iceberg. The Linux kernel is massive, complex, and profound. If we want to navigate it freely, knowing only printk is far from enough.
In this chapter, we will truly open the door to the kernel's internal mechanisms. The topic we are going to discuss is how processes and threads are actually managed inside the kernel. Why is this important? Because if you don't understand what "processes" and "threads" look like through the kernel's eyes, you won't be able to grasp the memory management discussions in the next chapter, let alone write efficient drivers. We will see how those concepts you take for granted in user space—like stacks, heaps, or even the fact that "I am running"—are completely reconstructed from the kernel's perspective.
In fact, there is a fundamental difference between writing kernel code and writing user-space code: in user space, you are the master of your process; in kernel space, you are doing work in a borrowed context. Figuring out whose context you are "borrowing," and what limitations that context imposes, is the core mission of this chapter.
To explain this clearly, I've split this content into two chapters. This chapter focuses on the architecture of processes and threads, and the next chapter will dive into the internals of memory management. Of course, real kernel development experience is scattered throughout the entire book—like CPU scheduling, synchronization primitives, and so on—but those are stories for later.
The core questions we will tackle in this chapter include:
- On whose "turf" is kernel code actually running? (Process context vs. Interrupt context)
- What does a process's Virtual Address Space (VAS) actually look like?
- How does the kernel organize these processes, threads, and their stacks?
- How do we find and manipulate the ultimate task-describing structure in the kernel, task_struct?
- How do we iterate over all tasks in the system?
Ready? Let's get our VM (Virtual Machine) ready first. Make sure you have set up the environment following the steps in the Online Chapter. If not, go back and get it done—the upcoming content is best verified by typing it in yourself.
6.1 Understanding Process and Interrupt Contexts
Back in Chapter 4, when we wrote our first kernel module, we briefly touched on kernel architecture. Now, it's time to lay this topic out in full.
Modern CPUs typically have the concept of privilege levels. For example, x86 has 4 Rings (Ring 0 being the most privileged, Ring 3 the least), ARM-32 has 7 modes, and ARM64 has Exception Levels (EL0-EL3). But regardless of the architecture, modern operating systems simplify this into two levels in practice: privileged (kernel mode) and unprivileged (user mode).
This is crucial.
What comes next might challenge your intuition a bit: Linux is a monolithic kernel. The literal meaning of this is "one giant stone." What does this mean? It means that when your process initiates a system call, there is no mysterious "kernel process" that jumps out to do the work for you.
The truth is: that process itself switches into kernel mode and personally executes the kernel code.
So, we say that kernel code executes in process context. This isn't just terminology; it's the foundation for understanding kernel behavior. The vast majority of driver code, exception handling (like page faults), and even parts of the scheduler, all run in this context.
But you might ask: aside from process context, is there another way for kernel code to live?
Of course. Imagine you are deeply immersed in the joy of coding, and suddenly your network card receives a packet. A hardware interrupt signal is instantly triggered. No matter what the CPU is doing at that moment (even if it's executing kernel code), it must immediately stop, save its current state, and jump to execute the Interrupt Service Routine (ISR). The code executing at this moment is running in interrupt context.
These are two completely different worldviews:
- Process context: Actively initiated by a process (via system call or exception), and synchronous. Essentially, "I" am still doing the work, just with elevated privileges.
- Interrupt context: Asynchronously triggered by hardware. At this point, you are not any process; you are the interrupt itself. You cannot sleep, you cannot block, and you must finish fast.
Figure 6.1 illustrates this conceptual view: user-space threads enter the kernel via system calls; meanwhile, pure kernel threads (the kind that don't need user space) are silently doing their work; and interrupts are the uninvited guests that interrupt everything at any moment.
SICP Tense Mode The distinction here is not an academic game; it is the line between life and death. In process context, you can allocate memory, wait for locks, and sleep. In interrupt context, if you dare to sleep, the system crashes immediately.
Why? Because if you trigger the scheduler while sleeping, how does the scheduler know who to wake up? You aren't even a "process."
We'll teach you how to determine which context you're in later. But keep this intuition in mind for now.
6.2 Understanding the Basics of Process Virtual Address Space (VAS)
Before diving into the kernel, we need to review the process's "home"—the Virtual Address Space (VAS). There is an iron rule here: memory is sandboxed. A process believes it has the entire memory space to itself; looking "outside" is impossible.
The user-space VAS is divided into several homogeneous regions, which we call segments or, more technically, mappings—because they are essentially pieced together by the kernel via the mmap() system call.
Figure 6.2 shows a minimal set of a standard Linux user-space process VAS. Let's do a quick pass from low addresses to high addresses:
Text Segment (Code Segment)
This is where machine instructions reside. The Instruction Pointer (IP/PC) dances here. It is read-only (r-x). Note that the Text segment does not start at address 0. The page near address 0 is the famous "null pointer trap page," specifically designed to catch NULL pointer accesses.
Data Segment
Immediately following the Text segment. It stores global and static variables (rw-). In reality, it is split into three parts:
- Initialized Data: Initialized global/static variables.
- Uninitialized Data (BSS): Uninitialized global/static variables, automatically zeroed at runtime.
- Heap: The home of malloc(). Note that in modern glibc, only requests smaller than 128 KB (MMAP_THRESHOLD) are allocated from the heap; larger chunks of memory are handled separately via mmap(), known as anonymous mapping. The heap is dynamic, and it is the only segment that grows toward higher addresses. The legal address boundary at the top of the heap is called the Program Break (which you can query with sbrk(0)). A quick user-space demo follows this list.
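You can see this behavior with your own eyes from user space. Below is a minimal sketch (the file name brk_demo.c is ours; exact addresses and the precise threshold depend on your glibc version):

/* brk_demo.c -- observing the program break and MMAP_THRESHOLD.
 * Build: gcc -o brk_demo brk_demo.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *brk_before = sbrk(0);        /* current program break */
	void *small = malloc(100);         /* well below MMAP_THRESHOLD: from the heap */
	void *big   = malloc(512 * 1024);  /* well above it: anonymous mmap() */
	void *brk_after = sbrk(0);

	printf("program break: %p -> %p\n", brk_before, brk_after);
	printf("small (100 B) @ %p  (near the break: heap)\n", small);
	printf("big (512 KB)  @ %p  (far away: mmap region)\n", big);
	free(big);
	free(small);
	return 0;
}

Run it and, if you're curious, compare the two addresses against /proc/PID/maps: the large allocation should land nowhere near the program break.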
Shared Libraries
All dynamically linked shared libraries (.so) are mapped into a region somewhere between the heap and the stack.
Stack
This is the LIFO (Last-In, First-Out) world. Function calls, parameter passing, and local variables all live here. On modern CPUs (x86, ARM), the stack grows downward, which is known as a Fully Descending Stack.
Here is an interesting detail: although logically we say "a stack frame is allocated for each function call," the actual physical operation isn't that complex—the Stack Pointer (SP) moves, and the stack frame appears; when returning, it just pops back. This design is incredibly fast.
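You can watch the SP move with a few lines of user-space C (a minimal sketch; frame sizes and exact addresses are compiler- and architecture-dependent):

#include <stdio.h>

static void nested(int depth)
{
	int local;    /* lives in this call's stack frame */
	printf("depth %d: &local = %p\n", depth, (void *)&local);
	if (depth < 3)
		nested(depth + 1);
}

int main(void)
{
	nested(0);    /* each deeper frame should print a lower address */
	return 0;
}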
Regarding threads:
A process has at least one thread (the main() thread). If there are multiple threads (thrd2, thrd3 in Figure 6.2), they share almost everything in the process's VAS—except the stack. Each thread has its own independent private stack. The main thread's stack is at the very top of the VAS, while the stacks of other threads sprout up in the "shared region" between the heap and the stack.
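A short pthread sketch makes the sharing rules tangible (worker and shared_global are our illustrative names; build with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

static int shared_global;   /* one copy, visible to all threads */

static void *worker(void *arg)
{
	int local;               /* one copy per thread, on that thread's own stack */
	printf("thread %ld: &local=%p  &shared_global=%p\n",
	       (long)arg, (void *)&local, (void *)&shared_global);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, (void *)1);
	pthread_create(&t2, NULL, worker, (void *)2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	worker((void *)0);       /* main thread's stack, near the top of the VAS */
	return 0;
}

The two worker threads print &local addresses far apart (each on its own stack), while &shared_global is identical everywhere.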
Analogy Recall: Back to the "Sandbox" We can imagine a process's VAS as a courtyard house (Sandbox).
- Text/Data are the foundation and load-bearing walls; once built, they don't move.
- Heap is a temporary shed you build in the yard, which can continuously expand toward the center of the yard (higher addresses).
- Stack is a basket hanging from the ceiling. The more things you put in, the heavier the basket gets, and the longer the rope extends (growing toward lower addresses).
- Threads: If they are a family living in the courtyard house, everyone shares the kitchen and bathroom, but each person has their own private stash (stack) hidden in their own suitcase. You can't go rummaging through someone else's suitcase, or things will get messy.
This analogy helps you understand why in multi-threaded programming, local variables are safe (in their own baskets), while global variables require caution (everyone can see them).
6.3 Organizing Processes, Threads, and Stacks — User Space and Kernel Space
The traditional UNIX philosophy is "everything is a process." But in the eyes of the modern kernel, threads are the basic unit of scheduling. The Linux kernel does not distinguish between processes and threads; a thread is simply "a process that shares certain resources." Every thread—whether in user space or in the kernel—corresponds to a kernel metadata structure: task_struct. We will discuss this structure in detail later.
Here is a key point that is easily overlooked: we need to prepare a stack for each thread at every privilege level.
Linux has two privilege levels: user mode and kernel mode. This means that every living user-space thread actually has two stacks:
- User-space stack: Used when running user code. The local variables you define in C live here.
- Kernel-space stack: Once you trap into the kernel via a system call, or trigger an exception (like a page fault), the CPU switches to kernel mode, and the Stack Pointer (SP) instantly points to this kernel stack. All kernel function calls and local variables within the kernel reside on this tiny kernel stack.
The only exception is kernel threads. They are born in the kernel and only see kernel space, so they only have a kernel stack and no user stack.
⚠️ Pitfall Warning Don't assume the kernel stack is large. On 32-bit systems, it's usually only 2 pages (8 KB); on 64-bit systems, only 4 pages (16 KB). Define an int huge_array[4096] in a kernel module and you may blow the stack outright, leading to a hard-to-debug crash. Later on, when we need large chunks of memory in the kernel, we must allocate them dynamically (kmalloc()/vmalloc()). Never try to economize by putting them on the stack. The sketch below shows the safe pattern.
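Here is what that looks like in practice (a sketch; process_data is a hypothetical driver function, and real code would pick the allocator and GFP flags to match its context):

#include <linux/slab.h>

static int process_data(void)
{
	char *buf;

	/* NOT: char buf[4096];  -- a quarter of a 16 KB kernel stack, gone */
	buf = kzalloc(4096, GFP_KERNEL);   /* process context: may sleep */
	if (!buf)
		return -ENOMEM;
	/* ... work with buf ... */
	kfree(buf);
	return 0;
}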
One more detail: Hardware IRQ Stacks When a hardware interrupt occurs, the CPU doesn't necessarily use the interrupted process's kernel stack (that's too small—what if the interrupt handler is also greedy?). Many architectures (including x86_64) prepare a separate IRQ stack for each CPU core, dedicated to handling interrupts. This prevents the interrupt handler from trampling an innocent process's kernel stack.
6.4 Inspecting the User Stack and Kernel Stack
Stacks are your lifeline when debugging. If you want to figure out "how did the program get here?" or "why is it stuck here?", looking at the stack is almost the only way. But as we just mentioned, each thread has two stacks—how do we look at them?
Inspecting the Kernel Stack: /proc/PID/stack
Good news: the kernel has made this part very user-friendly. As long as you have root privileges, you can directly read /proc/PID/stack to get the current kernel stack trace of that process.
$ sudo cat /proc/2549/stack
[<0>] do_wait+0x184/0x340
[<0>] kernel_wait4+0xaf/0x150
[<0>] __do_sys_wait4+0x89/0xa0
[<0>] __x64_sys_wait4+0x1e/0x30
[<0>] do_syscall_64+0x5c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
How do you read it?
- Read it from bottom to top. The very bottom is the entry point (entry_SYSCALL_64...), and the very top is the function currently executing (do_wait).
- do_wait+0x184/0x340 means: execution is currently at an offset of 0x184 bytes into the function, whose total length is 0x340 bytes.
- [<0>] was originally the kernel code address of each frame, but to prevent attackers from leaking kernel layout information (and defeating KASLR), modern kernels zero it out.
Look at that: our Bash process is parked in the kernel, waiting via the wait4 system call for a child process to finish.
Inspecting the User Stack: GDB Magic
Inspecting the user stack is actually a bit more troublesome. The most universal way is to use GDB. For convenience, you can write a simple script ch6/ustack, the core of which is calling GDB's bt command.
sudo gdb \
-ex "set pagination 0" \
-ex "thread apply all bt" \
--batch -p $PID
If we throw this at our Bash process from earlier, you'll see output similar to this:
Thread 1 (process 2549):
#0 0x00007fadd3109c3a in __GI___wait4 ...
#1 0x0000555b98cd4f03 in ?? ()
#2 0x0000555b98cd6373 in wait_for ()
...
Likewise, read this from bottom to top. The #0 at the very top is the current frame.
eBPF: Modern Magic
If you feel the above approach is too "traditional," let's look at modern weaponry — eBPF.
Using stackcount from the BCC (BPF Compiler Collection) toolkit, we can do even cooler things: trace both the kernel stack and the user stack simultaneously.
# For example, let's see what the ping program is up to
sudo stackcount-bpfcc --delimited \
	-p $(pgrep ping) \
	ping_echo_send
This tool attaches a probe to the ping process. Each time ping calls ping_echo_send, it captures and prints both the kernel and user call stacks at that exact moment. The --delimited parameter prints a -- divider in the output, separating the kernel portion of the stack from the user portion.
Feynman Flow Mode Honestly, the first time I saw the output from tools like eBPF, my reaction was "I don't fully get it, but it's awesome." You don't need to master all the details of BCC right now, but you do need to build an intuition: modern kernel debugging is no longer just staring at /proc files; it's about using instrumentation to make the kernel "spill its guts" on critical paths for you to see. This is a God's-eye view. We used to guess what happened; now we watch it happen in real time.
6.5 The Kernel Task Structure: task_struct
Alright, now let's go back to that big family portrait (Figure 6.3/6.5). Besides code and data, the most core things in kernel space are those individual task_struct structures.
This is the "ID card" in the kernel's eyes. Every living thread—whether user-space or kernel—has a corresponding task_struct. It stores everything: the PID, memory descriptors, open files, signal handling, scheduling information... even whether this thread has been sleeping a lot lately.
This structure is defined in include/linux/sched.h. It is absurdly large (13KB+ on x86_64 in kernel 6.1).
SICP Tense Mode This is not just a data structure. This is the operating system's abstraction of the real world. When we write current->pid in our code, we are essentially asking the kernel: "Who is the soul currently running on the CPU?"
6.6 Accessing the Task Structure via the current Macro
Since there are hundreds or thousands of task_struct hanging on the kernel's linked lists, when my kernel code is running, how do I know "who am I"?
The kernel developers are a clever bunch; they came up with a macro: current.
#include <linux/sched.h>
current->pid;  // the PID of the current process (thread)
current->comm; // the name of the current process (the executable name, path stripped)
You can think of current like the this pointer in C++, but it points to the task_struct of the thread currently executing kernel code.
Its implementation is highly architecture-dependent:
- x86_64: Leverages per-CPU variables, making it extremely fast with no locks needed.
- ARM64 / PowerPC: Even dedicates a general-purpose register to store this thing, making it a single-instruction lookup.
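To see why it's so cheap, here is a simplified sketch of the x86_64 flavor (paraphrased from arch/x86/include/asm/current.h in older kernels; recent kernels have moved the variable around, but the per-CPU idea is unchanged):

/* Each CPU keeps a per-CPU pointer to the task it is currently running. */
DECLARE_PER_CPU(struct task_struct *, current_task);

static __always_inline struct task_struct *get_current(void)
{
	/* a single per-CPU load: no locks, no list walking */
	return this_cpu_read_stable(current_task);
}

#define current get_current()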
Determining Context: Who am I? Where am I?
We repeatedly emphasized earlier that you must not sleep in atomic context. So how do you know if you're in atomic context? The kernel provides an incredibly handy macro:
#include <linux/preempt.h>
if (in_task()) {
	// process context: sleeping is usually allowed
	foo();
} else {
	// interrupt context: sleeping is absolutely forbidden!
	bar();
}
⚠️ Pitfall Warning in_task() returning true doesn't necessarily mean you can sleep! If you are in process context but holding a spinlock, you are still in atomic context. Calling msleep() there? You're asking for a deadlock. Remember a simple rule: don't sleep while holding a lock, and don't sleep in an interrupt.
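To make the rule concrete, here is the anti-pattern in three lines (mylock is a hypothetical spinlock):

spin_lock(&mylock);   /* we are now in atomic context... */
msleep(10);           /* ...and we just asked to schedule. BUG. */
spin_unlock(&mylock);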
Hands-on: The current_affairs Module
Let's write a module to test the waters. We'll print the current context information when the module initializes (init) and exits (cleanup).
/* Excerpted from ch6/current_affairs/current_affairs.c */
static void show_ctx(char *nm)
{
	// ...
	if (likely(in_task())) {
		pr_info("Running in process context ::\n"
			" name : %s\n"
			" PID  : %6d\n"
			" TGID : %6d\n"
			" UID  : %6u\n"
			" EUID : %6u (%s root)\n",
			// ... (pointer addresses elided) ...
			current->comm,
			task_pid_nr(current),  // prefer the helper over raw current->pid
			task_tgid_nr(current),
			// ...
		);
	} else {
		pr_alert("Whoa! running in interrupt context! Should NOT happen here\n");
	}
}
Here we use a common micro-optimization macro, likely(). It tells the compiler: "This condition is highly likely to be true, so optimize the assembly code in that direction." You'll see this everywhere in the kernel.
When you insmod this module, guess who current points to?
It's the insmod process itself!
Callback: Linux is a Monolithic Kernel Remember what we said in the chapter intro? "You are in a borrowed context." When you type insmod module.ko in the terminal, the insmod process initiates a system call and the CPU switches to privileged mode. There is no "kernel daemon" taking over your module: it is you (that insmod process) executing the module's init function in kernel mode. This is the essence of a monolithic kernel. (A microkernel is the model where you send messages to a "server process" to get things done.)
6.7 Traversing the Kernel's Task List
Since all task_struct are strung together on a massive doubly-linked circular list, can we look at them one by one, like flipping through a roster? Of course we can.
The kernel provides a macro called for_each_process().
#include <linux/sched/signal.h>
struct task_struct *p;
for_each_process(p) {
	// p points to each process's task_struct
	// Note: this visits only each process's main thread
}
But there's a catch: for_each_process only iterates over the main thread (the thread-group leader) of each process. If you want to find all threads in the system (including the underlings in multi-threaded processes), you need a more powerful macro (see the sketch after this list):
- Older kernels (< 6.6): do_each_thread(p, t) { ... } while_each_thread(p, t);
- Newer kernels (>= 6.6): for_each_process_thread(p, t) { ... }
We wrote a demo module ch6/foreach/thrd_showall. When you insert it and look at the output in dmesg, you'll see a densely packed task list.
In the output of Figure 6.12, there is a very interesting column: TGID vs PID.
- PID: In the kernel's eyes, this is the Thread ID. Every thread is different.
- TGID: Thread Group ID. This is what we call the "Process ID" in user space.
If a process is single-threaded, PID == TGID. If it is multi-threaded, the main thread's PID == TGID, while the other threads have different PIDs, but their TGID all equal the main thread's PID.
You can verify this with the ps -LA command:
- The first column, PID, is actually the TGID.
- The second column, LWP (Light-Weight Process), is what the kernel calls the PID. (A tiny C demo follows.)
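You can also confirm the mapping in a few lines of C: gettid() returns the kernel's per-thread PID, while getpid() returns the TGID (a sketch; gettid() is exposed by glibc 2.30+, and older systems need syscall(SYS_gettid)):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *unused)
{
	printf("worker: PID (thread) = %d, TGID (process) = %d\n",
	       gettid(), getpid());
	return NULL;
}

int main(void)
{
	pthread_t t;

	printf("main  : PID (thread) = %d, TGID (process) = %d\n",
	       gettid(), getpid());
	pthread_create(&t, NULL, worker, NULL);
	pthread_join(t, NULL);   /* main's PID == TGID; worker's PID differs */
	return 0;
}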
Chapter Summary
In this chapter, we built a complete map: from CPU privilege levels, to a process's Virtual Address Space (heap, stack, code segment), to how the kernel manages hundreds or thousands of threads using task_struct and doubly-linked lists, and how we locate ourselves in this massive web using the current macro.
In particular, the distinction between process context and interrupt context, along with the monolithic-kernel characteristic that "user-space processes execute kernel code themselves", forms the foundation for understanding all subsequent driver models. Without it, when you write code like "calling copy_from_user while holding a spinlock," you'll be left staring at an inexplicable system crash.
In the next chapter, we will dive into the other dimension of this picture: memory management. You'll discover how the structures we discussed today (task_struct, mm_struct) actually map to physical memory pages.
It's a long journey, but now you have the map.
Exercises
Exercise 1: Understanding
Question: In Linux kernel programming, why are functions that might sleep (like kmalloc(GFP_KERNEL) or down()) strictly prohibited in interrupt context? What does this have to do with process scheduling?
Answer and Analysis
Answer: Because interrupt context is not associated with any specific Process Descriptor, and there is no "returnable" process for the scheduler to switch to. If you sleep in an interrupt, the scheduler cannot find the correct process to resume execution, leading to a system deadlock or crash.
Analysis: Tests the understanding of the essential difference between Process Context and Interrupt Context.
- Process context: Triggered by a system call or exception, attached to a specific process (the current pointer is valid). If it sleeps, the scheduler can suspend the current process and wake it up later to resume.
- Interrupt context: Triggered asynchronously by hardware, not part of any specific process. The interrupt handler merely interrupted whatever process happened to be running (whether in user mode or kernel mode).
If the interrupt handler attempts to sleep, the kernel scheduler will try to switch the CPU to another process. However, since the interrupt was not initiated by a "process," the scheduler cannot return the CPU to the state it was in when the interrupt occurred (because not enough "process context" information was saved to allow such a return). Therefore, interrupt handlers must execute quickly and cannot block or sleep.
Exercise 2: Application
Question: On a running Linux system, you have written a kernel module. In one of its functions, you need to determine whether the current code is in "atomic context." Which kernel macro would you use to make this determination? Please write the name of the macro and briefly describe two typical scenarios where its return value is true.
Answer and Analysis
Answer: Use the in_atomic() macro.
Two typical scenarios:
- Executing within a hardware interrupt handler or softirq.
- In process context while holding a spinlock.
Analysis: Tests the practical application of kernel context detection tools.
- in_atomic(): This macro detects whether code is in an atomic context, i.e., a context in which process scheduling is not allowed.
- Scenario 1 (Interrupt Context): When handling a hardware interrupt, sleeping is obviously not allowed, and in_atomic() returns true.
- Scenario 2 (Spinlock): In process context, if the code already holds a spinlock, it has entered an atomic critical section. Although this is still process context, sleeping while holding a spinlock can lead to deadlocks, so in_atomic() also returns true (on kernels built with CONFIG_PREEMPT_COUNT, where taking a spinlock bumps the preempt count).
Note: in_interrupt() can also be used to detect interrupt context, but in_atomic() has broader coverage, including situations where a spinlock is held.
Exercise 3: Application
Question: Suppose you are writing a kernel driver for a 64-bit system. In your driver, you define a local array char buf[4096]. Although it compiles successfully, the system occasionally crashes (Kernel Panic) under heavy load. Based on your knowledge of "kernel stack size," analyze what the most likely cause of the crash is. Why don't we usually need to worry about this issue in user-space programming?
Answer and Analysis
Answer: The cause of the crash is a kernel stack overflow.
The kernel stack is very small (usually only 16KB on 64-bit systems) and fixed in size. Every level deeper the function call chain goes, and every bit larger the local variables are, consumes this limited stack space. If the call chain is deep enough, a 4096-byte local array can easily eat up 1/4 to 1/2 of the stack space, causing a stack overflow that overwrites the adjacent thread_info or other critical data, triggering a Panic.
In user space, however, the stack space is dynamically allocated, typically defaulting to around 8MB, and can be adjusted via ulimit. The space is very generous, making stack overflows caused by local arrays extremely rare.
Analysis: Tests practical engineering awareness of the differences between the kernel stack and the user stack.
- Kernel stack limits: Each thread's kernel stack usually has only 2 pages (32-bit) or 4 pages (64-bit), i.e., 8KB or 16KB. This must accommodate all function stack frames along the entire kernel call path.
- Risk: Defining large local variables in the kernel is extremely dangerous. Under heavy system load, interrupt handling and nested driver calls can be quite deep, instantly exhausting stack space.
- User stack comparison: The user-space stack resides in the VAS's stack segment. Not only is the space large, it can also grow dynamically: the kernel handles expansion (guarded by the guard-page mechanism) when the stack approaches its current limit. Unless recursion runs away, overflows are generally unlikely.
- Best practice: When large chunks of memory are needed in the kernel, use dynamic allocation (kmalloc() and friends) rather than large local arrays.
Exercise 4: Thinking
Question: Thought exercise: To defend against ROP (Return-Oriented Programming) attacks, the Linux kernel introduced "user-space shadow stacks." However, this feature (as mentioned in the relevant chapter) explicitly states that it targets "user space." Please analyze: why doesn't the kernel usually need the same hardware-level shadow stack to protect its own return addresses? Or more deeply, if kernel code is exploited, would this hardware protection mechanism still be effective?
Answer and Analysis
Answer: The kernel usually doesn't need a separate hardware shadow stack, and hardware protection mechanisms often become ineffective under the kernel's highest privilege level.
Analysis reasons:
- Ring 0 privileges: The kernel runs at the highest privilege level. If an attacker can already control kernel execution flow (hijacking a return address), it means the attacker has gained Ring 0 privileges. An attacker with the highest privileges can disable the shadow-stack feature (e.g., by modifying the CR4 register) or directly rewrite the Model-Specific Registers (MSRs) that configure the shadow stack, thereby bypassing the protection.
- Performance and overhead: The kernel is extremely sensitive to performance. Enabling shadow stacks brings significant context-switching overhead and interrupt handling latency.
- Existing mechanisms: The kernel primarily relies on strict code review, compile-time hardening (like the stack canaries of CONFIG_STACKPROTECTOR), and memory partitioning to mitigate stack-overflow issues, rather than relying on runtime hardware locking of return addresses.
Analysis: This is a comprehensive thought exercise involving security mechanisms and privilege models.
- Principle of shadow stacks: It stores return addresses separately in a protected, read-only memory area. When a function returns, the CPU compares the return address on the normal stack with the one in the shadow stack. If they don't match, an exception is triggered.
- Applicability to user space: User programs run at a low privilege level and cannot tamper with the shadow stack's configuration or memory, making this hardware-enforced protection highly effective.
- The paradox in kernel space: The cornerstone of security mechanisms is "high privilege protects low privilege." But if an attacker breaches the kernel, they possess the highest privileges. Who supervises the one with the highest privileges?
- If an attacker can overwrite the return address on the kernel stack, a kernel write vulnerability already exists. The attacker could just as easily build a ROP chain to execute instructions that disable shadow stack protection while overwriting the return address.
- Therefore, kernel security focuses more on preventing unintended memory writes (like using SMEP/SMAP to prevent the kernel from executing user-space code) rather than solely protecting return addresses from being modified.
Key Takeaways
Linux uses a monolithic kernel architecture, which means that when a system call or exception occurs, it is not some mysterious "kernel process" that takes over the CPU. Instead, the current process actively switches to a privileged level (kernel mode) and borrows its own context to execute kernel code. Understanding this is crucial because kernel code almost always runs in a "borrowed context," and we must constantly be vigilant about the current execution environment (process context or interrupt context), as the latter strictly prohibits sleeping or blocking operations.
The kernel manages all scheduling entities through a massive doubly-linked list and the core structure task_struct. In the eyes of the Linux kernel, threads are the basic unit of scheduling, and what we call a "process" is essentially a group of threads that share specific resources (like a memory address space). Whether it's a user-space thread or a pure kernel thread, it is described by an independent task_struct in the kernel. This means the kernel doesn't specifically distinguish between processes and threads; it only sees the individual as a scheduling entity.
Due to the division of privilege levels, every active user-space thread actually possesses two stacks: a user stack for running user code (large, and grown on demand by the kernel), and a kernel stack used when trapping into the kernel (usually very small, only 16KB on x86_64). When writing kernel modules, developers must constantly be wary of the kernel stack's size limit. Defining large arrays on the stack is strictly forbidden, as it can easily lead to stack overflows and system crashes; large chunks of memory must be obtained through dynamic allocation.
A process's Virtual Address Space (VAS) is strictly sandboxed, divided into regions like the code segment, data segment, heap, and stack. The heap grows toward higher addresses, while the stack (Fully Descending Stack) grows toward lower addresses. For multi-threaded programs, all threads share the process's VAS (code segment, global variables, etc.), but each thread has its own independent user stack area. This is the root cause of why local variables are safe in multi-threaded programming, while global variables require synchronization protection.
Locating "who am I" in kernel code is achieved through the macro current, which points to the task_struct of the task currently running on the CPU. Developers can use helper macros like in_task() to determine whether they are in process context or interrupt context: in process context, you can usually request resources or briefly sleep, but in interrupt context or while holding a spinlock, any operation that might cause blocking will trigger a system crash. Additionally, traversing all tasks in the system requires the for_each_process_thread macro, and you need to distinguish between the thread ID (PID) in kernel concepts and the process ID (TGID) as seen in user space.