Chapter 4: Hardware Interrupts and the Kernel's Response Strategy
Imagine this scenario.
You are deep in concentration writing code, or about to pick up that long-cold cup of coffee. Suddenly, the doorbell rings.
In that instant, your brain must react immediately: put the coffee down (save context), stand up (context switch), and answer the door (handle the interrupt). No matter what you were doing, this "hard interrupt" doorbell has the highest priority—it forces you to respond immediately.
Hardware Interrupts in an operating system work exactly the same way.
But things are slightly more complicated.
If your doorbell rang once every second, could you still write code? Or worse, what if the doorbell broke and rang continuously—wouldn't you go crazy? This is the reality we face every day in driver development: peripherals (keyboards, mice, network cards, disk controllers) are constantly sending signals to the CPU. If the CPU personally handled every single one, it would get nothing else done.
This brings us to the core topic of this chapter: How do we gracefully handle an interrupted life?
Early operating systems were simple: the CPU heard the bell, ran over, handled it, and came back. But in modern high-performance systems, this "honest" strategy no longer works. We need layered processing: someone to acknowledge the call (Top Half / Hard IRQ), and someone to do the heavy lifting (Bottom Half / Softirq / Threaded IRQ).
Our mission in this chapter is to figure out how this mechanism is built.
4.1 Hardware Interrupts and the Kernel Processing Flow
Don't rush to write code just yet. Before we allocate an IRQ, we need to understand the "wiring diagram" of this "doorbell."
From Hardware Electrical Signals to Kernel Abstractions
It all starts with a physical wire on the device.
When a device needs service (e.g., a network card receives a packet), it pulls the voltage high on this specific physical wire (or low, depending on the logic level). This wire ultimately connects to a chip on the motherboard—called IO-APIC on x86, and usually GIC (Generic Interrupt Controller) on ARM. This chip acts like a company's switchboard operator: it aggregates all incoming calls and decides which CPU line to route them to.
There is an abstraction layer to cross here.
As far as the CPU is concerned, it doesn't know what a "network card" or "keyboard" is. It only knows that "Interrupt Line 24" was triggered. This number is what we call an IRQ (Interrupt ReQuest).
You can think of an IRQ as the virtual ID number of a hardware interrupt. The kernel maintains a mapping table from IRQ numbers to specific device handler functions. When the switchboard (APIC/GIC) tells the CPU "Hey, line 24 has work," the CPU looks up the table, finds the corresponding handler, and jumps to execute it.
The Generic IRQ Handling Layer
But this "table lookup" action is actually quite troublesome.
Different interrupt controllers (PIC, IO-APIC, GIC) operate in completely different ways. If driver programs had to write to low-level registers for configuration every time, the kernel code would become a tangled mess.
To shield against these hardware differences, the Linux kernel invented the Generic IRQ handling layer.
The existence of this layer is critical. It acts as an adapter. Upper-layer driver programs only need to call standard APIs (like "allocate interrupt number 24 for me"), and the underlying generic layer translates this request into specific instructions for the interrupt controller. This layer also has to handle various tricky situations: for example, two devices sharing a single wire (shared IRQ), or the same interrupt arriving while its handler is still executing (interrupt masking).
Now we have established a complete mental model: peripheral pulls the wire $\rightarrow$ interrupt controller aggregates $\rightarrow$ CPU captures $\rightarrow$ kernel generic layer dispatches $\rightarrow$ driver handler executes.
The next question is: as driver authors, how do we hook ourselves into this chain?
Allocating a Hardware IRQ
To receive an interrupt, you must first "reserve" this number with the kernel.
This is not just about registering a callback function. The kernel does a lot of background work during this process: checking if this IRQ line is already occupied, whether sharing is allowed, what the corresponding device structure dev is, and whether the interrupt trigger mode is level-triggered or edge-triggered.
All of these configurations ultimately converge on one classic API: request_irq().
Although modern code prefers its "managed version" (which we will cover later), understanding request_irq() is fundamental. Its prototype looks like this:
static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
const char *name, void *dev)
Don't be intimidated by the parameters; they all have clear responsibilities:
- irq: The hardware interrupt number you want to request. This usually comes from the Device Tree or the return value of platform_get_irq().
- handler: Your interrupt handler. This is the core; we'll dive into it shortly.
- flags: Flag bits. This is crucial, as it defines the interrupt's behavior (e.g., fast interrupt or shared interrupt).
- name: A string displayed in /proc/interrupts. Pick a good name; you'll thank yourself during debugging.
- dev: A pointer to device-specific data. If you set the IRQF_SHARED flag, this parameter is critical because it is used to distinguish between different devices sharing the same IRQ line.
If allocation succeeds, it returns 0. On failure, it returns a negative error code (e.g., -EBUSY means the line is already taken and the owner refuses to share).
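In practice the call usually sits in a probe function. Here is a hedged sketch of that pattern; struct my_dev, my_isr, and the field names are illustrative placeholders, not code from this chapter's final driver:

```c
static int my_probe(struct platform_device *pdev)
{
	struct my_dev *dev;
	int irq, ret;

	dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
	if (!dev)
		return -ENOMEM;

	irq = platform_get_irq(pdev, 0);	/* IRQ number from DT/ACPI */
	if (irq < 0)
		return irq;

	ret = request_irq(irq, my_isr, IRQF_SHARED, "my_dev", dev);
	if (ret)				/* e.g. -EBUSY: owner refuses to share */
		return ret;

	dev->irq = irq;
	platform_set_drvdata(pdev, dev);
	return 0;
}
```

Whatever pointer you pass as dev here must be passed again, unchanged, to free_irq(irq, dev) in the remove path.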
Implementing the Interrupt Handler
Now let's look at handler.
When an interrupt actually occurs, the kernel pauses the current process, saves the context, and jumps to this function to execute. This means your code runs in a very special environment: Interrupt Context.
There is one extremely important thing to burn into your brain:
Never sleep in an interrupt handler.
You cannot call any function that might block, cannot allocate memory with the GFP_KERNEL flag (use GFP_ATOMIC instead), and cannot use a mutex. Your actions must be lightning-fast: handle the most urgent status acknowledgment, then exit immediately.
The standard handler signature looks like this:
static irqreturn_t my_isr(int irq, void *dev_id)
{
/* 1. Check the device status to confirm the interrupt came from our device */
/* 2. If so, clear the interrupt pending bit in the device's register */
/* 3. Do the minimum amount of processing */
return IRQ_HANDLED; /* or IRQ_NONE */
}
Note the return type irqreturn_t.
- Returning IRQ_HANDLED: Indicates that your device indeed triggered the interrupt and you have handled it.
- Returning IRQ_NONE: Indicates that this interrupt was not triggered by your device (common in shared-interrupt scenarios).
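Fleshing out the skeleton above for a shared line might look like this sketch; the MY_STATUS/MY_ACK registers and the my_dev structure are hypothetical hardware details:

```c
static irqreturn_t my_isr(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;	/* the pointer we gave request_irq() */
	u32 status;

	status = readl(dev->regs + MY_STATUS);	/* hypothetical status register */
	if (!(status & MY_IRQ_PENDING))
		return IRQ_NONE;	/* not ours: let the other sharers check */

	writel(MY_IRQ_PENDING, dev->regs + MY_ACK); /* clear the pending bit */
	/* minimal urgent work only; everything heavy is deferred */
	return IRQ_HANDLED;
}
```

The IRQ_NONE path is what makes sharing work: every handler on the line gets called, and each one politely declines interrupts that are not its own.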
Using the Threaded Interrupt Model
The "no sleeping" restriction is too painful.
Some hardware interrupt handling logic is genuinely complex—it might require reading data over a slow bus (like I2C) or locking to access a critical section. If we do this in a hardirq, system latency will skyrocket.
To solve this problem, modern Linux kernels introduced the Threaded IRQ model.
This is a very clever compromise. It splits interrupt handling into two stages:
- Primary Handler (Hardirq): Like a traditional hard interrupt, it still runs in an atomic context. But its task is radically simplified: it only does the most urgent things, like acknowledging hardware status and masking the interrupt source. Then, instead of returning IRQ_HANDLED, it returns IRQ_WAKE_THREAD, telling the kernel to wake a dedicated kernel thread.
- Threaded Handler: This is a function running in kernel-thread context. Within this thread, you can sleep, take mutexes, and do anything else that is legal in process context.
To use this model, you need the request_threaded_irq() API:
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
irq_handler_t thread_fn,
unsigned long flags,
const char *name, void *dev)
The key here is thread_fn.
- If you provide thread_fn, the kernel first executes handler (if non-NULL) when the interrupt occurs; when handler returns IRQ_WAKE_THREAD, the kernel wakes the kernel thread that runs thread_fn.
- If you set handler to NULL, the kernel installs a default primary handler whose sole purpose is to wake your thread_fn.
This perfectly fits the needs of modern drivers: completely decoupling the urgent acknowledgment (Hardirq) from the heavy processing logic.
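As a sketch of the split (the my_dev helpers are hypothetical; IRQF_ONESHOT, which keeps the line masked until the thread finishes, is the flag you typically want with this model):

```c
/* Primary handler: atomic context, urgent work only. */
static irqreturn_t my_hardirq(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	if (!my_dev_irq_pending(dev))	/* hypothetical status check */
		return IRQ_NONE;
	my_dev_quiesce(dev);		/* hypothetical: ack/mask the source */
	return IRQ_WAKE_THREAD;		/* ask the kernel to wake thread_fn */
}

/* Threaded handler: process context, sleeping is allowed. */
static irqreturn_t my_thread_fn(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	mutex_lock(&dev->lock);
	my_dev_read_over_i2c(dev);	/* slow, sleeping bus transfer: fine here */
	mutex_unlock(&dev->lock);
	return IRQ_HANDLED;
}

/* In probe:
 * ret = request_threaded_irq(irq, my_hardirq, my_thread_fn,
 *                            IRQF_ONESHOT, "my_dev", dev);
 */
```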
Enabling and Disabling IRQs
Sometimes you need to temporarily turn off interrupts in your code.
Maybe you are reconfiguring a device, or maybe you are handling a race condition. The kernel provides several different APIs with completely different scopes—don't mix them up.
At the very bottom are local_irq_disable() and local_irq_enable().
These are "nuclear weapon" level. They disable all interrupts on the current CPU core. Nothing gets in except Non-Maskable Interrupts (NMIs). This is typically used to protect critical sections in core kernel code; driver code rarely needs this.
If you only want to disable just your device's specific IRQ line, you should use disable_irq() and enable_irq().
disable_irq(n) disables the interrupt with IRQ number n from being delivered across the entire system (if it is shared, it unfortunately drags the other devices down with it). Additionally, it waits for any currently executing interrupt handler to finish.
If you don't want to wait, you can use disable_irq_nosync(), which returns immediately. The catch is that a handler for this IRQ may still be running on another CPU when the call returns, so any code that assumes the line is fully quiet can race with it, which is dangerous.
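A typical use of the line-level pair looks like the following sketch, where my_dev_reprogram() is a hypothetical reconfiguration helper:

```c
/* Quiesce our line while reprogramming the device. */
disable_irq(dev->irq);		/* masks the line AND waits for running handlers */
my_dev_reprogram(dev);		/* safe: no handler can be in flight now */
enable_irq(dev->irq);		/* calls nest: one enable_irq() per disable_irq() */
```

Note the nesting rule in the last comment: the kernel keeps a depth count, so every disable_irq() must be balanced by exactly one enable_irq().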
Viewing Allocated IRQ Lines
We aren't coding blindly. What happens in the system leaves a trace.
You can check the /proc/interrupts file at any time. It is a real-time "status board":
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 46 0 0 0 IO-APIC 2-edge timer
1: 3 0 0 0 IO-APIC 1-edge i8042
24: 10234 5601 4200 8991 PCI-MSI 524288-edge eth0
...
Each line represents an IRQ.
- The first column is the IRQ number.
- The next few columns are interrupt counters on each CPU core (count increments by 1 each time an interrupt occurs).
- After that is the interrupt controller type and device name.
If you suspect your interrupt is being lost or not firing, the first thing to do is look at this file. If the counter is always 0, it means the hardware isn't pulling the wire at all, or the wire is connected wrong.
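From the shell, filtering the status board for a single device name is often the quickest first check (eth0 here is just an example name):

```shell
# Show only the line for our device; if the per-CPU counters never move
# between runs, the interrupt is not reaching the kernel at all.
grep eth0 /proc/interrupts || echo "no IRQ registered under that name"
```

Re-run the command after poking the hardware and compare the counts.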
4.2 Understanding and Using Top Halves and Bottom Halves
Now let's pull our perspective back from the micro-level APIs and look at the strategy.
As we mentioned earlier, interrupt handlers have two major nemeses:
- They are too slow: Processing too slowly causes subsequent interrupts to be lost.
- They are too rushed: They cannot sleep, so they can't do heavy work.
To reconcile this contradiction, the Linux community has distilled a standard divide-and-conquer strategy over decades of evolution: Top Halves and Bottom Halves.
This isn't unique to Linux; almost all modern operating systems do something similar. But in Linux the implementations are unusually rich: Softirq, Tasklet, Workqueue, and Threaded IRQ. Here we will focus on the first two.
Softirqs and Kernel Threads
The lowest-level mechanism is the Softirq.
Softirqs are vectors statically defined at kernel compile time. You cannot dynamically register a new softirq type in a driver module. The kernel predefines a few commonly used ones, such as HI_SOFTIRQ (high priority), NET_TX_SOFTIRQ (network transmit), NET_RX_SOFTIRQ (network receive), etc.
The execution timing of softirqs is subtle. They are checked as the kernel leaves hard-interrupt context: if any are pending, they run right then, before control returns to the code that was interrupted.
But there's a trap here.
If there is massive network traffic in the system, softirqs trigger frantically. At that point the CPU is so busy processing softirqs that user processes get no chance to be scheduled, and the system looks frozen. This is the notorious softirq storm.
To solve this problem, the kernel created the ksoftirqd kernel thread. When softirq processing becomes too frequent, the kernel will offload this work to the ksoftirqd threads, letting them be scheduled as normal processes to prevent starving user processes.
You can view the statistics for each softirq via /proc/softirqs:
$ cat /proc/softirqs
CPU0 CPU1
HI: 0 0
TIMER: 2831401 2703456
NET_TX: 123 89
NET_RX: 1234567 9876543
BLOCK: 45000 32000
Using Tasklets
For driver developers, Softirqs are too low-level and too dangerous: the same softirq vector can run simultaneously on multiple CPUs, so its handler must be fully reentrant, with all the SMP locking that implies. We need a friendlier interface.
Enter the Tasklet.
Tasklets are essentially a dynamic mechanism built on top of Softirqs (specifically using HI_SOFTIRQ or TASKLET_SOFTIRQ). They allow you to register at runtime and guarantee that the same tasklet will never run simultaneously on two CPUs.
This greatly simplifies concurrency control.
Using a Tasklet usually involves two steps:
Step 1: Define and initialize the tasklet structure.
#include <linux/interrupt.h>
/* The tasklet handler function */
void my_do_tasklet(unsigned long data);
/* Define the tasklet structure and initialize it statically */
DECLARE_TASKLET(my_tasklet, my_do_tasklet, 0);
If you need to initialize dynamically (e.g., inside a probe function), you can use tasklet_init:
struct tasklet_struct my_tasklet;
tasklet_init(&my_tasklet, my_do_tasklet, (unsigned long)data);
(Since kernel 5.10, the preferred form is tasklet_setup(), whose callback receives a struct tasklet_struct * instead of an unsigned long; the legacy forms shown here are kept for older code.)
Step 2: Schedule it.
When you need to throw the heavy work to the bottom half, simply call this in your interrupt handler:
tasklet_schedule(&my_tasklet);
It's like whispering in the boss's (CPU's) ear: "I've noted this down; when you're free later, find that guy named my_tasklet to handle it."
The kernel will execute my_do_tasklet at some subsequent point (usually shortly after interrupts are re-enabled).
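Putting the two steps together, here is a minimal sketch using the legacy unsigned-long tasklet API; the my_dev helpers (my_dev_ack_irq, my_dev_drain_fifo) are hypothetical:

```c
#include <linux/interrupt.h>

/* Bottom half: still atomic (no sleeping), but off the hard-IRQ path. */
static void my_do_tasklet(unsigned long data)
{
	struct my_dev *dev = (struct my_dev *)data;

	my_dev_drain_fifo(dev);		/* the heavier, deferrable work */
}

/* Top half: acknowledge, then defer. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	my_dev_ack_irq(dev);		/* urgent: silence the hardware */
	tasklet_schedule(&dev->tasklet);/* queue the bottom half */
	return IRQ_HANDLED;
}

/* In probe, before requesting the IRQ:
 * tasklet_init(&dev->tasklet, my_do_tasklet, (unsigned long)dev);
 */
```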
This feeling of "asynchronous scheduling" is the key to understanding the entire Linux interrupt subsystem.
4.3 Answers to a Few Remaining Questions
By this point, we've covered the core mechanisms of interrupts. But before we actually get our hands dirty, there are a few commonly confusing details that need clarification.
Edge-Triggered vs. Level-Triggered
This is a classic hardware-level divide.
- Level-Triggered: As long as the interrupt line stays asserted (say, high), the interrupt keeps firing. This is typical for shared lines. It means the handler must service the device so that the device deasserts the line, otherwise the CPU is interrupted again the moment the handler returns, in an endless loop.
- Edge-Triggered: It fires exactly once, at the moment the line transitions (e.g., a rising edge from low to high). After that, no matter how long the line is held high, it won't fire again until it goes low and then high again.
Edge-triggered interrupts are simple to acknowledge and work well for momentary pulses, but if a second edge arrives while the line is masked or the first is still being serviced, it can be missed. Level-triggered interrupts are very robust, since the line stays asserted until serviced, but if the handler fails to make the device deassert it, you get an "interrupt storm."
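When the platform does not already describe the trigger type (e.g., in the Device Tree), you can state it in the flags argument of request_irq(). The IRQF_TRIGGER_* flags below are real kernel flags; the device names are placeholders:

```c
/* Edge: fire once per low-to-high transition. */
ret = request_irq(irq, my_isr, IRQF_TRIGGER_RISING, "my_dev", dev);

/* Level: keep firing while the line is high; often combined with sharing. */
ret = request_irq(irq, my_isr, IRQF_TRIGGER_HIGH | IRQF_SHARED, "my_dev", dev);
```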
Interrupt Context Guide: What to Do and What Not to Do
Here is a simple survival guide.
In a hard interrupt context:
- ✅ Allowed: Modifying registers, using a spinlock (the non-sleeping kind), reading and writing per-cpu variables.
- ❌ Forbidden: Calling kmalloc(GFP_KERNEL), calling mutex_lock(), calling any function that might invoke schedule().
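The "allowed" column in one picture: a handler that queues an event using only atomic-safe primitives. This is a sketch; struct my_event, the list in my_dev, and the MY_STATUS register are hypothetical:

```c
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;
	struct my_event *ev;

	spin_lock(&dev->lock);				/* spinlock: never sleeps */
	ev = kmalloc(sizeof(*ev), GFP_ATOMIC);		/* GFP_ATOMIC, never GFP_KERNEL */
	if (ev) {
		ev->status = readl(dev->regs + MY_STATUS);
		list_add_tail(&ev->node, &dev->events);
	}
	spin_unlock(&dev->lock);
	return IRQ_HANDLED;
}
```

Note that the allocation failure is simply tolerated: in atomic context you cannot wait for memory, so you drop the event rather than block.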
How do you check? Functions that may sleep call a kernel macro named might_sleep(). If one of them is reached from a context where sleeping is forbidden, and the kernel is built with DEBUG_ATOMIC_SLEEP, it immediately prints a "BUG: sleeping function called from invalid context" report with a full stack trace. Trust me, this is much better than a mysterious crash later on.
Interrupt Masking: Default Behavior and Control
When you enter an interrupt handler, to prevent reentrancy, the kernel defaults to masking this specific IRQ line on all CPUs.
Note this detail: it masks this line on all CPUs. Why? Because if you are handling this interrupt on CPU0 and CPU1 receives an interrupt from the same device, and they share data, you'd have to add locks. To simplify this complexity, the kernel defaults to an "exclusive" strategy.
But this exclusivity has a cost. In very old kernels the IRQF_DISABLED flag controlled whether other interrupt lines stayed enabled while your handler ran; it has since been removed, and handlers now always run with local interrupts disabled. What you can still tune is which CPU services the line, either from user space via /proc/irq/<n>/smp_affinity or from code via irq_set_affinity().
Interrupt Stacks: Does the Kernel Maintain Separate Stacks?
This is a deep question.
In early Linux, interrupts directly borrowed the current process's kernel stack. This meant that if your process stack was already deep (e.g., complex system call nesting) and an interrupt suddenly arrived, the stack might overflow.
Current kernels (on most architectures) allocate a dedicated irq_stack for each CPU.
When an interrupt occurs, the CPU switches to this dedicated stack to run. This greatly increases system robustness and also allows us to do slightly more things in an interrupt (but only slightly more).
Alright, the theoretical groundwork is mostly laid. Now we have the "doorbell" (hardware interrupt), the "secretary" (Generic IRQ layer), and the "clerk" (handler function).
In the next section, we will apply this theory to real code and see how a modern driver elegantly manages all of this through the managed resource API.