
4.2 Technical Preparation and the Truth About Kernel Interrupt Handling

In the previous section, we discussed a lot of theory about the "doorbell"—what interrupts are, how the CPU responds, and how the kernel stack switches. Now, let's bring our focus back to reality. As driver developers, we don't directly wire up those cables; that's the job of the kernel hardware layer and the BSP. What we actually interact with is an abstraction layer provided by the kernel.

But before we start writing code, we need to confirm two things:

  1. Is your environment ready? (This sounds like a no-brainer, but many people only realize their kernel headers aren't installed when compilation fails.)
  2. When you write request_irq(), who exactly are you making a request to, and for what?

In this section, we will take apart Linux's generic IRQ handling layer and see how it shields us from underlying hardware differences. Then, using real code snippets (from an Intel network driver and an STM32 driver), we will break down request_irq, devm_request_irq, and the modern threaded interrupt model step by step.


Hardware Interrupts and the Kernel's Takeover Process

Most peripherals (network cards, disks, keyboards, mice) don't want to be polled by the CPU. Polling is a brute-force approach that not only wastes CPU cycles but also consumes power. The efficiency of interrupts lies in the fact that software only runs when something actually happens.

Let's quickly run through the process at the hardware level. Although you don't need to solder circuit boards, you do need to understand how signals reach the kernel.

Modern motherboards all have an interrupt controller chip.

  • On x86, this is called the IO-APIC (IO-[Advanced] Programmable Interrupt Controller).
  • On ARM, it is usually the GIC (Generic Interrupt Controller).

For the sake of discussion, we'll uniformly call it the PIC.

Simplified path of interrupt flow:

  1. A peripheral (e.g., a network card that has just received a packet) asserts the line connected to the PIC.
  2. The PIC captures this signal, latches it in a register, and then asserts the interrupt pin leading to the CPU.
  3. After each instruction, the CPU checks for a pending interrupt. Once one is detected, the hardware automatically saves the minimal execution context and jumps to the kernel's preset low-level entry code (historically asm_do_IRQ on 32-bit ARM).
  4. The kernel's low-level code eventually calls into the generic IRQ layer, looking up the list of handler functions corresponding to this IRQ.
  5. Your driver function gets called.

Here is a counterintuitive fact: hardware interrupts are effectively the highest-priority events in Linux. They preempt whatever is running, whether it's a user-space browser or a kernel thread. The exception is the threaded part of a "threaded interrupt" handler, which is scheduled like any other thread; we will cover that later.

Analogy time: Restaurant pagers

You can think of the CPU as the chef in a restaurant, and the peripherals as the waiters.

  1. Old approach (Polling): The chef runs out every two seconds and yells, "Any dishes ready? No? I'll go back to chopping." The chef is exhausted, and the food might get cold.
  2. New approach (Interrupt): The chef focuses on chopping. When a dish is ready, the waiter presses a pager (PIC). When the pager rings, the chef stops the knife mid-air (saves the current context), runs out to take the order (interrupt handler), and comes back to chopping after handling it.

However, this analogy is imperfect in one respect: a real chef can decide the order in which to take orders. In the real hardware world, once an interrupt (pager) rings, the chef must immediately drop everything to handle it, no matter how expensive the caviar being cut. The interrupted code has no say in the matter, which is why a handler must run in atomic context: get in, do the work without sleeping, and get out.


NAPI: When the "Pager" Rings Too Fast

Speaking of network cards, there is an important modern mechanism we must mention. In early legacy network cards (10M/100M era), triggering an interrupt for every incoming packet was perfectly normal. But today, following that logic, a 10 Gbps network card might trigger millions of interrupts per second. This would trap the CPU in a state called Livelock—it's so busy handling interrupts that it has no time for anything else, making the system appear frozen.

To solve this problem, modern operating systems (including Linux) and network drivers introduced NAPI (New API). The idea is a hybrid model:

  • Normally, it uses interrupts. Once the first packet arrives, the driver disables the device's receive interrupts and switches to polling mode, draining that burst of packets in batches.
  • After draining, it re-enables interrupts and waits for the next burst.

In code, this usually manifests as calling napi_schedule() inside the interrupt handler. We will see it in the code examples later.
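To make the hybrid idea concrete, here is a minimal user-space sketch of the interrupt-then-poll cycle. It is a model, not kernel code: struct toy_nic and the toy_ functions are hypothetical names invented for illustration, and the budget parameter only mimics the per-poll packet limit that NAPI uses.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a NIC; not the kernel's NAPI structures. */
struct toy_nic {
    int  rx_pending;     /* packets waiting in the RX ring */
    bool irq_enabled;    /* device-level RX interrupt enable */
    bool poll_scheduled; /* set by the "interrupt handler" */
};

/* Models the hard-IRQ handler: mask the device and defer to polling. */
static void toy_nic_irq(struct toy_nic *nic)
{
    if (!nic->irq_enabled)
        return;
    nic->irq_enabled = false;    /* stop further RX interrupts */
    nic->poll_scheduled = true;  /* roughly what napi_schedule() asks for */
}

/* Models one poll call: process at most `budget` packets. */
static int toy_nic_poll(struct toy_nic *nic, int budget)
{
    int done = 0;

    while (done < budget && nic->rx_pending > 0) {
        nic->rx_pending--;
        done++;
    }
    if (done < budget) {         /* ring drained: back to interrupt mode */
        nic->poll_scheduled = false;
        nic->irq_enabled = true;
    }
    return done;
}
```

With 100 packets pending and a budget of 64, the first poll handles 64 packets and stays in polling mode; the second handles the remaining 36 and turns interrupts back on.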


Allocating a Hardware IRQ: Starting with request_irq

As a driver author, one of your core tasks is to "catch" interrupts. But the problem is that interrupt routing methods vary wildly across platforms (PCI devices read through bus configuration, embedded devices use the Device Tree). To avoid driving driver developers crazy, the Linux kernel provides a generic IRQ handling layer.

This means the code you write can, in theory, be compiled and run on x86, ARM, or even RISC-V without modifying a single line of code.

Conceptually, the kernel maintains a table of interrupt descriptors (struct irq_desc), indexed by IRQ number. Each descriptor carries a linked list whose nodes are the registered handlers (struct irqaction). Your task is simply to hang your function on this list.
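The descriptor-plus-list structure can be modeled in a few lines of user-space C. This is only a sketch of the idea: every toy_ name below is hypothetical, and the real struct irq_desc and struct irqaction carry far more state (locking, flags, statistics).

```c
#include <stddef.h>

#define TOY_NR_IRQS 16

typedef int (*toy_handler_t)(int irq, void *dev_id);

/* Hypothetical stand-in for struct irqaction: one registered handler. */
struct toy_action {
    toy_handler_t      handler;
    void              *dev_id;
    struct toy_action *next;
};

/* One list head per IRQ number, like a very reduced descriptor table. */
static struct toy_action *toy_desc[TOY_NR_IRQS];

/* Append `act` to the list for `irq`; returns 0, or -1 on a bad argument. */
static int toy_request_irq(unsigned int irq, struct toy_action *act)
{
    struct toy_action **p;

    if (irq >= TOY_NR_IRQS || !act || !act->handler)
        return -1;
    for (p = &toy_desc[irq]; *p; p = &(*p)->next)
        ;
    act->next = NULL;
    *p = act;
    return 0;
}

/* Unlink the action whose dev_id matches; returns it, or NULL if absent. */
static struct toy_action *toy_free_irq(unsigned int irq, void *dev_id)
{
    struct toy_action **p;

    if (irq >= TOY_NR_IRQS)
        return NULL;
    for (p = &toy_desc[irq]; *p; p = &(*p)->next) {
        if ((*p)->dev_id == dev_id) {
            struct toy_action *found = *p;

            *p = found->next;
            return found;
        }
    }
    return NULL;
}

/* Call every handler on the list; returns how many claimed the interrupt. */
static int toy_dispatch(unsigned int irq)
{
    int handled = 0;
    struct toy_action *a;

    if (irq >= TOY_NR_IRQS)
        return 0;
    for (a = toy_desc[irq]; a; a = a->next)
        handled += a->handler((int)irq, a->dev_id);
    return handled;
}

/* Example handler: counts calls in the int that dev_id points to. */
static int toy_count_handler(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
    return 1;
}
```

Note how removal is keyed on dev_id rather than on the handler function: two devices can share the same handler code on the same IRQ and still be told apart.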

The kernel provides four main APIs to do this:

  1. request_irq() - The old-school approach, manually managed.
  2. devm_request_irq() - The managed version, recommended.
  3. request_threaded_irq() - Threaded interrupt.
  4. devm_request_threaded_irq() - Managed threaded interrupt, highly recommended.

Let's break them down in order.


request_irq() - The Classic Entry Point

This is just like calling sigaction() in user space to register a signal handler, except this is for the kernel.

#include <linux/interrupt.h>

int __must_check
request_irq(unsigned int irq,
            irq_handler_t handler,
            unsigned long flags,
            const char *name,
            void *dev);

The parameters are extremely important; let's look at them one by one:

  1. unsigned int irq: This is the IRQ number you want to register.
    • How do you get this number? This is a classic question.
    • Modern embedded: Parse the Device Tree (DTS).
    • PCI devices: Obtain via pci_dev->irq.
    • Legacy platforms: Might be hardcoded (definitely don't learn this).
  2. irq_handler_t handler: Your handler function pointer.
    • The function prototype is irqreturn_t (*)(int, void *).
  3. unsigned long flags: Flag bitmask. Set to 0 for default behavior.
  4. const char *name: Your driver's name. This will appear in /proc/interrupts for easy debugging.
  5. void *dev: This is a "private data" pointer.
    • When the interrupt occurs, this pointer is passed back to your handler.
    • Key point: If you use a shared interrupt (IRQF_SHARED), this parameter must be non-NULL and unique per handler. Otherwise, how would the kernel know which handler free_irq() should remove? If you really have nothing to pass, passing THIS_MODULE is a workaround.

Return value: Follows kernel convention, 0 for success, negative for failure. Since it has __must_check, you must check the return value.

Analogy callback: Back to the restaurant

Remember that pager? request_irq is like writing on the kitchen blackboard: "If pager number 5 rings, please call Zhang San to handle it."

Here, the dev parameter is like telling the chef: "Zhang San is holding this plate of food."

The difference is: In a restaurant, there is only one Zhang San. But in Linux, if pager 5 is shared (e.g., several tables share one bell), the chef needs to yell: "Who ordered dish number 5? Here is your token." Only when the token matches will the person go do the work.

Freeing an IRQ: free_irq()

Since you borrowed it, you must return it. This is usually called in the driver's remove() or disconnect() method.

const void *free_irq(unsigned int irq, void *dev_id);

Note that the second parameter must be exactly the same dev pointer you passed during registration.

⚠️ Pitfall warning:

  • If it's a shared interrupt, make sure to disable interrupt generation on your device itself before calling free_irq(). Otherwise, if the device raises an interrupt after your handler has been unbound, nobody on the shared line will claim it, and the kernel may end up disabling the whole line as spurious.
  • free_irq() waits for all currently executing handlers for this IRQ to finish before returning, so it must not be called from interrupt context. It also means that if you deadlock yourself inside the handler, free_irq() will never return.

Interrupt Flags: The IRQF_ Family

That flags parameter is a bitmask used to control interrupt behavior. They are defined in <linux/interrupt.h>.

  • IRQF_SHARED: This is one of the most common flags. It allows multiple devices to share the same IRQ number.

    • Strict requirement: You must provide a unique dev_id for each device.
    • Typical scenario: PCI devices using legacy INTx lines.
    • Consequence: If your interrupt handler gets called, you must first read the hardware registers to confirm whether it was your device that triggered it. If not, return IRQ_NONE immediately.
  • IRQF_ONESHOT: This is for threaded interrupts.

    • Meaning: When the hardirq (Top Half) finishes executing, do not immediately re-enable this IRQ.
    • Purpose: Keep the IRQ line disabled until the corresponding threaded handler finishes running. This is crucial for level-triggered interrupts; otherwise, it will form an "interrupt storm".
  • IRQF_TRIGGER_*: These flags specify the electrical trigger characteristics (rising edge, falling edge, high level, low level).

    • Usually, these are configured by the Device Tree or kernel BSP code, and drivers rarely set them manually. But if you are writing some bare-metal drivers, you might use them.
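The "check whether it was really your device" rule for IRQF_SHARED is easy to get wrong, so here is a small user-space sketch of it. The device struct, the status bit, and the TOY_ return codes are all hypothetical stand-ins; in a real driver the status would be read from a hardware register and the return type would be irqreturn_t.

```c
#include <stdint.h>

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

#define TOY_STATUS_IRQ_PENDING 0x1u  /* hypothetical "I raised the IRQ" bit */

/* Hypothetical device state; `status` stands in for a hardware register. */
struct toy_dev {
    uint32_t     status;
    unsigned int events;  /* interrupts this device actually serviced */
};

/* Shared-line handler: claim the interrupt only if our device raised it. */
static enum toy_irqreturn toy_shared_handler(int irq, void *dev_id)
{
    struct toy_dev *dev = dev_id;

    (void)irq;
    if (!(dev->status & TOY_STATUS_IRQ_PENDING))
        return TOY_IRQ_NONE;                 /* someone else's interrupt */

    dev->status &= ~TOY_STATUS_IRQ_PENDING;  /* acknowledge the device */
    dev->events++;
    return TOY_IRQ_HANDLED;
}
```

Because each registration passes its own dev_id, the same handler function can serve several devices on one line and still answer "not mine" for the ones that are idle.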

Level-Triggered vs. Edge-Triggered

This is a hardware concept, but it has a fatal impact on driver logic.

  1. Level-Triggered:

    • As long as the signal line stays at its active level (high or low, depending on the configuration), the interrupt keeps firing.
    • Rule: You must "extinguish" it in the handler (e.g., write to a register to clear it). If you don't, it will trigger again immediately after the handler returns. This is a nightmare for shared interrupts, because every handler on the line gets called again and again.
  2. Edge-Triggered:

    • Only triggers once, at the moment the signal transitions (a rising edge from low to high, or a falling edge from high to low).
    • Characteristic: Easy to handle, since a forgotten acknowledgment cannot cause a storm. The price is that events can be lost: a second edge that arrives before the first has been serviced may be merged with it.

Analogy callback: Back to the pager

  • Edge-triggered: The pager button is a momentary switch. Press it for a "ding", and it stops when you let go. Even if you hold it down, there's only one ding.
  • Level-triggered: The pager is a toggle switch. Flick it up, and it keeps ringing until you flick it down (Ack).
  • If you forget to flick it down, the chef will stand at your door holding a kitchen knife, staring at you.
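The difference between the two trigger modes can be reproduced with a toy model. The sketch below is user-space C with hypothetical names (struct toy_line, toy_service); it only imitates how a controller decides whether an interrupt is still pending, and the limit parameter exists solely to stop the simulated storm.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line as seen by a controller. */
struct toy_line {
    bool level;         /* current electrical state: true = asserted */
    bool edge_latched;  /* set once per low-to-high transition */
};

/* Drive the line; a rising edge is latched exactly once. */
static void toy_line_set(struct toy_line *l, bool level)
{
    if (level && !l->level)
        l->edge_latched = true;
    l->level = level;
}

/*
 * Run the "handler" until the interrupt is no longer pending, at most
 * `limit` times. If `handler_acks` is true, the handler clears the device
 * (which drops the line). Returns how many times the handler ran.
 */
static int toy_service(struct toy_line *l, bool level_triggered,
                       bool handler_acks, int limit)
{
    int runs = 0;

    while (runs < limit) {
        bool pending = level_triggered ? l->level : l->edge_latched;

        if (!pending)
            break;
        l->edge_latched = false;     /* controller consumes the edge */
        if (handler_acks)
            toy_line_set(l, false);  /* device deasserts the line */
        runs++;
    }
    return runs;
}
```

In this model a level-triggered line whose handler never acknowledges the device runs the handler until the limit is hit (the storm), while an edge-triggered line runs it once, and a second assertion that arrives while the line is still high produces no new edge at all.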

Hands-on: Dissecting the Intel Network Driver Code

Talk is cheap. Let's look at some real code—the Intel 82597EX (IXGB) 10GbE network card driver.

Registering the interrupt (drivers/net/ethernet/intel/ixgb/ixgb_main.c):

static int ixgb_up(struct ixgb_adapter *adapter)
{
    struct net_device *netdev = adapter->netdev;
    int err, irq_flags = IRQF_SHARED;

    err = request_irq(adapter->pdev->irq, ixgb_intr, irq_flags,
                      netdev->name, netdev);
    if (err) {
        // error handling...
    }
    // ...
}

Step-by-step breakdown:

  1. IRQ source: adapter->pdev->irq. This is read from the configuration space for us by the PCI layer.
  2. Handler: ixgb_intr.
  3. Flags: IRQF_SHARED. Because the PCI specification allows devices to share IRQs.
  4. Name: netdev->name. This way, you can see names like eth0 in /proc/interrupts.
  5. Dev: netdev. The handler receives this pointer back as dev_id and can call netdev_priv() on it to recover the adapter's private data.

Freeing the interrupt:

static void ixgb_down(struct ixgb_adapter *adapter, bool kill_watchdog)
{
    struct net_device *netdev = adapter->netdev;
    // ...
    napi_disable(&adapter->napi); /* NAPI must be disabled first */
    ixgb_irq_disable(adapter); /* disable interrupts at the hardware level */
    free_irq(adapter->pdev->irq, netdev);
}

Notice the order: disable NAPI first, then disable hardware interrupts, and finally free the IRQ. This is very important.


Implementing the Interrupt Handler

Your handler runs in Interrupt Context. This is a very strict atomic environment.

Function signature:

static irqreturn_t my_handler(int irq, void *dev_id)
{
    // ...
}

Return value type: irqreturn_t. It's actually an enum:

  • IRQ_NONE (0): Not my interrupt, or I did not handle it.
  • IRQ_HANDLED (1): I handled it.
  • IRQ_WAKE_THREAD (2): I handled a bit of it; now wake up a kernel thread to continue.

Ironclad rules of interrupt context: Your handler is in atomic context.

  • Cannot sleep: Any function that might call schedule() is forbidden.
    • copy_to_user() / copy_from_user(): Might trigger a page fault, leading to sleep.
    • kmalloc(..., GFP_KERNEL): Must use GFP_ATOMIC instead.
    • mutex_lock(): Must use spinlock instead.
  • Cannot execute for too long: Although there is no hard time limit, it's generally recommended to finish within tens of microseconds. If it takes longer, you need to consider a Bottom Half mechanism.

Analogy callback: Back to the restaurant

Your handler is the chef rushing into a private room to handle an emergency.

  • He can't stop to answer the phone (sleep).
  • He can't chat for half an hour (takes too long).
  • He must quickly determine "is this the right table's dish", and if so, serve the food or pacify the guests, then immediately run back to the kitchen.

If the guest needs to complain for half an hour (a big task), the chef should say: "Please wait a moment, I'll call the manager." This is just like IRQ_WAKE_THREAD.
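The "do the minimum now, hand the rest to someone else" split can be sketched with a small ring buffer: the top half only stores a byte and returns, and the bottom half drains the buffer later. This is a single-threaded user-space model with hypothetical toy_ names; a real driver would also need the appropriate locking or memory barriers between the two sides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u  /* power of two so the index mask works */

/* Hypothetical single-producer/single-consumer ring buffer. */
struct toy_ring {
    uint8_t  buf[TOY_RING_SIZE];
    unsigned head;  /* advanced by the "top half" */
    unsigned tail;  /* advanced by the "bottom half" */
};

/* Top half: never blocks; drops the byte if the ring is full. */
static bool toy_top_half(struct toy_ring *r, uint8_t data)
{
    if (r->head - r->tail == TOY_RING_SIZE)
        return false;
    r->buf[r->head++ & (TOY_RING_SIZE - 1)] = data;
    return true;
}

/* Bottom half: drains up to `max` bytes into `out`; returns the count. */
static size_t toy_bottom_half(struct toy_ring *r, uint8_t *out, size_t max)
{
    size_t n = 0;

    while (n < max && r->tail != r->head)
        out[n++] = r->buf[r->tail++ & (TOY_RING_SIZE - 1)];
    return n;
}
```

The top half has a fixed, tiny cost no matter how much processing each byte eventually needs, which is exactly the property the ironclad rules above are asking for.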

Code in action: Keyboard Controller Interrupt (i8042)

Let's see how the classic keyboard/mouse controller driver does it (drivers/input/serio/i8042.c):

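static irqreturn_t i8042_interrupt(int irq, void *dev_id)
{
    unsigned char str, data;

    /* Abridged excerpt: locking and the setup of serio, filtered,
     * flag, and ret are omitted. */

    str = i8042_read_status(); // 1. Read the status register
    data = i8042_read_data(); // 2. Read the data

    // 3. Hand the data to the upper-layer input subsystem
    if (likely(serio && !filtered))
        serio_interrupt(serio, data, flag);

    return IRQ_RETVAL(ret);
}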

Fast, accurate, and ruthless. Read the register, pass the data, return. No messing around.


Modern Approach: Managed Resources (devm_request_irq)

The traditional request_irq has a major headache: if you forget to call free_irq in remove, or if a certain path in remove returns early causing free_irq to be skipped, the IRQ is leaked. The next time you call insmod, registration will fail.

The modern kernel recommends using the Devres (Managed Resources) mechanism.

API:

int __must_check
devm_request_irq(struct device *dev, unsigned int irq,
                 irq_handler_t handler,
                 unsigned long irqflags, const char *devname,
                 void *dev_id);

The only difference is the first parameter: struct device *dev. Once you pass it in, the kernel records that this IRQ belongs to this device. When the driver is unbound from the device (device removed, module unloaded, or probe() failing partway), the kernel automatically calls free_irq for you.

Practical example:

static int my_driver_probe(struct platform_device *pdev)
{
    int irq, ret;

    // 1. Get the IRQ number (from the Device Tree or platform data).
    //    platform_get_irq() is the preferred helper for this; it returns
    //    a negative errno on failure.
    irq = platform_get_irq(pdev, 0);
    if (irq < 0) {
        dev_err(&pdev->dev, "get interrupt resource failed.\n");
        return irq;
    }

    // 2. Register the managed interrupt
    ret = devm_request_irq(&pdev->dev, irq, my_irq_handler,
                           IRQF_TRIGGER_HIGH, pdev->name, pdev);
    if (ret) {
        dev_err(&pdev->dev, "request interrupt failed.\n");
        return ret;
    }

    // Even if a later step here fails and returns an error,
    // there is no need to call free_irq manually
    return 0;
}

// Note: no remove function is needed just to call free_irq; the kernel takes care of it

This is much like RAII in C++: the resource is tied to an owner (here, the device) and released automatically when the owner goes away, in the spirit of a smart pointer such as std::unique_ptr.
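If you are curious what "the kernel records it and cleans up for you" might look like, here is a deliberately tiny user-space model: a per-device list of cleanup actions that are run newest-first when the device goes away. All names (struct toy_device, toy_devm_add, and so on) are hypothetical; the real devres implementation is considerably more elaborate.

```c
#include <stddef.h>

#define TOY_MAX_RES 8

typedef void (*toy_release_t)(void *data);

/* Hypothetical per-device list of cleanup actions (a tiny devres stand-in). */
struct toy_device {
    struct { toy_release_t release; void *data; } res[TOY_MAX_RES];
    size_t nres;
};

/* Record a cleanup action; returns 0, or -1 if the table is full. */
static int toy_devm_add(struct toy_device *dev, toy_release_t release, void *data)
{
    if (dev->nres == TOY_MAX_RES)
        return -1;
    dev->res[dev->nres].release = release;
    dev->res[dev->nres].data = data;
    dev->nres++;
    return 0;
}

/* On unbind (or a failed probe): run the actions newest-first. */
static void toy_devm_release_all(struct toy_device *dev)
{
    while (dev->nres > 0) {
        dev->nres--;
        dev->res[dev->nres].release(dev->res[dev->nres].data);
    }
}

/* Example release action: append a tag to a log so the order is visible. */
struct toy_log { char order[TOY_MAX_RES + 1]; size_t len; };
struct toy_tag { struct toy_log *log; char tag; };

static void toy_log_release(void *data)
{
    struct toy_tag *t = data;

    t->log->order[t->log->len++] = t->tag;
    t->log->order[t->log->len] = '\0';
}
```

Registering a memory resource first and an IRQ second means the IRQ is released first on teardown, so the handler is gone before the memory it touches disappears.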


The Ultimate Topic: Threaded Interrupts

Remember when we said "you cannot sleep in interrupt context"? This causes big trouble for handling high-throughput data (like large-packet network transfers). If you really need to do something slightly complex in an interrupt, or if you're not sure whether it might sleep, is there a way out?

Yes, and that is threaded interrupts.

It's an idea borrowed from the real-time Linux (PREEMPT_RT) project and merged into the Mainline Kernel (2.6.30). Its core concept is: Turn interrupt handling into a kernel thread.

Since it's a thread, it can sleep, be scheduled, and be preempted by other higher-priority threads.

request_threaded_irq()

int __must_check
request_threaded_irq(unsigned int irq,
                     irq_handler_t handler,   // Primary handler
                     irq_handler_t thread_fn, // Threaded handler
                     unsigned long flags,
                     const char *name, void *dev);

There are two handlers here:

  1. handler (Primary): This runs in hard interrupt context. It must be extremely fast.
    • If you don't set this (pass NULL), the kernel will install a default one for you that only does one thing: return IRQ_WAKE_THREAD.
  2. thread_fn (Threaded): This runs in a kernel thread context.
    • Here you can do almost anything (as long as you don't forget to add locks).

Workflow:

  1. A hardware interrupt occurs.
  2. The Primary handler executes instantly (atomic context).
    • Does the bare minimum work.
    • Returns IRQ_WAKE_THREAD.
  3. The kernel wakes the corresponding kernel thread (usually named irq/24-eth0).
  4. The Threaded handler executes (process context), doing the rest of the dirty work.

Analogy callback: Back to the restaurant

  • Normal interrupt: The chef personally rushes into the private room, drops off the food, and runs away without saying a word.
  • Threaded interrupt:
    1. Primary Handler: The waiter hears the pager ring, immediately rushes over to take a look, and realizes it's table 5.
    2. Threaded Handler: The waiter runs back to the kitchen, gives the order to the chef, and the chef starts cooking (this takes time).
    3. While cooking, the waiter (interrupt system) can continue responding to other pagers.

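⚠️ Important flag: IRQF_ONESHOT. If you use threaded interrupts, you usually must add IRQF_ONESHOT, and when you pass NULL as the primary handler the kernel normally rejects the request without it. The reason: your threaded handler runs late and slowly. With a level-triggered line, if the IRQ is not kept masked until the thread has quieted the device, the still-asserted line fires again the moment the primary handler returns, and the CPU drowns in interrupts.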
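The effect of IRQF_ONESHOT is easiest to see in a simulation. The sketch below is a user-space model of one level-triggered IRQ with a primary handler that only wakes a thread; struct toy_irq and the toy_ functions are hypothetical and greatly simplify what the genirq core does.

```c
#include <stdbool.h>

enum toy_ret { TOY_NONE = 0, TOY_HANDLED = 1, TOY_WAKE_THREAD = 2 };

/* Hypothetical model of one level-triggered IRQ with a threaded handler. */
struct toy_irq {
    bool line_asserted;   /* device keeps this high until it is serviced */
    bool masked;          /* controller-side mask */
    bool oneshot;         /* models IRQF_ONESHOT */
    bool thread_pending;  /* thread has been woken but has not run yet */
    int  primary_runs;
};

/* Primary handler: does nothing but ask for the thread. */
static enum toy_ret toy_primary(struct toy_irq *irq)
{
    irq->primary_runs++;
    return TOY_WAKE_THREAD;
}

/* One pass of the "hardware": deliver the IRQ if asserted and unmasked. */
static void toy_hw_tick(struct toy_irq *irq)
{
    if (!irq->line_asserted || irq->masked)
        return;
    irq->masked = true;               /* masked while the primary runs */
    if (toy_primary(irq) == TOY_WAKE_THREAD)
        irq->thread_pending = true;
    if (!irq->oneshot)
        irq->masked = false;          /* unmask right after the primary */
}

/* Threaded handler: services the device, then the line is unmasked. */
static void toy_thread_run(struct toy_irq *irq)
{
    if (!irq->thread_pending)
        return;
    irq->line_asserted = false;       /* the slow work that quiets the device */
    irq->thread_pending = false;
    irq->masked = false;
}
```

Ticking the model five times before the thread gets to run shows the difference: with oneshot set, the primary handler runs once; without it, the primary handler runs on every tick because the line is still asserted.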

Managed version: Of course, there is also devm_request_threaded_irq(), which is the current best practice.

Code in action: STM32 Driver

Let's see how ST's I2C driver uses it (drivers/i2c/busses/i2c-stm32f7.c):

static int stm32f7_i2c_probe(struct platform_device *pdev)
{
    // ...
    ret = devm_request_threaded_irq(&pdev->dev, irq_event,
                                    stm32f7_i2c_isr_event,        // Primary (Hardirq)
                                    stm32f7_i2c_isr_event_thread, // Thread fn
                                    IRQF_ONESHOT,
                                    pdev->name, i2c_dev);
    // ...
}

Here, the Primary handler does the hardware check, and the Threaded handler does the actual I2C data transfer.


Why Use Threaded Interrupts?

Besides the reason of "I want to sleep," there is a deeper reason: priority control.

In standard Linux, hardware interrupts have an extremely high priority; they preempt any user-space process, even if your process is real-time priority 99. But in real-time systems, this is terrible. A network interrupt storm could starve your critical control tasks.

With threaded interrupts, interrupt handling becomes a kernel thread with a priority of, say, 50. If you have a user-space real-time task (SCHED_FIFO, prio 60), it can preempt the interrupt handling thread!

This turns uncontrollable hard interrupts into controllable threads. This is one of the core ideas behind real-time Linux (PREEMPT_RT).


Context Checking Tools

Sometimes you write a piece of code but aren't sure if it's running in interrupt context. The kernel provides macros to help you:

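if (in_hardirq())    // spelled in_irq() in older kernels
    // I'm in hard-IRQ context
if (in_softirq())
    // I'm in softirq context (or bottom halves are disabled)
if (in_task())
    // I'm in ordinary process context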

There is also a macro might_sleep(); if you call it in a context where sleeping is forbidden, the kernel prints a "BUG: sleeping function called from invalid context" warning with a stack trace (if CONFIG_DEBUG_ATOMIC_SLEEP is enabled).


Viewing IRQ Information: /proc/interrupts

Finally, let's see how to verify our work. Let's read /proc/interrupts:

$ cat /proc/interrupts
           CPU0       CPU1
 24:       1234       5678   IO-APIC  24-fasteoi   eth0
 25:         10         20   IO-APIC  25-edge      i8042

Column breakdown:

  1. IRQ number: The first column (24, 25).
  2. Interrupt count: The number of triggers on each CPU core. This is useful for load balancing analysis.
  3. Controller type: IO-APIC or GIC, etc.
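  4. Hardware IRQ number and flow type: 24-fasteoi, 25-edge, etc. The number is the controller-local hardware IRQ; fasteoi and edge name the flow handler in use.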
  5. Device name: The last column, eth0 or i8042. This is the name you passed in request_irq.
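If you want to post-process these lines (say, to total the per-CPU counts for one IRQ), a few lines of ordinary user-space C are enough. The helper below is a hypothetical sketch, not an existing tool: toy_sum_irq_line is an invented name, and the caller is assumed to know the number of CPU columns from the header row.

```c
#include <stdlib.h>

/*
 * Parse one data line of /proc/interrupts that has `ncpus` count columns.
 * Stores the IRQ number in *irq and returns the sum of the per-CPU counts,
 * or -1 if the line does not start with "<number>:" or has too few counts.
 */
static long long toy_sum_irq_line(const char *line, int ncpus, int *irq)
{
    char *end;
    long long total = 0;
    long n = strtol(line, &end, 10);

    if (end == line || *end != ':')
        return -1;              /* e.g. "NMI:" or the CPU header row */
    *irq = (int)n;
    end++;                      /* skip ':' */
    for (int i = 0; i < ncpus; i++) {
        char *next;
        unsigned long long c = strtoull(end, &next, 10);

        if (next == end)
            return -1;          /* fewer count columns than expected */
        total += (long long)c;
        end = next;
    }
    return total;
}
```

Fed the eth0 line from the sample output with ncpus set to 2, it reports IRQ 24 and a total of 6912 interrupts (1234 + 5678).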

Section Summary

We've come a long way: from how a hardware signal asserts the CPU's interrupt pin, to how the kernel abstracts it with irq_desc descriptors, to how you register a handler in your code using devm_request_threaded_irq. We even explored how NAPI copes with network floods, and how threaded interrupts let real-time tasks preempt interrupt handling instead of being starved by it.

Next, we will dive deep into a special kind of "half-interrupt"—softirqs and tasklets. They are the kernel's core mechanisms for handling tasks that are "slightly less urgent, but still very urgent."

Are you ready? In the next section, we will stop talking about hardware and start talking about the kernel's internal scheduling magic.