3.2 Accessing Hardware I/O Memory in the Kernel

Let's get straight to the point.

As a driver author, you'll repeatedly face this scenario: you need to directly manipulate registers or memory on a peripheral chip. This is what we call "I/O memory." In low-level driver development, this is essentially the core task—programming hardware by issuing instructions to these registers.

But there's a catch: on Linux, directly accessing this hardware I/O memory is not that simple.

Understanding the Dilemma of Direct Access

You need to understand one fact: the I/O memory on hardware chips is absolutely not ordinary RAM.

The Linux kernel strictly forbids kernel modules or driver authors from directly accessing these hardware I/O memory locations. Why? We covered this in detail in Chapter 7 of our previous book, Linux Kernel Programming—on modern operating systems with virtual memory management (MMU), all memory accesses must go through the MMU and page tables.

Let's briefly review this process, as it's key to understanding the issue. When software (a process or the kernel) attempts to access an address, it doesn't just throw the address directly onto the physical bus. Here's a typical "routing" process:

CPU Cache Check: First, it checks whether the data corresponding to this virtual address is already in one of the CPU's caches (L1, L2, and so on). If it is, great; if it misses at every level, this is an expensive last-level cache (LLC) miss.
TLB Lookup: If the cache misses, the virtual address is handed to the MMU. The MMU first checks the TLB (Translation Lookaside Buffer). If the TLB has a record of the virtual-to-physical address mapping, it gets the physical address directly; if not found, this is an expensive TLB miss.
Page Table Walk: If the TLB also fails, the MMU has to dutifully walk the page table. For user-space access, it checks the user page tables; for kernel access, it checks the kernel page tables. Ultimately, it translates the virtual address into a physical address.
Onto the Bus: This physical address is finally placed on the bus, and the actual read/write operation occurs.

(The exact order of this process varies across architectures—for example, ARM often checks the MMU before the cache—but the logic is consistent.)
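The page table walk in the steps above amounts to slicing the virtual address into one index per table level plus an offset inside the page. Here is a minimal user-space sketch of that slicing. The layout is an assumption (x86_64 with 4-level paging and 4 KiB pages, so 9+9+9+9+12 bits), and the names `va_parts` and `split_va` are made up for illustration; other architectures and configurations split the address differently.

```c
#include <stdint.h>

/* Assumed layout: x86_64, 4-level paging, 4 KiB pages (9+9+9+9+12 bits). */
#define VA_PAGE_SHIFT 12
#define VA_LEVEL_BITS 9
#define VA_LEVEL_MASK ((1u << VA_LEVEL_BITS) - 1)

struct va_parts {
    unsigned int pgd, pud, pmd, pte; /* index into each page table level */
    unsigned int offset;             /* byte offset inside the 4 KiB page */
};

/* Split a virtual address into the pieces a 4-level walk would consume. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.offset = (unsigned int)(va & ((1u << VA_PAGE_SHIFT) - 1));
    p.pte = (unsigned int)((va >> VA_PAGE_SHIFT) & VA_LEVEL_MASK);
    p.pmd = (unsigned int)((va >> (VA_PAGE_SHIFT + VA_LEVEL_BITS)) & VA_LEVEL_MASK);
    p.pud = (unsigned int)((va >> (VA_PAGE_SHIFT + 2 * VA_LEVEL_BITS)) & VA_LEVEL_MASK);
    p.pgd = (unsigned int)((va >> (VA_PAGE_SHIFT + 3 * VA_LEVEL_BITS)) & VA_LEVEL_MASK);
    return p;
}
```

Each index selects one entry in its table, and the final entry supplies the physical page frame to which the offset is added.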

Think about it: on a modern OS, even ordinary RAM cannot be directly physically accessed by software; everything is virtualized. So, for hardware peripheral memory that isn't even RAM, the situation is even more complex.

If this peripheral memory doesn't even count as RAM, is it in the page tables? If not, how do we access it? Without solving this problem, driver code would just be issuing commands into the void.

The Solution—Mapping I/O Memory and I/O Ports

To solve this problem, modern processors offer two paths. This is the fork in the road to understanding hardware drivers.

Memory-Mapped I/O (MMIO): A portion of the processor's address space is "ceded" to peripheral devices. In other words, the peripheral's registers are mapped into our memory address space.
Port-Mapped I/O (PMIO / PIO): Dedicated assembly instructions (and corresponding machine code) are provided to access a separate I/O address space. This space is distinct from the memory space.

We'll dive deep into both techniques next. But before that, we must first learn one thing: politely asking the kernel for permission.

Asking the Kernel for Permission

Don't rush to act—think about it: the kernel is the ultimate manager of system resources. If you want to use I/O resources, you must file a request first.

This isn't just a formality. When you request resources, the kernel actually builds internal data structures (like struct resource) to record which region is occupied by which driver, preventing conflicts.

A proper I/O operation flow must include these three steps:

Before I/O: Request the right to use the memory or port region.
During I/O: Perform the actual read/write (using MMIO or PMIO).
After I/O: Return the region to the kernel.

To implement this flow, the kernel provides a series of APIs. Which group you use depends on whether you take the MMIO or PMIO route.

Method of access	Before performing any I/O	Perform the I/O	After performing the I/O
MMIO	`request_mem_region()`	(See the MMIO section below)	`release_mem_region()`
PMIO	`request_region()`	(See the PMIO section below)	`release_region()`

These macros are defined in the linux/ioport.h header file. Let's look at their signatures:

/* Request and release an MMIO region */
struct resource *request_mem_region(resource_size_t start, unsigned long n, const char *name);
void release_mem_region(resource_size_t start, unsigned long n);

/* Request and release a PMIO (port) region */
struct resource *request_region(unsigned long start, unsigned long n, const char *name);
void release_region(unsigned long start, unsigned long n);

Parameter descriptions:

start: The starting address of the I/O memory region or port.
- For MMIO, this is a physical (or bus) address.
- For PMIO, this is a port number.
n: The length of the region (in bytes or number of ports).
name: The name you give to this region, usually the driver name. This name appears in the /proc filesystem for easy debugging.

The return value is a struct resource pointer. If it returns NULL, the request failed (usually because it's already occupied by another driver), and the driver typically returns -EBUSY.
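To make the bookkeeping idea concrete, here is a small user-space model of the request/release protocol. It is only a sketch: the kernel keeps a tree of struct resource objects, whereas this model uses a flat table, and the names `model_request_region()`, `model_release_region()`, and `MAX_REGIONS` are hypothetical. What it does show is why a second request for an overlapping range comes back as NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

struct region {
    uint64_t start, end;  /* inclusive range, as in struct resource */
    const char *name;     /* owner, e.g. the driver name */
    int in_use;
};

static struct region regions[MAX_REGIONS];

/* Model of a region request: returns the claimed slot, or NULL on conflict. */
static struct region *model_request_region(uint64_t start, uint64_t n,
                                           const char *name)
{
    uint64_t end = start + n - 1;
    struct region *free_slot = NULL;

    if (n == 0)
        return NULL;

    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct region *r = &regions[i];

        if (!r->in_use) {
            if (!free_slot)
                free_slot = r;
            continue;
        }
        if (start <= r->end && r->start <= end)
            return NULL; /* overlaps a range someone else already owns */
    }
    if (!free_slot)
        return NULL; /* table full */

    free_slot->start = start;
    free_slot->end = end;
    free_slot->name = name;
    free_slot->in_use = 1;
    return free_slot;
}

/* Model of a region release: drops the claim that matches exactly. */
static void model_release_region(uint64_t start, uint64_t n)
{
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct region *r = &regions[i];

        if (r->in_use && r->start == start && r->end == start + n - 1)
            r->in_use = 0;
    }
}
```

A caller that gets NULL back would bail out with -EBUSY, exactly as a driver does when the real request fails.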

Alright, with the ticket in hand, let's see how to use it. We'll start with the most mainstream approach: MMIO.

Understanding and Using Memory-Mapped I/O (MMIO)

In MMIO mode, the CPU knows that certain special regions in its address space are reserved for peripherals. You can directly consult the processor or SoC's datasheet—the physical memory map will spell this out clearly.

To give you a concrete feel, let's look at a real example: the Raspberry Pi.

The Raspberry Pi uses Broadcom's BCM2835 (or later) SoC. On page 90 of the official document BCM2835 ARM Peripherals, there is a physical memory map. The mapping for the GPIO register block looks like this:

(Description of the original Figure 3.1: Shows the address range of the GPIO register block)

The most critical column here is Address. This is the physical address (or bus address)—the location of the GPIO registers as seen in the ARM processor's physical address space.

Starting address: 0x7e200000
Length: The documentation states there are 41 32-bit registers, so the length is approximately 41 * 4 = 164 bytes.

⚠️ Warning The Raspberry Pi situation is actually a bit more complex because the BCM2835 has multiple MMUs. There's a VideoCore MMU responsible for converting ARM bus addresses to ARM physical addresses, and then the regular ARM MMU converts physical addresses to virtual addresses. In principle, however, it's still a block of memory mapped into the address space.

Using the `ioremap*()` API

As we mentioned in the previous section, you absolutely cannot directly read or write these physical addresses (0x7e2...). The correct approach is to tell Linux to map these bus addresses into the kernel's virtual address space (VAS). Only then can we access them via a kernel virtual pointer.

This is where the ioremap() API comes in.

#include <linux/io.h>
void __iomem *ioremap(phys_addr_t offset, size_t size);

Note the first parameter of this API: phys_addr_t. This is one of the few scenarios in Linux driver development where you need to directly provide a physical address (another is DMA operations).

When you call ioremap(), the kernel modifies the page tables, carves out a region in the kernel's VAS, and establishes a mapping from a "virtual address" to the "hardware physical address."

Just as the mmap() system call maps kernel memory to user space, ioremap() maps peripheral I/O memory to kernel space.

This API returns a kernel virtual address (KVA) of type void *. But there's a strange suffix here, __iomem, which forms the void __iomem * type.

__iomem is merely a compiler attribute that disappears after compilation.
Its purpose is to remind human developers (and static analysis tools): this is an I/O address, not an ordinary memory pointer! Never use it as a regular pointer!
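The "disappears after compilation" behaviour can be imitated in user space. The sketch below is a simplified model, not the kernel's header: it assumes the annotation only expands to something when a static checker defines `__CHECKER__`, and the `address_space(2)` form is how older kernels wrote it (the exact definition has changed over time). The helper `model_read32()` is hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model of the __iomem annotation: a static checker that
 * defines __CHECKER__ sees an attribute, a normal compiler sees nothing.
 */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
#else
# define __iomem
#endif

/* Hypothetical accessor: the only place where the pointer is dereferenced. */
static uint32_t model_read32(const volatile void __iomem *addr)
{
    return *(const volatile uint32_t *)addr;
}
```

With an ordinary compiler, `void __iomem *` and `void *` are the same type; the annotation costs nothing at run time and only matters to the checker.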

Returning to the Raspberry Pi GPIO example. If we want to map that GPIO register block into kernel space, the code would look something like this:

#define GPIO_REG_BASE    0x7e200000  // physical base address (from the datasheet)
#define GPIO_REG_LEN     164         // 41 registers * 4 bytes

static void __iomem *iobase;

/* 1. First ask the kernel for ownership of this region */
if (!request_mem_region(GPIO_REG_BASE, GPIO_REG_LEN, "mydriver")) {
    pr_warn("couldn't get region for MMIO, aborting\n");
    return -EBUSY;
}

/* 2. Establish the mapping */
iobase = ioremap(GPIO_REG_BASE, GPIO_REG_LEN);
if (!iobase) {
    /* Mapping failed: hand the region back before bailing out */
    release_mem_region(GPIO_REG_BASE, GPIO_REG_LEN);
    return -ENOMEM;
}

/* 3. I/O through iobase is now possible... */

/* 4. When finished, clean up in reverse order */
iounmap(iobase);
release_mem_region(GPIO_REG_BASE, GPIO_REG_LEN);

Here, iobase is a kernel virtual address (KVA). It typically resides in the kernel's vmalloc region.

The following diagram illustrates this mapping relationship:

(Description of the original Figure 3.2: Shows how Physical I/O Peripherals are mapped to the vmalloc area of the Kernel VAS through page tables)

The Next Generation—`devm_*` Managed API

If you've written modern Linux drivers, you'll find that the approach above is already considered "old-school." Although many legacy drivers still use it and understanding it is fundamental, modern driver authors are expected to use a more elegant resource management API—the devm_* family of functions.

Just as we use devm_kmalloc() instead of kmalloc(), the advantage of devm_ioremap() is that when a driver detaches or a device is unloaded, the kernel automatically calls iounmap() for you, without you needing to worry about it at all.

Its signature is as follows:

void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
                           resource_size_t size);

Note that the first parameter is a pointer to struct device. In a platform driver's probe function, this pointer is usually readily available.

Obtaining Device Resources

Now the question arises: where does the second parameter of devm_ioremap(), offset (the physical address), come from?

We can't just hardcode it in the driver, right? Although in the past (in the early ARM era) everyone did exactly that, hardcoding I/O resources in board files (arch/arm/mach-xxx). But now, especially in the ARM and embedded domains, everyone uses the Device Tree.

The Device Tree is a data structure written in a specific language that describes hardware topology (.dts files). It is compiled into binary form (.dtb) at kernel build time and passed to the kernel by the bootloader. The kernel parses it at boot time and automatically generates device and resource information.

Driver authors typically use the platform_get_resource() API to extract physical addresses from these data structures.

Let's look at a real kernel code snippet from the Samsung Exynos 4 SoC video driver (drivers/gpu/drm/exynos/exynos_mixer.c):

struct resource *res;

/* Fetch the IORESOURCE_MEM resource from the platform device */
res = platform_get_resource(mixer_ctx->pdev, IORESOURCE_MEM, 0);
if (res == NULL) {
    dev_err(dev, "get memory resource failed.\n");
    return -ENXIO;
}

/* We have the physical address; now map it */
mixer_ctx->mixer_regs = devm_ioremap(dev, res->start, resource_size(res));
if (mixer_ctx->mixer_regs == NULL) {
    dev_err(dev, "register mapping failed.\n");
    return -ENXIO;
}
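The `resource_size(res)` call in the snippet above deserves a note: a resource stores an inclusive range, so its length is `end - start + 1`, not `end - start`. Here is a user-space sketch of that arithmetic. `struct model_resource` is a cut-down stand-in with only the two fields used here, and `model_resource_size()` is a hypothetical name.

```c
#include <stdint.h>

/* Cut-down stand-in for struct resource (only the fields used here). */
struct model_resource {
    uint64_t start; /* first address of the region */
    uint64_t end;   /* last address of the region, inclusive */
};

/* Length of an inclusive [start, end] range. */
static uint64_t model_resource_size(const struct model_resource *res)
{
    return res->end - res->start + 1;
}
```

For a region spanning 0x3f200000 to 0x3f2000b3, this gives 0xb4, that is, 180 bytes.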

One-Stop Service: `devm_ioremap_resource()`

There's an even lazier, more commonly used API called devm_ioremap_resource(). It does three things:

Checks resource validity.
Calls devm_request_mem_region() to request the memory.
Calls devm_ioremap() to establish the mapping.

This is so convenient that it was called over 1,400 times in the Linux 5.4 kernel. Its signature is as follows:

void __iomem *devm_ioremap_resource(struct device *dev, const struct resource *res);

Usage example (from the Raspberry Pi random number generator driver):

static int bcm2835_rng_probe(struct platform_device *pdev)
{
    struct resource *r;

    /* Get the resource structure */
    r = platform_get_resource(pdev, IORESOURCE_MEM, 0);

    /* One step: validate, request, and map */
    priv->base = devm_ioremap_resource(dev, r);
    if (IS_ERR(priv->base))
        return PTR_ERR(priv->base);

    /* ... */
}

If you see devm_request_and_ioremap() online, don't panic—that's an antique from before 2013, now replaced by devm_ioremap_resource().
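The probe example above checks the result with IS_ERR() instead of comparing against NULL. That works because the error code is encoded in the pointer value itself: small negative numbers land in the topmost addresses, which are never valid pointers. The following user-space model sketches the idea. The `model_` names are hypothetical, and the limit of 4095 is assumed to match the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095 /* assumed to match the kernel's MAX_ERRNO */

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer value falls in the top MODEL_MAX_ERRNO addresses. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}
```

Note that NULL is not an error pointer under this scheme, which is why a NULL check alone would miss a failure reported this way.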

Viewing Mappings via `/proc/iomem`

When request_mem_region() succeeds, the corresponding entry appears in the /proc/iomem pseudo-file. Read it as root: on recent kernels, unprivileged readers see the addresses zeroed out.

On an x86_64 virtual machine:

$ sudo cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009fbff : System RAM
...
00100000-3ffeffff : System RAM
18800000-194031d0 : Kernel code
...
fee00000-fee00fff : Local APIC

Note that the left column consists entirely of physical addresses (or bus addresses). You can see where the system RAM is and where the Kernel's code segment resides in physical memory.

If you run this on a Raspberry Pi, you'll see a different style:

pi@raspberrypi:~ $ sudo cat /proc/iomem
00000000-3b3fffff : System RAM
...
3f200000-3f2000b3 : gpio@7e200000
3f201000-3f2011ff : serial@7e201000

Notice that gpio@7e200000?

The left 3f200000... is the ARM physical address, which is what /proc/iomem lists.
The right @7e200000 is the bus address, carried over from the Device Tree node name.
This is another special visual effect brought by the BCM2835's multi-layer MMU setup.
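In the output above, the two numbers on each peripheral line differ only in their top byte: 0x3f... in the left column, 0x7e... after the @. A minimal sketch of that fixed-offset relationship follows. Both base constants are assumptions read off this particular output and the datasheet's numbering (other Raspberry Pi models use a different left-column base), and the function name is hypothetical.

```c
#include <stdint.h>

/* Peripheral base in the datasheet's numbering (the value after '@'). */
#define NODE_PERIPH_BASE  0x7e000000u
/* Peripheral base in the left column of the output above (assumption:
 * specific to this board; other Pi models use a different value). */
#define IOMEM_PERIPH_BASE 0x3f000000u

/* Convert a datasheet-style address to the one /proc/iomem lists. */
static uint32_t node_addr_to_iomem_addr(uint32_t node_addr)
{
    return node_addr - NODE_PERIPH_BASE + IOMEM_PERIPH_BASE;
}
```

In a real driver none of this arithmetic is needed, because the kernel performs the translation while parsing the Device Tree.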

A few key points about /proc/iomem:

It shows the I/O memory regions currently claimed by the kernel or drivers (claiming is what creates an entry; calling ioremap() alone does not).
Entries are created when request_mem_region() is called.
Entries are removed when release_mem_region() is called.
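When debugging, it can be handy to pick these entries apart programmatically. Here is a small user-space sketch that parses one line in the `start-end : name` format shown above. The struct and function names are hypothetical, and the 63-character name limit is an arbitrary choice for this example.

```c
#include <stdio.h>

struct iomem_entry {
    unsigned long long start, end; /* inclusive range, parsed as hex */
    char name[64];                 /* owner text after the colon */
};

/* Returns 0 on success, -1 if the line is not "start-end : name". */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
    /* The leading space in the format skips the indentation of nested entries. */
    if (sscanf(line, " %llx-%llx : %63[^\n]", &e->start, &e->end, e->name) != 3)
        return -1;
    return 0;
}
```

Feeding it each line of the file (for example via fgets()) yields the range and the owner name, which is enough to check whether a driver's claim actually went through.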

Alright, the address is mapped—how do we actually read and write the data?

MMIO — Performing the Actual I/O

Now, the peripheral's I/O memory is mapped into your kernel VAS. To you, it looks just like a block of ordinary memory.

But never treat it as ordinary memory!

You cannot directly use C's conventional operations like *ptr = value or val = *ptr to read or write it. The reasons involve memory barriers, cache side effects, endianness issues, and more. The kernel provides a dedicated set of wrapper APIs.
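Of the reasons just listed, endianness is the easiest to show in plain C. The sketch below decodes the same four bytes as a little-endian and as a big-endian 32-bit value. It is only an illustration of why a raw dereference can give the wrong number when CPU and device disagree on byte order; the function names are hypothetical, and how a given accessor treats byte order should be checked in the kernel documentation for the API you use.

```c
#include <stdint.h>

/* Assemble a 32-bit value from bytes stored least-significant byte first. */
static uint32_t le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Assemble a 32-bit value from bytes stored most-significant byte first. */
static uint32_t be32_decode(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

The same register contents produce two different values depending on the assumed byte order, so the accessor, not the driver author's pointer arithmetic, should be the place where that decision is made.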

Performing 1- to 8-Byte Reads and Writes

The kernel provides read and write functions for different bit widths (8, 16, 32, 64 bit):

Read: ioread8(), ioread16(), ioread32(), ioread64()
Write: iowrite8(), iowrite16(), iowrite32(), iowrite64()

Their signatures are as follows:

#include <linux/io.h>

u8 ioread8(const volatile void __iomem *addr);
u16 ioread16(const volatile void __iomem *addr);
u32 ioread32(const volatile void __iomem *addr);
/* ... ioread64 ... */

void iowrite8(u8 value, volatile void __iomem *addr);
void iowrite16(u16 value, volatile void __iomem *addr);
void iowrite32(u32 value, volatile void __iomem *addr);
/* ... iowrite64 ... */

Here, addr is the return value you got from ioremap(), plus an offset.

For example, to read a 32-bit register (assuming an offset of 0x10):

u32 reg_value;
reg_value = ioread32(iobase + 0x10);

⚠️ Warning These I/O routines directly manipulate hardware, so they inherently do not fail (there's no meaningful error code to return). If your driver isn't working, it's usually because the address is wrong, the mapping is wrong, or the offset was miscalculated—not because the ioread32 function itself reported an error.

A common hardware self-test technique is the "loopback test": write a value in, then read it back. If the values match, it indicates the hardware connection and data path are basically sound.
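The loopback idea can be sketched in user space with an ordinary array standing in for the register block. Everything here is a model: `fake_regs`, `model_ioread32()`, `model_iowrite32()`, and `loopback_ok()` are hypothetical names, and the accessors are plain volatile loads and stores rather than the kernel's implementations.

```c
#include <stdint.h>

#define FAKE_REG_COUNT 41 /* same register count as the GPIO block above */

static uint32_t fake_regs[FAKE_REG_COUNT]; /* stands in for device registers */

/* Model accessor: a single volatile 32-bit load. */
static uint32_t model_ioread32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Model accessor: a single volatile 32-bit store. */
static void model_iowrite32(uint32_t value, volatile void *addr)
{
    *(volatile uint32_t *)addr = value;
}

/* Write a pattern at byte offset `off`, read it back; 1 if they match. */
static int loopback_ok(volatile void *base, unsigned int off, uint32_t pattern)
{
    volatile uint8_t *reg = (volatile uint8_t *)base + off;

    model_iowrite32(pattern, reg);
    return model_ioread32(reg) == pattern;
}
```

On real hardware this only makes sense for registers that read back what was written; write-only, read-to-clear, or status registers will not pass such a test even when the hardware is fine.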

Performing Repetitive (Block) I/O Operations

If you need to read or write hundreds of bytes (like a FIFO), wrapping ioread8 in a for loop works but isn't efficient. The kernel provides repetitive versions of these APIs, which typically use highly optimized assembly loops internally.

Read: ioread8_rep(), ioread16_rep(), ioread32_rep(), ioread64_rep()
Write: iowrite8_rep(), iowrite16_rep(), iowrite32_rep(), iowrite64_rep()

Taking 8-bit repetitive reads as an example:

void ioread8_rep(const volatile void __iomem *addr, void *buffer, unsigned int count);

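This reads one byte from the same MMIO address addr, count times in a row, and stores the results in consecutive bytes of the kernel buffer buffer. The device address does not advance, which is exactly what a FIFO data register needs.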

Similarly, for repetitive writes:

void iowrite8_rep(volatile void __iomem *addr, const void *buffer, unsigned int count);
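The semantics of the repetitive reads can be modelled in user space: the source stays put while the destination advances. In this sketch the FIFO device is entirely hypothetical (`struct fake_fifo`, `fifo_read8()`), and `model_ioread8_rep()` only mirrors the behaviour described above, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO device: each read of its data register pops one byte. */
struct fake_fifo {
    const uint8_t *data;
    size_t len, pos;
};

/* One register read; returns 0xff once the FIFO is empty (model choice). */
static uint8_t fifo_read8(struct fake_fifo *f)
{
    return f->pos < f->len ? f->data[f->pos++] : 0xff;
}

/* Model of a repetitive read: `count` reads of the SAME register. */
static void model_ioread8_rep(struct fake_fifo *f, void *buffer,
                              unsigned int count)
{
    uint8_t *dst = buffer;

    while (count--)
        *dst++ = fifo_read8(f); /* destination advances, source does not */
}
```

Contrast this with a block copy, where both the source and the destination addresses advance with every byte.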

`memset` and `memcpy` Variants

For MMIO, the standard memset() and memcpy() won't work. You must use the dedicated I/O versions:

void memset_io(volatile void __iomem *addr, int value, size_t size);

void memcpy_fromio(void *buffer, const volatile void __iomem *addr, size_t size);
void memcpy_toio(volatile void __iomem *addr, const void *buffer, size_t size);

memset_io: Fills an I/O memory region with a specific value.
memcpy_fromio: Copies data from hardware to kernel memory.
memcpy_toio: Copies data from kernel memory to hardware.

Additional Note: The kernel also has an older set of APIs: readb(), readw(), readl(), readq() and the corresponding writeb(), writew(), writel(), writeq(). They are functionally similar to the ioread...() and iowrite...() routines, and the recommendation here is to prefer the ioread.../iowrite... family in new code. We mention them so you won't be baffled when reading existing drivers, where they appear very frequently.

Understanding and Using Port-Mapped I/O (PMIO)

Having covered MMIO, let's look at that "parallel universe"—PMIO (also known as PIO).

In PMIO mode, the CPU has dedicated assembly instructions (like x86's in / out) to read and write I/O ports. This I/O address space is completely independent of the memory address space.

On x86, this port address space is typically 0x0000 to 0xffff (64 KB).
Don't confuse "ports" here with network ports (TCP/UDP ports). I/O ports here are essentially another name for hardware registers.

Although most modern processors (including ARM) rely primarily on MMIO, the x86 architecture still extensively retains PMIO. Ancient yet important devices like the keyboard controller i8042, DMA controllers, timers, and RTCs still live in the I/O port space.

PMIO — Performing the Actual I/O

Compared to MMIO's layered mapping, PMIO is much more straightforward and brutal, because the CPU itself has instruction-level support.

Of course, politeness is still required—we still need to call request_region() and release_region() first.

The APIs for reading and writing I/O ports are:

Read: inb(), inw(), inl() (b=8bit, w=16bit, l=32bit)
Write: outb(), outw(), outl()

Their signatures are as follows:

u8 inb(unsigned long addr);
u16 inw(unsigned long addr);
u32 inl(unsigned long addr);

void outb(u8 value, unsigned long addr);
void outw(u16 value, unsigned long addr);
void outl(u32 value, unsigned long addr);

Here, addr is a port number (like 0x60), not a memory address.

PMIO Example: The `i8042` Keyboard Controller

Let's see how the classic i8042 driver uses PMIO.

In the driver header file drivers/input/serio/i8042-io.h, the register (port) addresses are defined:

#define I8042_COMMAND_REG   0x64
#define I8042_STATUS_REG    0x64
#define I8042_DATA_REG      0x60

You might ask: why do COMMAND_REG and STATUS_REG have the same address? This is typical hardware design: reading and writing the same port has different meanings. Reading it accesses the status register; writing it accesses the command register.

Because these are all 8-bit registers, the driver exclusively uses inb and outb:

static inline int i8042_read_data(void)
{
    return inb(I8042_DATA_REG);
}

static inline void i8042_write_command(int val)
{
    outb(val, I8042_COMMAND_REG);
}

Simple and direct.
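The "same port, different register" behaviour can be modelled in a few lines of user-space C. This is a sketch of the decoding idea only: `struct fake_i8042`, `model_inb()`, and `model_outb()` are hypothetical, and returning 0xff for an unknown port is a choice made for this model.

```c
#include <stdint.h>

#define I8042_COMMAND_REG 0x64
#define I8042_STATUS_REG  0x64
#define I8042_DATA_REG    0x60

/* Hypothetical controller state behind the two ports. */
struct fake_i8042 {
    uint8_t status;       /* returned by reads of port 0x64 */
    uint8_t last_command; /* latched by writes to port 0x64 */
    uint8_t data;         /* data register at port 0x60 */
};

/* Model of a port read: 0x64 selects the status register. */
static uint8_t model_inb(const struct fake_i8042 *dev, unsigned long port)
{
    switch (port) {
    case I8042_STATUS_REG:
        return dev->status;
    case I8042_DATA_REG:
        return dev->data;
    default:
        return 0xff; /* nothing decodes this port in the model */
    }
}

/* Model of a port write: 0x64 selects the command register. */
static void model_outb(struct fake_i8042 *dev, uint8_t value,
                       unsigned long port)
{
    switch (port) {
    case I8042_COMMAND_REG:
        dev->last_command = value;
        break;
    case I8042_DATA_REG:
        dev->data = value;
        break;
    default:
        break;
    }
}
```

Writing a command to port 0x64 does not change what a subsequent read of port 0x64 returns, because the two directions reach different registers.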

Viewing Ports via `/proc/ioports`

Just as /proc/iomem corresponds to MMIO, the kernel provides /proc/ioports to view currently occupied I/O ports.

On an x86_64 virtual machine:

$ sudo cat /proc/ioports
0000-0cf7 : PCI Bus 0000:00
  0000-001f : dma1
  0040-0043 : timer0
  0060-0060 : keyboard
  0064-0064 : keyboard
  0070-0071 : rtc_cmos

You can see that the 0x60 and 0x64 ports are indeed marked as occupied by keyboard.

If you run this command on a Raspberry Pi (ARM), you usually won't see much, if anything, because ARM devices primarily use MMIO and rarely use I/O ports.

Supplementary Notes on PMIO

String Instructions (Repetitive I/O): Just as MMIO has the _rep suffix, PMIO also has corresponding repetitive operation versions, called the ins and outs families.
```
void insb(unsigned long addr, void *buffer, unsigned int count);
void outsw(unsigned long addr, const void *buffer, unsigned int count);
/* ... insw, insl, outsb, outsl ... */
```
For example, insw reads from I/O port addr count times (2 bytes each time) into a buffer.
Paused I/O (_p suffix): There's another group of APIs called inb_p(), outb_p(), etc. Here, _p stands for pause. In the era of early slow peripherals, this meant inserting a small delay between I/O operations. On many architectures today these are simple wrappers around the plain versions with no added delay, and they are kept mainly for backward compatibility.
User-Space PIO: You can also gain inb/outb permissions from user space via the iopl() or ioperm() system calls. This requires root privileges (CAP_SYS_RAWIO). This allows you to write a user-space driver.

Chapter Reflection

In this chapter, we walked through the complete chain from "physical address" to "virtual address" to "actual read/write." This is the foundation of all hardware manipulation.

Without MMU and page table mapping, the code we write would be like standing outside a glass wall looking at the hardware, reaching out but unable to touch it. Through request_mem_region and ioremap, we finally got that key, bridging the kernel virtual space and the hardware physical space.

But we also saw that the world isn't always unified. MMIO is the modern mainstream, disguising hardware as memory; but in the x86 world, the shadow of PMIO still lingers—a separate I/O space that requires dedicated inb/outb instructions to visit. Only by understanding both mechanisms can you truly see the full picture of Linux hardware drivers.

So far, our drivers have been proactive—we actively read registers and actively write configurations.

But real-world hardware often doesn't work like that. It doesn't meekly wait to be polled; instead, it actively sends signals to the CPU when needed: "I have data!" or "An error occurred!"

This is the topic of the next chapter—Interrupts. In this mechanism, control will be reversed: instead of the driver polling the hardware, the hardware interrupts the CPU. This is the necessary path to high-performance asynchronous I/O.

Are you ready? See you in the next chapter.

Exercises

Exercise 1: Application

Question: When writing a network device driver, you need to map the device registers' physical address 0xFE000000 into the kernel virtual address space for access. Assuming the register region is 4096 bytes in size and the device resources have been properly requested, write the core code statement that calls the kernel API to map it and obtain the kernel virtual address vaddr.

Answer and Analysis

Answer: void __iomem *vaddr = ioremap(0xFE000000, 4096);

Analysis: In the Linux kernel, accessing I/O memory requires using ioremap() to map the physical address to the kernel virtual address space. The first parameter is the physical/bus address (phys_addr_t), and the second parameter is the mapping length. The return value is a kernel virtual address of type void __iomem *, which serves as a marker to remind the compiler and static analysis tools that this is I/O memory, not ordinary RAM.

Exercise 2: Understanding

Question: When accessing device hardware registers, why can't we directly use the dereference operator (like *ptr = value;) or memcpy() to read and write the address returned by ioremap(), and instead must use I/O-specific APIs like ioread32() / iowrite32()? (Please list at least two main reasons.)

Answer and Analysis

Answer: 1. To guarantee atomicity of operations (avoiding byte merging errors on certain architectures); 2. To handle memory barrier and ordering issues (ensuring I/O executes in order); 3. To handle architecture-specific endianness differences (e.g., a little-endian CPU accessing a big-endian device).

Analysis: Ordinary memory accesses can be optimized, reordered, or even merged by the compiler, which is fatal in hardware I/O access because register reads and writes often have side effects and require strict timing. APIs like iowrite internally use memory barriers to prevent CPU out-of-order execution and guarantee that operations are atomic (i.e., writing a complete 32 bits at once). At the same time, these APIs abstract away underlying hardware architecture differences, improving driver portability.

Exercise 3: Application

Question: Suppose you are writing a Platform driver and need to initialize the device's I/O memory in the probe function. Given that the reg property is already defined in the Device Tree, write the complete code logic to obtain the resource and perform the mapping using the modern devm_* managed API (assuming the device structure pointer is pdev and the device pointer is dev).

Answer and Analysis

Answer:

struct resource *res;
void __iomem *base;

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
    return -ENXIO;

base = devm_ioremap_resource(dev, res);
if (IS_ERR(base))
    return PTR_ERR(base);

Analysis: Modern Linux driver development recommends using the devm_* managed API, which automatically handles resource release and avoids memory leaks when the driver is unloaded. First, use platform_get_resource to obtain the IORESOURCE_MEM type resource structure from the Device Tree. Then use devm_ioremap_resource, which completes both request_mem_region (requesting the resource) and ioremap (establishing the mapping) in one step. If an error occurs, it returns ERR_PTR, which needs to be checked using IS_ERR.

Exercise 4: Thinking

Question: In ARM architecture embedded system development, the device manual shows that a peripheral register's physical address is 0x3F201000. However, in the Linux kernel, driver programs typically don't hardcode this address directly to ioremap, but instead obtain it through the Device Tree (DTS) and platform_get_resource. From the perspectives of "software-hardware decoupling" and "kernel generality," analyze why this approach is recommended.

Answer and Analysis

Answer: The recommended approach is to separate hardware description from driver code, letting the Device Tree handle hardware topology and addresses while the driver handles only the logic. Reasons:

Code portability: The same peripheral might have different physical addresses on different boards or SoCs. With the Device Tree, the driver code can adapt to different hardware without modification.
Kernel generality: The kernel can be compiled into a universal image and adapted to different boards by loading different DTBs, avoiding the need to maintain separate kernel branches for each board.
Modular design: This aligns with the Linux device model's design philosophy of "separating drivers from devices," facilitating unified management of hardware information.

Analysis: If addresses are hardcoded (like #define PHY_ADDR 0x3F...), the driver becomes tightly coupled to a specific board. Once the hardware address changes (due to a chip revision or routing changes), the driver code must be modified and recompiled. Through the Device Tree, physical addresses become "data" rather than "code." The bootloader can load the appropriate hardware description file based on the specific board being run, while the Linux kernel and driver code remain universally compatible at the binary level. This is the standard paradigm for modern embedded Linux development.

Key Takeaways

Kernel driver communication with hardware must be built on strict resource management. Developers must never directly dereference physical address pointers to touch hardware. The correct flow is to first request ownership of the I/O memory region from the kernel to prevent driver conflicts, and then establish a secure mapping from the physical address to a kernel virtual address via the ioremap mechanism. Modern driver development recommends using the devm_ioremap_resource "one-stop" API directly. It automatically handles resource checking, requesting, and mapping, and automatically takes care of release when the driver unloads, effectively mitigating the risk of resource leaks.

Hardware registers are not ordinary memory; their read and write operations have side effects and are timing-sensitive. Therefore, using standard pointer dereferences or memcpy is strictly prohibited. You must use the kernel's dedicated APIs (such as ioread32 and iowrite32) to access I/O memory. These functions encapsulate memory barriers and prevent compiler optimization, ensuring that instructions execute strictly in code order and that access widths strictly match hardware specifications. This strict interface isolation is the cornerstone of reliable hardware operations.

The Linux kernel supports two fundamentally different hardware access models: Memory-Mapped I/O (MMIO) and Port-Mapped I/O (PMIO). MMIO maps peripheral registers into the CPU's physical address space. It dominates on ARM and other embedded systems and is widely used on x86 as well. The registers are accessed through the void __iomem * pointer returned by ioremap(). PMIO, on the other hand, is a traditional feature of the x86 architecture with a separate I/O address space, requiring dedicated in/out assembly instructions and their wrapper functions (like inb/outb). The underlying mechanisms and API usage for the two are completely different.

When handling large data transfers, the kernel provides bulk read/write APIs for I/O operations. Repetitive read and write functions like ioread32_rep and iowrite8_rep access a single register address over and over, which suits moving data through FIFO buffers. Compared to wrapping single read/write API calls in a hand-written loop, these helpers are more concise and are typically backed by optimized, architecture-specific loops.

The kernel provides developers with a window into system resource allocation through the /proc/iomem and /proc/ioports pseudo-files. By viewing these files, developers can verify whether a driver has successfully claimed the expected hardware address range or ports. This is the most intuitive means of debugging hardware driver conflicts and confirming resource registration status. Understanding this view helps quickly locate resource contention issues and ensures correct connections between the driver and the hardware.
