Chapter 3: Beyond Memory — When the Kernel Reaches for Hardware
In this chapter, we tackle the "last mile" problem of driver-hardware communication.
It's more dangerous than it looks.
When writing user-space programs, accessing memory is the most routine thing in the world — point a pointer, read the data, simple as that. But inside the kernel, when you try to touch a hardware register through a pointer, things are completely different. If you think you can just cast a physical address to a pointer and dereference it, you're one compile away from a system crash.
Why? Because hardware isn't RAM. You can't treat it like ordinary memory — reads and writes have side effects, caches will sabotage you, and compiler optimizations can "swallow" your critical instructions.
This chapter's mission is to figure out how the kernel safely dances through this minefield: how to request permission, how to map hardware addresses into kernel space, how to use dedicated APIs to exchange data with the outside world, and how to gracefully clean up when it's all over.
We'll cover two fundamentally different hardware access models: Memory-Mapped I/O (MMIO) and Port-Mapped I/O (PMIO). They look similar, but the underlying mechanisms and details are completely different.
Ready? Buckle up, let's begin.
3.1 Accessing Hardware I/O Memory from the Kernel
The first problem we need to solve is straightforward: how does the kernel access hardware registers?
This sounds like a silly question — if you have the physical address, can't you just read and write directly?
No, you can't. With the MMU turned on, things aren't that simple. Imagine your code is running in a virtual address space, holding a physical address from a hardware manual (like 0x3F200000). If you dare to stuff that address into a pointer and dereference it, the CPU treats it as a virtual address, finds no valid mapping behind it, and faults. The kernel then slaps you with an oops.
What we need to do is build a safe bridge between the kernel's virtual address space (VAS) and the device's physical I/O memory.
There are two core concepts we need to distinguish first: I/O ports and I/O memory.
I/O Memory vs. I/O Ports: Two Fundamentally Different Paths
When hardware designers create peripheral controllers, they typically hand over control in one of two ways:
- Memory-Mapped I/O (MMIO): This is the standard approach on modern ARM, MIPS, and most embedded systems. Designers map the peripheral's registers directly into the processor's physical memory address space. To you, accessing a register looks just like accessing a chunk of ordinary RAM — of course, that's just an illusion.
- Port-Mapped I/O (PMIO / PIO): This is a traditional x86 specialty. The CPU provides a separate set of instructions (in/out) and a separate address space (the I/O port space). Even if your memory addresses are 32 or 64 bits wide, the x86 I/O port space is still a separate 16-bit space (64 KB in size). On this architecture, you can't reach port-mapped registers with ordinary pointers; you must use the dedicated instructions.
Modern Linux kernels encapsulate both cases into a unified set of APIs for compatibility, but you need to understand the underlying mechanism differences.
In this first section, we'll focus on MMIO — because it's what you'll encounter most often when writing SoC drivers and Device Tree drivers. As for PMIO (port I/O), we'll dive into that in the next section.
Requesting from the Kernel: Ask Before You Take
Before we start mapping addresses, there's a bureaucratic process we can't skip.
The Linux kernel is also a resource manager. If two drivers both decide they have the right to operate on the same physical memory region, the consequences would be disastrous — one is writing configuration while the other is cutting power, and the motherboard gives up. To prevent these "collisions," the kernel maintains a resource tree.
Before any driver actually touches hardware, it must raise its hand: "This region is mine, nobody else touch it."
That's what request_mem_region() does.
1. Requesting an I/O Memory Region
You need to provide the starting physical address and length of the region.
struct resource *request_mem_region(resource_size_t start, resource_size_t len,
                                    const char *name);
(In the kernel headers this is a macro wrapping __request_region(); the prototype above shows how it behaves.)
- start: The physical starting address of the I/O memory.
- len: The size of this region (in bytes).
- name: A string identifying who is claiming this region (displayed in /proc/iomem).
If the request succeeds, it returns a pointer to a struct resource; if it fails (for example, if another driver already claimed it), it returns NULL.
Analogy time:
You can think of request_mem_region as reserving a room at a hotel front desk. You tell the front desk: "Starting from room 3F20, I want to reserve 100 rooms, register the name as 'my_driver'." The front desk checks the computer, and if that range is unoccupied, writes your name in the registry.
But there's a catch: Getting the room card doesn't mean you've entered the room yet. You've only ensured that nobody else will walk in. To actually open the door (access the data), you need another key.
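To make the "registry" idea concrete, here is a minimal user-space sketch of the bookkeeping. It is not the kernel's implementation (the real resource tree is hierarchical and lock-protected); every name prefixed with model_ is hypothetical and exists only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's struct resource. */
struct model_region {
    uint64_t start;   /* first address in the range */
    uint64_t end;     /* last address in the range (inclusive) */
    const char *name; /* owner label, like the one shown in /proc/iomem */
};

#define MODEL_MAX_REGIONS 8

static struct model_region model_table[MODEL_MAX_REGIONS];
static size_t model_count;

/* Claim [start, start + len - 1]; return NULL if it overlaps an existing claim. */
static struct model_region *model_request_region(uint64_t start, uint64_t len,
                                                 const char *name)
{
    uint64_t end;
    size_t i;

    if (len == 0 || model_count == MODEL_MAX_REGIONS)
        return NULL;
    end = start + len - 1;

    for (i = 0; i < model_count; i++) {
        /* Two inclusive ranges overlap when each starts before the other ends. */
        if (start <= model_table[i].end && model_table[i].start <= end)
            return NULL;
    }

    model_table[model_count] = (struct model_region){ start, end, name };
    return &model_table[model_count++];
}

/* Drop the claim that begins at start, if one exists. */
static void model_release_region(uint64_t start)
{
    size_t i;

    for (i = 0; i < model_count; i++) {
        if (model_table[i].start == start) {
            model_table[i] = model_table[--model_count];
            return;
        }
    }
}
```

A second claim that overlaps the first gets NULL back, which is the model's version of "somebody already booked that room". Note that nothing here touches hardware: claiming is pure bookkeeping.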
2. Requesting an I/O Port Region
By the way, if you're working with x86 port I/O, the corresponding API is request_region(). The logic is exactly the same, except that the target is a range of port addresses and the claim shows up in /proc/ioports instead:
struct resource *request_region(resource_size_t start, resource_size_t len,
                                const char *name);
3. Releasing Resources
When unloading the driver, don't forget to return the territory you claimed. This is basic courtesy; otherwise, that memory region will stay marked as "busy," and the next time you load the driver, it will fail.
void release_mem_region(resource_size_t start, resource_size_t len);
void release_region(resource_size_t start, resource_size_t len);
Getting the Key: Using ioremap*() APIs
Alright, the resource has been requested (request_mem_region succeeded), and now we hold the physical address. But that's still not enough.
Kernel code runs in a virtual address space, and the CPU's MMU (Memory Management Unit) doesn't know which virtual address corresponds to this physical address. You need to set up page table mappings to map the device's physical address into the kernel virtual address space (typically the high address region starting with 0xFFFF...).
That's the job of ioremap().
void __iomem *ioremap(phys_addr_t offset, size_t size);
- offset: The physical address (the one you see in the manual).
- size: How much you want to map (in bytes).
- Return value: A pointer of type void __iomem *. This is the "virtual key" you'll use to read and write registers afterward. On failure, ioremap() returns NULL.
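One detail worth seeing in numbers: page tables work in whole pages, so a mapping typically covers the page(s) containing your range, and the pointer you get back keeps the offset within the first page. The sketch below only models that arithmetic in user space. It assumes 4 KiB pages, and the model_ names are hypothetical, not kernel APIs.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL               /* assumption: 4 KiB pages */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

struct model_mapping {
    uint64_t page_base;   /* physical address rounded down to a page boundary */
    uint64_t offset;      /* distance from page_base to the requested address */
    uint64_t mapped_size; /* bytes actually mapped, a whole number of pages */
};

static struct model_mapping model_ioremap_layout(uint64_t phys, uint64_t size)
{
    struct model_mapping m;

    m.offset = phys & ~MODEL_PAGE_MASK;
    m.page_base = phys & MODEL_PAGE_MASK;
    /* Round (offset + size) up to the next page boundary. */
    m.mapped_size = (m.offset + size + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK;
    return m;
}
```

For example, asking for 0x20 bytes starting at 0x3F200FF0 straddles a page boundary, so this model reports two pages (0x2000 bytes) with an offset of 0xFF0 into the first one.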
About the __iomem Marker
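Notice the __iomem in the return type. This is an annotation for sparse, the kernel's static analysis tool; in a normal build it expands to nothing and does not change the generated code. What it does is tell you (and sparse): "This is not an ordinary memory pointer, so don't treat it like one!" Sparse will warn when you mix it with regular pointers.
You should never use it as a regular pointer, for example by directly dereferencing it with * or passing it to memcpy. A plain dereference may happen to work on some architectures (x86 among them), but nothing stops the compiler from merging, splitting, or reordering those accesses, and the code is not portable. On ARM, for instance, an optimized memcpy may issue unaligned or wide accesses that device memory does not tolerate, which can end in an alignment fault.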
Reverse Operation: Unmapping
When you no longer need the hardware, or the driver is about to unload, you must tear down this mapping:
void iounmap(volatile void __iomem *addr);
Returning to the "hotel room" analogy:
ioremap is picking up the key that actually opens the door. You get a virtual address (the key), and from then on, you rely on it to enter the room. iounmap is handing that key back.
⚠️ Warning
Don't get the order wrong: first request_mem_region (ensure the room is yours), then ioremap (open the door).
When releasing, do the reverse: first iounmap (leave the room), then release_mem_region (check out).
If you reverse the order, you have given the region back while your mapping is still live: another driver can claim it while you are still writing through your pointer, and the consequences are unpredictable.
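The ordering rule is easiest to see as code. The following is a user-space sketch, not driver code: the stub_ functions are hypothetical stand-ins for request_mem_region, ioremap, iounmap, and release_mem_region, and they only record the order in which they were called. The shape to notice is the goto-based unwinding, which releases only what was successfully acquired.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Call log so the ordering can be inspected; purely for illustration. */
static char demo_log[64];

static void demo_note(const char *step)
{
    strncat(demo_log, step, sizeof(demo_log) - strlen(demo_log) - 1);
}

/* Stubs standing in for the kernel calls; demo_fail_map forces the map step to fail. */
static int demo_fail_map;

static int stub_request_mem_region(void) { demo_note("request;"); return 1; }

static void *stub_ioremap(void)
{
    static int fake_regs;

    demo_note("ioremap;");
    return demo_fail_map ? NULL : &fake_regs;
}

static void stub_iounmap(void) { demo_note("iounmap;"); }
static void stub_release_mem_region(void) { demo_note("release;"); }

static void *demo_base;

/* Acquire in order: claim first, then map. Unwind only what succeeded. */
static int demo_setup(void)
{
    if (!stub_request_mem_region())
        return -EBUSY;

    demo_base = stub_ioremap();
    if (!demo_base)
        goto err_release;
    return 0;

err_release:
    stub_release_mem_region();
    return -ENOMEM;
}

/* Tear down in the reverse order: unmap first, then give the region back. */
static void demo_teardown(void)
{
    stub_iounmap();
    demo_base = NULL;
    stub_release_mem_region();
}
```

When the mapping step fails, the region claimed one step earlier is released before returning, so a failed setup leaves nothing marked busy.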
The Modern Approach: devm_* Managed APIs
If you've been writing drivers for a while, you'll find that request_... and ioremap... along with their corresponding release_... and iounmap... are practically the source of nightmares.
The most common scenario: you map memory in your driver's probe function, but forget to unmap in the remove function, or miss a step on an error handling path. This leaks the mapping and leaves the region marked as busy.
To save us careless engineers, the kernel introduced the devm_* (Device Managed) family of APIs. These APIs automatically track resource lifecycles — when the driver detaches, the kernel automatically releases these resources for you.
The two most commonly used are:
- devm_ioremap(): The managed version of ioremap.
- devm_request_mem_region(): The managed version of request_mem_region.
When you use devm_ioremap, you don't even need to explicitly call iounmap. When the driver unloads or probe fails, the kernel automatically handles the cleanup for you. This not only reduces lines of code but, more importantly, prevents those dumb bugs where you "forget to release a resource on an error handling path."
Acquiring Resources: platform_get_resource
In the real world (especially in embedded Linux), your driver is typically a platform_driver. This means you shouldn't hardcode physical addresses in your code (like #define PHY_ADDR 0x3F200000) — that's frowned upon.
The correct approach is to acquire resources from the Device Tree or kernel static configuration.
That's where platform_get_resource() comes in:
struct resource *platform_get_resource(struct platform_device *pdev,
unsigned int type, unsigned int num);
- pdev: Your platform device pointer.
- type: The resource type, usually IORESOURCE_MEM.
- num: The index number, usually 0 (to get the first entry of the reg property).
It returns a struct resource pointer containing start (physical starting address) and end (the last address, inclusive), or NULL if no such resource exists.
With this struct, you can use resource_size(res) to get the length, or directly use its member variables to pass to request_mem_region.
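Because end is the last valid address rather than one past it, the length is easy to get wrong by one. Here is a tiny user-space sketch of the same arithmetic that resource_size() performs; struct model_resource is a hypothetical stand-in carrying only the two fields discussed here.

```c
#include <stdint.h>

/* Hypothetical stand-in with the two fields discussed above. */
struct model_resource {
    uint64_t start; /* first physical address */
    uint64_t end;   /* last physical address, inclusive */
};

/* Same arithmetic as the kernel's resource_size(): end is inclusive, so add 1. */
static uint64_t model_resource_size(const struct model_resource *res)
{
    return res->end - res->start + 1;
}
```

With the 3f200000-3f200fff range used in this chapter's examples, the result is 0x1000 bytes, not 0xfff.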
The Ultimate Weapon: devm_ioremap_resource()
Because "requesting resources" and "mapping memory" are almost always done together and are always tedious, kernel developers eventually decided to merge them into one super API.
That's devm_ioremap_resource(). If you're writing modern Linux drivers, this should be your most frequently used function.
void __iomem *devm_ioremap_resource(struct device *dev,
                                    const struct resource *res);
It does three things in one go:
- Checks: Ensures the passed-in res is valid.
- Requests: Internally calls devm_request_mem_region() to claim the memory region.
- Maps: Calls devm_ioremap() to establish the mapping.
If any step fails, it returns an ERR_PTR()-encoded error pointer (never NULL) and has already printed an error log internally. You just need to test the return value with IS_ERR() and, on failure, return PTR_ERR() of it.
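If error pointers are new to you, the trick is that a negative errno is stored in the pointer value itself, in the top few thousand addresses where no valid object can live. Below is a user-space model of the idea, assuming the same 4095 limit the kernel uses for MAX_ERRNO. The model_ functions are hypothetical and only mirror ERR_PTR, PTR_ERR, and IS_ERR.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for values in the last MODEL_MAX_ERRNO addresses of the address space. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

Notice that NULL is not an error pointer in this scheme, and an error pointer is not NULL. That is why a plain `if (!base)` check after an ERR_PTR-returning function misses every failure.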
This design fits our "tinkering engineering" intuition perfectly: give device, give resource, want pointer. Get pointer, get to work.
Verifying the Mapping: Checking via /proc/iomem
Sometimes after writing the code, you're not sure: did the mapping actually succeed? Does the kernel acknowledge this territory?
You can check /proc/iomem. This file shows the entire system's memory resource allocation map.
$ cat /proc/iomem
...
3f200000-3f200fff : /soc/gpio@7e200000
3f200000-3f200fff : pinctrl-bcm2835
...
If you see your driver's name in this list and the corresponding address range matches your expectations, congratulations — the request step is solid.
Practical I/O: ioreadX / iowriteX
Alright, the address is mapped, and we have the pointer (void __iomem *). Now we need to actually read and write the hardware.
Remember this iron rule: never use ordinary pointer dereferencing (*ptr) or memcpy to access I/O memory.
Hardware I/O memory and ordinary RAM are fundamentally different:
- Side effects: Writing to a register might trigger a hardware action (like starting data transmission) — it's not like writing to memory where you're just storing a 0 or 1.
- Timing: Instruction order must not be shuffled. Ordinary compiler optimizations might merge two write operations or reorder them, which is disastrous for hardware.
- Bus width: You must access according to the hardware-specified bit width (8-bit, 16-bit, 32-bit), otherwise you might read garbage data.
To solve these problems, the kernel provides a set of read/write functions with built-in barrier functionality.
MMIO Read APIs
u8 ioread8(const void __iomem *addr);
u16 ioread16(const void __iomem *addr);
u32 ioread32(const void __iomem *addr);
u64 ioread64(const void __iomem *addr); /* generally only on 64-bit kernels */
These functions ensure:
- The compiler won't optimize away this read operation.
- Accesses to the device happen in program order relative to one another, instead of being reordered or merged (this is where the barriers come in).
- Access width strictly matches (for example, ioread16 will issue a 16-bit read instruction).
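To demystify what such an accessor boils down to, here is a deliberately simplified user-space sketch: one volatile access of a fixed width, fenced by a compiler barrier. The model_ names are hypothetical. The real kernel accessors are architecture-specific and, depending on the platform, also add hardware barriers and endianness conversion, which this sketch leaves out.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax); it does not order the CPU or the bus. */
#define model_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Exactly one 32-bit load that the compiler may not drop, merge, or split. */
static inline uint32_t model_read32(const volatile void *addr)
{
    uint32_t val = *(const volatile uint32_t *)addr;

    model_compiler_barrier();
    return val;
}

/* Exactly one 32-bit store, kept after everything written before it in the source. */
static inline void model_write32(uint32_t val, volatile void *addr)
{
    model_compiler_barrier();
    *(volatile uint32_t *)addr = val;
}
```

The volatile cast is what pins down "one access, this wide, right here". Without it, two back-to-back reads of a status register could legally be folded into one.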
MMIO Write APIs
void iowrite8(u8 value, void __iomem *addr);
void iowrite16(u16 value, void __iomem *addr);
void iowrite32(u32 value, void __iomem *addr);
void iowrite64(u64 value, void __iomem *addr);
For example, if you want to write 0xFF to a GPIO set register, you'd write:
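void __iomem *reg_base; /* assume this has already been mapped */
iowrite32(0xFF, reg_base + OFFSET_SET);
The OFFSET_SET here is a byte offset: the kernel is built with GNU C, where arithmetic on a void pointer advances in bytes. Had reg_base been declared as u32 __iomem *, the same expression would advance by OFFSET_SET * 4 bytes and hit the wrong register.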
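Pointer arithmetic in C scales by the size of the pointed-to type, so whether an offset means "bytes" depends entirely on how the base pointer is declared. The user-space sketch below measures the difference on an ordinary array; DEMO_OFFSET_SET is a hypothetical datasheet-style byte offset, and the demo_ helpers exist only for this comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OFFSET_SET 0x1C /* hypothetical byte offset from a datasheet */

/* Bytes that "base + offset" actually moves when base is a 32-bit pointer. */
static size_t demo_step_u32(uint32_t *base, size_t offset)
{
    return (size_t)((uintptr_t)(base + offset) - (uintptr_t)base);
}

/* Bytes that "base + offset" moves when base is a byte pointer. */
static size_t demo_step_bytes(uint8_t *base, size_t offset)
{
    return (size_t)((uintptr_t)(base + offset) - (uintptr_t)base);
}
```

With a 32-bit base pointer, adding 0x1C lands 0x70 bytes in; with a byte pointer it lands 0x1C bytes in. Datasheets list byte offsets, so a byte-granular base pointer is the safer choice for register blocks.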
Bulk Operations: ioreadX_rep / iowriteX_rep
Sometimes you need to read a bunch of data from a FIFO (First-In-First-Out buffer) in one go, or write a bunch of data. You could wrap ioread32 in a for loop yourself, but the kernel already provides helpers for exactly this pattern, and on some architectures they are implemented more efficiently than a hand-written loop.
This is where you can use the "repeated read/write" functions:
void ioread8_rep(const void __iomem *addr, void *buf, unsigned long count);
void ioread16_rep(const void __iomem *addr, void *buf, unsigned long count);
void ioread32_rep(const void __iomem *addr, void *buf, unsigned long count);
void iowrite8_rep(void __iomem *addr, const void *buf, unsigned long count);
void iowrite16_rep(void __iomem *addr, const void *buf, unsigned long count);
void iowrite32_rep(void __iomem *addr, const void *buf, unsigned long count);
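These functions access the same register address count times: the device address does not advance, only the RAM buffer does, and count is a number of items, not bytes. For port I/O on x86 they can use the CPU's string instructions; elsewhere they are typically a tight loop.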
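The easiest way to picture a repeated read is a FIFO data register: one fixed address, and every read pops the next word. The user-space sketch below models that with a fake device. struct model_fifo and both functions are hypothetical, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fake device: every read of its one data register pops a word. */
struct model_fifo {
    const uint32_t *words; /* next word the device will hand out */
    size_t remaining;      /* how many words are still queued */
};

static uint32_t model_fifo_read32(struct model_fifo *f)
{
    if (f->remaining == 0)
        return 0; /* this model returns 0 once the FIFO is empty */
    f->remaining--;
    return *f->words++;
}

/* Shape of a repeated read: the register stays put, only the RAM buffer advances. */
static void model_read32_rep(struct model_fifo *f, uint32_t *buf, size_t count)
{
    while (count--)
        *buf++ = model_fifo_read32(f);
}
```

Contrast this with a memcpy-style copy, which would walk the source address forward and read neighboring registers instead of draining the FIFO.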
Setting and Copying: memset_io / memcpy_fromio / memcpy_toio
Although we don't recommend using memcpy, sometimes you really do need to zero out a block of I/O memory or do bulk transfers. The kernel provides corresponding "_io" versions:
- memset_io(void __iomem *addr, int value, size_t size): Like memset, sets a region of I/O memory to a specific value.
- memcpy_fromio(void *buffer, const void __iomem *addr, size_t size): Copies data from I/O memory to RAM (reads hardware state into memory variables).
- memcpy_toio(void __iomem *addr, const void *buffer, size_t size): Copies data from RAM to I/O memory (writes configuration to hardware).
⚠️ There's a huge pitfall here: These operations can be very slow, especially memcpy_toio. Because this isn't just a memory copy — every write might go through the bus and reach the actual hardware chip. If your hardware FIFO is full, your CPU might stall here for a long time. So, when in interrupt context or holding a spinlock, be extremely careful using memcpy_toio with large blocks of data.
Section Summary: The Bridge from Physical to Virtual
In this section, we walked through the entire process from "getting a physical address" to "successfully writing to a register." This path isn't smooth — we carefully avoided the traps of direct pointer manipulation, learned how to politely request resources from the kernel (request_mem_region), how to establish safe mappings (ioremap), and how to use standardized APIs (ioread32/iowrite32) to talk to hardware.
This is the foundation of all hardware manipulation. If you skip these steps and directly force-modify addresses, the kernel will kick you out without mercy.
But this is only half the story.
What we just covered is MMIO — treating hardware as memory. But in the PC world, there's a large group of ancient and stubborn devices (like parallel ports, serial ports, or old sound cards on x86) that don't live in the memory address space. Instead, they hide in a parallel universe called "I/O ports."
In the next section, we'll shift our gaze from ioremap and look at that special world that uses the inb and outb instructions. Only by understanding it will you truly see the full picture of Linux hardware drivers.