14.10 Notification Chains
In the previous section, we discussed NFC as a "handshake protocol" where the kernel plays the role of an ingenious translator. But the kernel doesn't just translate between hardware components—it must also constantly monitor state changes across the entire system.
The networking world is never static. Cables get unplugged, MTUs get modified, MAC addresses get rewritten, and devices get unloaded; these events happen all the time. When they occur, the network stack and related subsystems must be notified immediately. Otherwise, the routing table keeps directing packets to stale addresses, the ARP cache keeps resolving through dead interfaces, and the entire networking logic collapses in an instant.
This requires a nervous system.
The Pain of the Old Approach
In early designs, handling such state changes was extremely painful. If subsystem A wanted to know whether a device in subsystem B was still alive, there were usually only two options: polling, or forcibly modifying subsystem B's code by stuffing it full of subsystem A's logic.
Polling is too dumb and slow, while forced code coupling turns the architecture into a tangled mess—the ARP module shouldn't care about the VLAN module's internal implementation, and the firewall shouldn't need to change just because a network driver's code was modified.
We need a mechanism that allows subsystems to "subscribe" to events they care about, and when those events occur, the kernel broadcasts the message. This is the Notification Chain.
Core Mechanism of Notification Chains
You can think of notification chains as a "publish-subscribe" system inside the kernel—much like a newspaper subscription.
But the "newspaper subscription" analogy falls short in one respect: a paper delivery person usually doesn't care whether you've finished reading, nor do they worry about someone else cutting in during delivery. The kernel's notification chains, however, require precise control over execution order, concurrency safety, and priority.
Back to the "Newspaper": notifier_block
In the kernel, every "subscriber" must fill out a subscription slip, which is the notifier_block structure:
struct notifier_block {
	int (*notifier_call)(struct notifier_block *, unsigned long, void *);
	struct notifier_block __rcu *next;
	int priority;
};
This slip contains only three key pieces of information:
- notifier_call: The callback function pointer. When an event occurs, the kernel "calls this number."
- next: A pointer to the next subscriber. All subscription slips are linked together into a linked list.
- priority: Priority. A higher number means earlier notification. This is crucial; some modules must react before others (e.g., shutting down hardware before notifying upper layers to disconnect).
The Four Faces of the Chain
The kernel doesn't have just one type of notification chain. Depending on the use case (whether sleeping is allowed, whether it's in an atomic context, whether lock protection exists), notification chains come in four "flavors."
Although they all rely on the core logic in kernel/notifier.c under the hood, we see different wrapper interfaces when using them:
- atomic_notifier_chain_register(): Used in atomic contexts. It disables interrupts (or uses a spinlock) while executing callbacks, making it extremely strict. Callback functions must not sleep.
- blocking_notifier_chain_register(): Used in process contexts. It holds a mutex and allows callback functions to sleep.
- raw_notifier_chain_register(): The most primitive and permissive version. It has no lock protection whatsoever. This is typically used by subsystems that already manage their own locking (like the networking subsystem).
- srcu_notifier_chain_register(): Uses the SRCU (Sleepable Read-Copy Update) mechanism, balancing high performance with the ability to sleep.
The networking subsystem primarily uses raw_notifier_chain. Why? Because network code paths are extremely complex—sometimes they are already holding locks, and sometimes they cannot sleep. Using Raw chains allows the networking subsystem to decide for itself how to handle locking, rather than being constrained by a generic locking mechanism.
Network Event Checklist: A Comprehensive Diagnostic Report
When a network device changes, the kernel emits a specific event code. Table 14-1 lists all possible events. This isn't just a table; it's a diagnostic report of a network device's lifecycle.
We can categorize these events into several groups:
1. Life and Death
The most fundamental state changes.
- NETDEV_UP/NETDEV_DOWN: The device is brought up or shut down by an administrator.
- NETDEV_REGISTER/NETDEV_UNREGISTER: Registration and unregistration of the device kernel object.
  - There is a subtle timing difference here: when NETDEV_UNREGISTER occurs, the device still exists; when NETDEV_UNREGISTER_FINAL occurs, it is truly the last breath, and the device's memory is about to be freed. If you need to access the device structure, handle it at UNREGISTER; do not wait until FINAL.
- NETDEV_POST_INIT: Occurs during device registration, before sysfs and other objects are created. This is a very early initialization hook.
2. Attribute Changes
The device is still there, but its "appearance" has changed.
- NETDEV_CHANGEMTU: The Maximum Transmission Unit changed. This triggers routing table recalculation because fragmentation strategies may have completely changed.
- NETDEV_CHANGEADDR: The MAC address changed.
- NETDEV_CHANGENAME: The device name changed (e.g., from eth0 to lan0).
- NETDEV_FEAT_CHANGE: Hardware features changed (e.g., TSO/GSO disabled via ethtool).
3. Switches for Special Scenarios
These events usually serve as a final check before an action takes place, or to handle special virtual device logic.
- NETDEV_PRE_UP: The device is about to come UP, but hasn't yet. This is an opportunity for a veto.
  - This is critical. For example, a WiFi driver (cfg80211) checks here: if the hardware switch is physically turned off or rfkill has killed the radio, it returns an error and refuses to bring the device up. If it waited until NETDEV_UP to reject the request, the state machine would be left in a mess.
- NETDEV_GOING_DOWN: The device is about to shut down. This is the last chance to "clean up the scene."
- NETDEV_PRE_TYPE_CHANGE/NETDEV_POST_TYPE_CHANGE: The device type is about to change / has changed. This is typically used by Bonding or Team drivers when a device changes from a regular port to a Bond slave.
4. Aggregation and Migration
This is the battleground for virtualization and high availability.
- NETDEV_BONDING_FAILOVER: A failover occurred in the Bond driver. Data flow instantly switches from one link to another.
- NETDEV_NOTIFY_PEERS: Notify neighbors. Typically used after virtual machine migration or failover, when the device actively announces to the network: "I'm not at that old location anymore, I'm here now, please update your ARP tables!"
- NETDEV_JOIN: A new device has joined (become a slave device).
Hands-on Practice: Connecting to netdev_chain
Just looking at the event table is dry. Let's look at a real-world example. How does Linux's bridge module listen for network device changes?
When a network card is added to a bridge, if someone changes that card's MTU, the bridge's MTU must also change (taking the minimum value of all slave devices). How is this achieved? Through netdev_chain.
Step 1: Define the Subscription Slip
The bridge module defines its own notifier_block in net/bridge/br_notify.c:
struct notifier_block br_device_notifier = {
	.notifier_call = br_device_event
};
No complex initialization—just fill in the callback function address directly.
Step 2: Register at the Front Desk
During module initialization, we register it on the kernel's notification chain. Here we use the networking subsystem's dedicated wrapper interface register_netdevice_notifier() (which is essentially a wrapper around raw_notifier_chain_register()):
static int __init br_init(void)
{
	...
	register_netdevice_notifier(&br_device_notifier);
	...
}
Once this line of code executes successfully, the bridge module's ears are perked up.
Step 3: Handle the Incoming Call
When any event occurs on a network device, the kernel calls call_netdevice_notifiers(), which in turn triggers br_device_event().
Note the function signature—this is the standard template for all notification callbacks:
static int br_device_event(struct notifier_block *unused, unsigned long event, void *ptr)
{
	struct net_device *dev = ptr;
	struct net_bridge_port *p;
	struct net_bridge *br;
	bool changed_addr;
	int err;
	...
- unused: This is the notifier_block we just registered. (Sometimes a single block manages multiple callbacks; although the parameter is named "unused" here, it can be useful in complex scenarios.)
- event: One of the event macros from Table 14-1 (e.g., NETDEV_CHANGEMTU).
- ptr: A pointer to the device that generated the event (struct net_device).
Inside the callback function, there is typically a giant switch statement to filter for the events it cares about:
switch (event) {
case NETDEV_CHANGEMTU:
	dev_set_mtu(br->dev, br_min_mtu(br));
	break;
...
}
This is the core of the handling logic: I don't care who sent the notification; I only care about what happened. If it's NETDEV_CHANGEMTU, I recalculate the minimum MTU for the entire bridge and set it on the master bridge device; if it's NETDEV_DOWN, I might need to remove the corresponding port from the bridge.
Are There Other Chains?
Besides the network device chain (netdev_chain), there are several other important branches within the networking subsystem:
- inet6addr_chain: Specifically monitors IPv6 address changes. If an IPv6 address is added or removed, this chain notifies all subscribers.
- netevent_notif_chain: Handles lower-level network events, such as neighbor table entry changes and route redirects.
- inetaddr_chain: The corresponding notification chain for IPv4 addresses.
Even outside of networking, the clock event subsystem (clockevents_chain) and the panic notification list used during system crashes (panic_notifier_list) rely on this same mechanism, which shows just how versatile the notification chain design is.
Summary
In this section, we explored the kernel's "notification chain" mechanism.
From a technical perspective, it is a dispatcher based on callback linked lists; but from an architectural perspective, it is the cornerstone of decoupling. It allows kernel developers to confidently write modular code without worrying about low-level mistakes like "modified A but forgot to notify B."
Through notifier_block, we completely separate the generation of events (network device state changes) from the handling of events (routing updates, firewall refreshes, bridge logic reorganization).
In the next section, we will shift our focus from software logic back to hardware buses, looking at the foundation that supports network cards (especially PCIe network cards)—the PCI subsystem. We will see how the kernel enumerates devices, allocates resources, and implements magical features like Wake-on-LAN.