2.7 Quick Reference

The most frustrating part of writing code isn't failing to understand the theory. It's knowing the theory and still forgetting that damn function's name, or what parameters it actually takes.

In this chapter, we dissected the skeleton of Netlink, the muscle of Generic Netlink, and even traced the veins of RTNL. Now, as we return to our editors ready to type that first line of code, we need a "cheat sheet"—not some hollow document that just lists function names, but a quick reference that tells you "what dirty work this function actually does."

I've put together the core APIs we interact with most in this chapter (and across the kernel source tree). Rather than flipping through hundreds of lines of include/linux/netlink.h and include/net/netlink.h, just look here.

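Note: This isn't a cold, dry API manual. Alongside each entry I've annotated the easily overlooked behavioral details, such as whether it actually validates the length for you or whether it automatically sends an ACK. The signatures match older (3.x-era) kernels; several of these APIs have since been reworked, so check the headers of the kernel you are building against.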


2.7.1 Netlink Core API

These are the infrastructure functions for Netlink communication, handling the common logic from socket creation to message encapsulation.

struct sock *netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)

  • Purpose: Creates a Netlink socket in kernel space. This is the starting point for all kernel-side Netlink communication.
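  • Parameters: net is the network namespace the socket belongs to (&init_net for the initial namespace), unit is the protocol number (e.g., NETLINK_GENERIC), and cfg is the configuration structure where you plug in your input callback, the number of multicast groups, and similar settings.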
  • Details: Returns a sock pointer on success, or NULL on failure. When it fails, it's usually due to a protocol number conflict or insufficient memory.

int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *, struct nlmsghdr *))

  • Purpose: The "standard butler" for receiving Netlink messages. It gets called inside your input callback.
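  • Why we need it: It handles the tedious dirty work for you. It walks every message packed into the skb and stops at a header whose length is shorter than NLMSG_HDRLEN or runs past the end of the data; it never hands your callback a message that lacks NLM_F_REQUEST or a control message (types below NLMSG_MIN_TYPE, such as NLMSG_NOOP and NLMSG_ERROR); and it handles the NLM_F_ACK flag: if the sender asked for an acknowledgement, it calls netlink_ack(), which replies with an NLMSG_ERROR message whose error field is 0 on success.
  • Your job: Put your actual business logic in the cb callback you pass to it. If cb returns a non-zero error, netlink_rcv_skb() reports it back to the sender in an NLMSG_ERROR message, whether or not NLM_F_ACK was set. The one exception is -EINTR, which it treats as "skip this message" and does not answer.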
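To make that loop concrete, here is a user-space sketch that mimics netlink_rcv_skb() over a plain buffer instead of an skb. It is not the kernel code: rcv_buffer(), record_ack() and demo_handler() are hypothetical names, the ACK is only recorded rather than sent, and the -EINTR special case is left out. It relies on the NLMSG_OK/NLMSG_NEXT macros from <linux/netlink.h>.

```c
#include <errno.h>
#include <linux/netlink.h>
#include <stddef.h>

/* Hypothetical handler type, shaped like the cb argument of netlink_rcv_skb(). */
typedef int (*msg_handler)(struct nlmsghdr *nlh);

static int acks_sent;   /* number of replies the sketch would have sent */
static int last_error;  /* error code carried by the most recent reply */

/* Stand-in for netlink_ack(): the real reply is an NLMSG_ERROR message,
 * with error = 0 for a plain acknowledgement. */
static void record_ack(struct nlmsghdr *nlh, int err)
{
    (void)nlh;
    acks_sent++;
    last_error = err;
}

/* Mimics the loop in netlink_rcv_skb(), walking a flat buffer instead of an skb. */
static void rcv_buffer(void *buf, size_t len, msg_handler cb)
{
    int remaining = (int)len;
    struct nlmsghdr *nlh = buf;

    /* NLMSG_OK stops at a truncated or malformed header. */
    for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
        int err = 0;

        /* Only requests reach the handler; control types (< NLMSG_MIN_TYPE) are skipped. */
        if ((nlh->nlmsg_flags & NLM_F_REQUEST) && nlh->nlmsg_type >= NLMSG_MIN_TYPE)
            err = cb(nlh);

        /* Reply when the sender asked for an ACK, and always on failure. */
        if ((nlh->nlmsg_flags & NLM_F_ACK) || err)
            record_ack(nlh, err);
    }
}

/* Hypothetical business logic: reject message type 0x20, accept anything else. */
static int demo_handler(struct nlmsghdr *nlh)
{
    return nlh->nlmsg_type == 0x20 ? -EINVAL : 0;
}
```

Feeding it a control message, a request with NLM_F_ACK that succeeds, and a request the handler rejects yields two replies: one carrying 0 and one carrying -EINVAL.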

struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)

  • Purpose: Allocates a new Netlink message buffer (sk_buff).
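  • Details: Internally it calls alloc_skb(nlmsg_total_size(payload), flags), where nlmsg_total_size() is NLMSG_ALIGN(NLMSG_HDRLEN + payload). So payload is the size of the payload alone; room for the message header is always added on top and the total is rounded up to the 4-byte Netlink alignment. Even nlmsg_new(0, ...) leaves room for a bare header, so you can't overrun the buffer just by filling in the header.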
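A tiny user-space check of that arithmetic, assuming only the macros from <linux/netlink.h>; total_size_for_payload() is a hypothetical name that mirrors the kernel's nlmsg_total_size().

```c
#include <linux/netlink.h>
#include <stddef.h>

/* Hypothetical mirror of the kernel's nlmsg_total_size(): header + payload,
 * rounded up to NLMSG_ALIGNTO (4 bytes). nlmsg_new() passes this to alloc_skb(). */
static size_t total_size_for_payload(size_t payload)
{
    return NLMSG_ALIGN(NLMSG_HDRLEN + payload);
}
```

A payload of 0 yields 16 bytes (just the header), and a 5-byte payload yields 24 after rounding up to the 4-byte boundary.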

struct nlmsghdr *nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int payload, int flags)

  • Purpose: Reserves room at the tail of skb for a message header plus payload bytes, fills in the header, and returns a pointer to it.
  • Details: It compares skb_tailroom() with nlmsg_total_size(payload) and, if there is room, calls __nlmsg_put(). If there isn't enough tail room, it returns NULL and leaves skb untouched; it does not free it for you.
  • Tip: Always check the return value! NULL means the skb you allocated is too small. Don't keep writing: free the skb yourself (nlmsg_free()) and start over with a bigger allocation.
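The same contract can be sketched in user space. put_msg() and struct msgbuf are hypothetical stand-ins for nlmsg_put() and an sk_buff's tail room; the point is the shape: check the space first, return NULL without touching or freeing the buffer, otherwise write the header and advance the tail. Reading the first header back is nothing more than a cast, which is all nlmsg_hdr() does.

```c
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: fixed storage plus a fill level. */
struct msgbuf {
    uint32_t data[64];   /* uint32_t storage keeps every nlmsghdr 4-byte aligned */
    size_t len;          /* bytes used so far; the rest is "tail room" */
};

/*
 * Sketch of nlmsg_put(): reserve an aligned header + payload at the tail.
 * Without enough room it returns NULL and leaves the buffer alone;
 * discarding the message is the caller's decision.
 */
static struct nlmsghdr *put_msg(struct msgbuf *b, uint32_t portid, uint32_t seq,
                                uint16_t type, size_t payload, uint16_t flags)
{
    size_t need = NLMSG_SPACE(payload);           /* header + payload, aligned */

    if (sizeof(b->data) - b->len < need)
        return NULL;

    struct nlmsghdr *nlh = (struct nlmsghdr *)((char *)b->data + b->len);
    memset(nlh, 0, need);                         /* the kernel only clears the padding */
    nlh->nlmsg_len = NLMSG_LENGTH(payload);       /* unpadded length, as __nlmsg_put() stores it */
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = flags;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = portid;
    b->len += need;
    return nlh;
}
```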

struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)

  • Purpose: Gets the Netlink message header pointer from a skb.
  • Details: It is a static inline function that simply returns skb->data cast to struct nlmsghdr *. Nothing to be scared of; it just saves you an explicit cast.

struct netlink_sock *nlk_sk(struct sock *sk)

  • Purpose: Gets the Netlink-specific netlink_sock structure from the generic sock structure.
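  • Scenario: Use it when you need Netlink-specific fields such as portid (called pid in older kernels) or the groups bitmap. It is a static inline function built on container_of(), defined in the private header net/netlink/af_netlink.h, so only code under net/netlink/ can use it.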
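The pattern behind nlk_sk() is container_of(): struct sock is embedded as the first member of struct netlink_sock, so going from the generic pointer to the Netlink-specific one is pure pointer arithmetic. The structures below are hypothetical, heavily trimmed stand-ins, not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Step back from a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct sock {
    int sk_protocol;
};

struct netlink_sock {
    struct sock sk;      /* generic part, embedded first */
    uint32_t portid;     /* Netlink-specific fields follow */
    uint32_t ngroups;
};

/* Same shape as nlk_sk(): a pointer adjustment, no lookup and no copy. */
static inline struct netlink_sock *nlk_sk(struct sock *sk)
{
    return container_of(sk, struct netlink_sock, sk);
}
```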

2.7.2 RTNetlink API

Routing Netlink is the most heavily used Netlink consumer. Here is a set of APIs we frequently interact with, primarily used for manipulating network interfaces, routing tables, and IP addresses.

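void rtnl_register(int protocol, int msgtype, rtnl_doit_func doit, rtnl_dumpit_func dumpit, rtnl_calcit_func calcit)

  • Purpose: Binds a specific RTNetlink message type (e.g., RTM_NEWLINK) for a given protocol family (e.g., PF_INET, or PF_UNSPEC for family-independent handlers) to your handler functions.
  • Parameters:
    • doit: Handles operations on a single object (create/delete/modify, or a GET without NLM_F_DUMP).
    • dumpit: Handles list dumps (like ip link show).
    • calcit: Calculates the buffer size needed for the dump (optional).
  • Details: A GET request carrying NLM_F_DUMP is routed to dumpit; every other request goes to doit. If the matching handler was never registered, the request is rejected with -EOPNOTSUPP. Note that rtnl_register() returns nothing and panics if registration fails; __rtnl_register() is the variant that returns an error code.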
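A rough user-space model of that registration table and the -EOPNOTSUPP rule, under clearly simplified assumptions: register_handlers(), dispatch() and the handler types are hypothetical, the real table is indexed by msgtype - RTM_BASE, and the real handlers receive the skb and nlmsghdr. The fallback to index 0 models the kernel lookup, which (in 3.x kernels) retries the PF_UNSPEC entry when no family-specific handler is registered. In the example, 2 stands for AF_INET and 16 for RTM_NEWLINK.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified handler signatures. */
typedef int (*doit_fn)(int msgtype);
typedef int (*dumpit_fn)(int msgtype);

#define NFAMILIES_SKETCH 16   /* hypothetical table bounds */
#define NTYPES_SKETCH    32

struct handlers {
    doit_fn doit;
    dumpit_fn dumpit;
};

/* Two-dimensional lookup table: [protocol family][message type]. Index 0 plays PF_UNSPEC. */
static struct handlers table[NFAMILIES_SKETCH][NTYPES_SKETCH];

static int in_range(int family, int msgtype)
{
    return family >= 0 && family < NFAMILIES_SKETCH &&
           msgtype >= 0 && msgtype < NTYPES_SKETCH;
}

/* Registration only records the pointers; NULL means "not supported". */
static int register_handlers(int family, int msgtype, doit_fn doit, dumpit_fn dumpit)
{
    if (!in_range(family, msgtype))
        return -EINVAL;
    table[family][msgtype].doit = doit;
    table[family][msgtype].dumpit = dumpit;
    return 0;
}

/* Prefer the family-specific handler, fall back to index 0, else -EOPNOTSUPP. */
static int dispatch(int family, int msgtype, int is_dump)
{
    if (!in_range(family, msgtype))
        return -EOPNOTSUPP;
    const struct handlers *own = &table[family][msgtype];
    const struct handlers *any = &table[0][msgtype];

    if (is_dump) {
        dumpit_fn dfn = own->dumpit ? own->dumpit : any->dumpit;
        return dfn ? dfn(msgtype) : -EOPNOTSUPP;
    }
    doit_fn fn = own->doit ? own->doit : any->doit;
    return fn ? fn(msgtype) : -EOPNOTSUPP;
}

/* Hypothetical handler used in the example. */
static int demo_doit(int msgtype)
{
    (void)msgtype;
    return 0;
}
```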

static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)

  • Purpose: The RTNetlink message dispatcher. It parses the message type, looks up the previously registered doit or dumpit callbacks, and invokes them.
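  • Details: You never call it from a driver; it is static to net/core/rtnetlink.c. The RTNetlink socket's input callback, rtnetlink_rcv(), takes the RTNL lock and passes rtnetlink_rcv_msg() to netlink_rcv_skb() as its cb, so every request is dispatched to the correct handler function for you.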
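One detail of that dispatch is worth seeing in code. RTM_* message types come in groups of four (NEW, DEL, GET, SET), so rtnetlink_rcv_msg() derives a "kind" from (type - RTM_BASE) & 3; only GET requests are allowed without CAP_NET_ADMIN, and only a GET with NLM_F_DUMP is sent to dumpit. This user-space sketch reproduces that decision with the constants from <linux/rtnetlink.h>; rtm_kind(), classify() and enum rt_action are hypothetical names.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* What the dispatcher decides before calling a handler (simplified). */
enum rt_action { RT_DOIT, RT_DUMPIT, RT_EPERM };

/* 0 = NEW, 1 = DEL, 2 = GET, 3 = SET */
static int rtm_kind(int nlmsg_type)
{
    return (nlmsg_type - RTM_BASE) & 3;
}

static enum rt_action classify(int nlmsg_type, int nlmsg_flags, int has_net_admin)
{
    int kind = rtm_kind(nlmsg_type);

    if (kind != 2 && !has_net_admin)        /* modifying requests need CAP_NET_ADMIN */
        return RT_EPERM;
    if (kind == 2 && (nlmsg_flags & NLM_F_DUMP))
        return RT_DUMPIT;                   /* GET + NLM_F_DUMP: list everything */
    return RT_DOIT;
}
```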

static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, int type, u32 pid, u32 seq, u32 change, unsigned int flags, u32 ext_filter_mask)

  • Purpose: Fills network interface (net_device) information into a Netlink message.
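  • Details: It first puts down an nlmsghdr immediately followed by a struct ifinfomsg, then appends the device's attributes (IFLA_IFNAME, IFLA_MTU, and many more). It is static to net/core/rtnetlink.c, so you don't call it directly: link notifications such as "link up" reach it through rtmsg_ifinfo(). Address changes are a different story; IPv4 address notifications are built by inet_fill_ifaddr() in net/ipv4/devinet.c.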
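The layout rtnl_fill_ifinfo() produces can be reproduced in user space with the RTA_* macros from <linux/rtnetlink.h>. This sketch only writes the ifinfomsg and a single IFLA_IFNAME attribute; build_link_msg() is a hypothetical helper, and the interface index and name in the example are made up.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

/*
 * Lay out [nlmsghdr][ifinfomsg][IFLA_IFNAME attribute]; the kernel goes on to
 * append many more IFLA_* attributes. Returns the total length, or 0 if the
 * buffer is too small. buf must be 4-byte aligned.
 */
static size_t build_link_msg(void *buf, size_t cap, int ifindex, const char *name)
{
    size_t name_len = strlen(name) + 1;   /* IFLA_IFNAME carries a NUL-terminated string */
    size_t hdr_space = NLMSG_SPACE(sizeof(struct ifinfomsg));
    size_t total = hdr_space + RTA_SPACE(name_len);

    if (total > cap)
        return 0;
    memset(buf, 0, total);                /* also leaves ifi_family as AF_UNSPEC (0) */

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_type = RTM_NEWLINK;
    nlh->nlmsg_len = total;               /* header + ifinfomsg + padded attribute */

    struct ifinfomsg *ifm = NLMSG_DATA(nlh);
    ifm->ifi_index = ifindex;

    struct rtattr *rta = (struct rtattr *)((char *)buf + hdr_space);
    rta->rta_type = IFLA_IFNAME;
    rta->rta_len = RTA_LENGTH(name_len);  /* attribute length excludes trailing padding */
    memcpy(RTA_DATA(rta), name, name_len);
    return total;
}
```

For ifindex 2 and "eth0", the message is 44 bytes: 16 for the nlmsghdr, 16 for the ifinfomsg, and a 9-byte attribute padded to 12.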

void rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid, u32 group, struct nlmsghdr *nlh, gfp_t flags)

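  • Purpose: Multicasts an RTNetlink message to every listener subscribed to group. If nlh (the request that triggered the change) carries NLM_F_ECHO, the requesting socket identified by pid also gets a copy.
  • Scenario: When network state changes in the kernel (e.g., a cable is unplugged or an IP address changes), the kernel uses this function to notify the user space processes subscribed to that group (like NetworkManager).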
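On the receiving side, the group passed to rtnl_notify() is a group number (RTNLGRP_*). A listener can join it by number with setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, ...), or through the legacy bitmask in sockaddr_nl.nl_groups at bind() time, where group N is bit N-1 (the RTMGRP_* constants). The helper below, a hypothetical name, just encodes that relationship.

```c
#include <linux/rtnetlink.h>

/* Group N occupies bit N-1 of nl_groups; groups above 32 don't fit at all. */
static unsigned int group_to_bind_mask(unsigned int group)
{
    return (group >= 1 && group <= 32) ? 1U << (group - 1) : 0;
}
```

The bitmask form only reaches groups 1 through 32, which is why the setsockopt() route exists.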

2.7.3 Generic Netlink API

The Generic Netlink API is a bit more complex than standard Netlink because it adds an extra layer of "family management" logic.

int genl_register_family(struct genl_family *family)

  • Purpose: Registers a new Generic Netlink family.
  • Details: It checks any requested ID against the valid range, makes sure the name is not taken, and assigns the family a unique ID. If you set family->id to GENL_ID_GENERATE, the kernel picks a free ID for you. (From kernel 4.10 on, GENL_ID_GENERATE is gone and every family ID is assigned dynamically.)
  • Note: Names must be unique. Registering the same name twice fails immediately with -EEXIST.

int genl_register_family_with_ops(struct genl_family *family, struct genl_ops *ops, size_t n_ops)

  • Purpose: The all-or-nothing version: registers a family and its array of operations (ops) in one call.
  • Why it's recommended: This is the most commonly used API. It's equivalent to calling genl_register_family() and then calling genl_register_ops() for each entry in ops. If one of those calls fails midway, it unregisters the half-registered family for you, leaving no garbage behind.
  • Constraint: Each op must provide at least one of doit or dumpit; otherwise registration fails with -EINVAL.

int genl_register_ops(struct genl_family *family, struct genl_ops *ops)

  • Purpose: Adds a specific command operation to an already registered family.
  • Constraint: A command ID (cmd) can be registered only once per family; a duplicate fails with -EEXIST. An op with neither doit nor dumpit is rejected with -EINVAL.
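The registration rules above can be modeled in a few dozen lines of user-space C. Everything here is hypothetical and simplified: struct op and struct family stand in for struct genl_ops and struct genl_family, and a counter replaces the kernel's search for a free ID. The error codes and the rollback follow the behavior described: duplicate names and duplicate commands give -EEXIST, an op with no handler gives -EINVAL, and a failure midway unregisters the family.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_FAMILIES 4   /* hypothetical limits for the sketch */
#define MAX_OPS 8

/* Hypothetical, trimmed stand-ins for struct genl_ops and struct genl_family. */
struct op {
    unsigned char cmd;
    int (*doit)(void);
    int (*dumpit)(void);
};

struct family {
    const char *name;
    int id;
    const struct op *ops[MAX_OPS];
    int n_ops;
};

static struct family *families[MAX_FAMILIES];
static int next_id = 17;   /* 16 (GENL_ID_CTRL) belongs to nlctrl itself */

/* Like genl_register_family(): unique name, then a dynamically assigned ID. */
static int register_family(struct family *f)
{
    for (int i = 0; i < MAX_FAMILIES; i++)
        if (families[i] && strcmp(families[i]->name, f->name) == 0)
            return -EEXIST;
    for (int i = 0; i < MAX_FAMILIES; i++) {
        if (!families[i]) {
            families[i] = f;
            f->id = next_id++;
            f->n_ops = 0;
            return 0;
        }
    }
    return -ENOMEM;
}

static void unregister_family(struct family *f)
{
    for (int i = 0; i < MAX_FAMILIES; i++)
        if (families[i] == f)
            families[i] = NULL;
    f->n_ops = 0;
}

/* Like genl_register_ops(): needs doit or dumpit, and cmd must be new to the family. */
static int register_op(struct family *f, const struct op *op)
{
    if (!op->doit && !op->dumpit)
        return -EINVAL;
    for (int i = 0; i < f->n_ops; i++)
        if (f->ops[i]->cmd == op->cmd)
            return -EEXIST;
    if (f->n_ops == MAX_OPS)
        return -ENOMEM;
    f->ops[f->n_ops++] = op;
    return 0;
}

/* Like genl_register_family_with_ops(): any failing op undoes the whole registration. */
static int register_family_with_ops(struct family *f, const struct op *ops, size_t n)
{
    int err = register_family(f);
    if (err)
        return err;
    for (size_t i = 0; i < n; i++) {
        err = register_op(f, &ops[i]);
        if (err) {
            unregister_family(f);
            return err;
        }
    }
    return 0;
}

/* Hypothetical handler for the example. */
static int demo_handler(void)
{
    return 0;
}
```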

void genl_unregister_mc_group(struct genl_family *family, struct genl_multicast_group *grp)

  • Purpose: Unregisters a multicast group.
  • Details: Unregistering a family automatically cleans up all its groups, so you don't need to manually unregister them one by one when your driver exits. But if you want to dynamically add or remove groups, this function comes in handy.

void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, struct genl_family *family, int flags, u8 cmd)

  • Purpose: Adds the Generic Netlink header to a skb.
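  • Details: It calls nlmsg_put() with family->id as the message type, reserving room for the Generic Netlink header plus the family's own fixed header (family->hdrsize, usually 0), and then fills in the genlmsghdr with your cmd and the family's version.
  • Return: A pointer just past the genlmsghdr (where the family's fixed header, if any, or the attributes begin), or NULL if the skb lacks tail room. You don't pass this pointer to nla_put(), which always appends at the skb tail; keep it to finish the message with genlmsg_end(skb, hdr) or roll it back with genlmsg_cancel(skb, hdr).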
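Here is the genlmsg_put() layout as a user-space sketch: an nlmsghdr whose type is the family ID, followed by a genlmsghdr, with the returned pointer sitting just past it. put_genl_header() is a hypothetical helper, it ignores the family's optional fixed header (hdrsize), and the family ID 0x1c in the example is made up.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * User-space mirror of genlmsg_put(): nlmsghdr (type = family ID) followed by
 * genlmsghdr (cmd, version). Returns the address just past the genlmsghdr,
 * or NULL if buf cannot hold both headers. buf must be 4-byte aligned.
 */
static void *put_genl_header(void *buf, size_t cap, uint16_t family_id, uint8_t cmd,
                             uint8_t version, uint32_t seq, uint16_t flags)
{
    size_t need = NLMSG_SPACE(GENL_HDRLEN);

    if (cap < need)
        return NULL;
    memset(buf, 0, need);

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
    nlh->nlmsg_type = family_id;        /* the family ID doubles as the message type */
    nlh->nlmsg_flags = flags;
    nlh->nlmsg_seq = seq;

    struct genlmsghdr *gh = NLMSG_DATA(nlh);
    gh->cmd = cmd;
    gh->version = version;              /* reserved stays 0 */
    return (char *)gh + GENL_HDRLEN;
}
```

In the kernel, the matching step after adding attributes is genlmsg_end(skb, hdr), which updates nlmsg_len to cover everything appended.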

2.7.4 Memory and Locking

struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size, u32 dst_portid, gfp_t gfp_mask)

  • Purpose: Allocates a sk_buff specifically for Netlink.
  • Scenario: Used for the NETLINK_MMAP (memory-mapped I/O) optimization, which is rare in ordinary driver development. NETLINK_MMAP, and this helper with it, was later removed from the kernel (in the 4.x series), so unless you maintain code for older kernels, just stick with nlmsg_new().

void genl_lock(void) / void genl_unlock(void)

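  • Purpose: Acquire and release the Generic Netlink global mutex (genl_mutex).
  • Scenario: It protects family registration and unregistration, and it also serializes request handling for families that don't set parallel_ops. The genl_register_* functions already take it for you, and the doit handler of a family without parallel_ops already runs with it held, so calling genl_lock() from there would deadlock. Unless you're writing extremely low-level concurrent logic, don't touch it.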

Chapter Echoes

We spent an entire chapter dissecting Netlink, which might seem a bit lengthy—after all, on the surface it's just a "mechanism for the kernel and user space to chat."

But there's a subtle turning point here: the boundary between the modern Linux kernel and user space is becoming increasingly "thin." In the past, we sent rigid commands via ioctl; now, through Netlink (especially Generic Netlink), the kernel and user space act more like two equal processes collaborating through a carefully designed message protocol. You can not only send down commands but also receive event notifications; you can not only query a single object but also dump an entire database.

This "bidirectional, asynchronous, extensible" communication model is key to understanding the modern Linux network stack. Without Netlink, the ip command couldn't exist, iw couldn't control complex wireless drivers, and even container technology would struggle immensely when performing network isolation.

Remember the question we raised at the beginning of this chapter—why don't we use ioctl anymore? Now the answer is clear: because what we need is a dynamic, future-proof interface, not just a bunch of static command codes.

In the next chapter, we'll leave the control plane for a while and turn to a different question: when a network failure occurs, who delivers the news? The answer is ICMP. We'll see how the kernel generates ICMP messages and how these error packets traverse the protocol stack to ultimately reach the application.


Exercises

Exercise 1: Understanding

Question: When developing user space network tools, would you choose the IOCTL-based net-tools package (like ifconfig) or the Netlink socket-based iproute2 package (like ip)? Please list two main technical advantages of Netlink sockets over IOCTL.

Answer and Analysis

Answer: We should choose iproute2 (Netlink). Advantage 1: Netlink supports the kernel proactively sending asynchronous messages to user space, whereas IOCTL can only handle synchronous requests initiated by user space. Advantage 2: Netlink is built on the socket mechanism, supports multicast, and doesn't require complex IOCTL command number definitions.

Analysis: This chapter explicitly points out that Netlink was created to replace the clunky communication method of IOCTL. The main flaw of IOCTL is that it is synchronous, and the kernel cannot initiate communication on its own (for example, it cannot proactively notify user space that a network interface has gone down). Netlink solves this problem. Furthermore, Netlink leverages the standard socket API, making bidirectional communication and multicast handling much simpler and more flexible.

Exercise 2: Application

Question: Suppose you need to write a user space daemon to monitor IPv4 routing table changes in a Linux system (e.g., receiving a notification when an administrator executes ip route add). Based on this chapter's description of rtnetlink and Netlink message handling, which specific Netlink protocol family must this daemon bind to, and which multicast group must it join to receive these asynchronous events?

Answer and Analysis

Answer: Socket: address family AF_NETLINK with protocol NETLINK_ROUTE, i.e. socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE). Multicast group: RTNLGRP_IPV4_ROUTE (RTMGRP_IPV4_ROUTE in bitmask form).

Analysis: Based on this chapter's content, routing and link-related messages belong to the rtnetlink protocol family (NETLINK_ROUTE). To receive specific asynchronous events, a user space socket must not only be a NETLINK_ROUTE socket but also join the corresponding multicast group, either by setting the RTMGRP_IPV4_ROUTE bit in nl_groups when calling bind() or via setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, ...). As mentioned in the text, when a new routing entry is inserted, the kernel's rtmsg_fib() builds the notification and calls rtnl_notify() to deliver it to every listener in the RTNLGRP_IPV4_ROUTE group.
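A minimal sketch of the daemon's setup, assuming a Linux system with the standard UAPI headers: create the socket and join the group through the bind() bitmask. open_ipv4_route_monitor() is a hypothetical name; a real daemon would then recv() in a loop and walk the messages with NLMSG_OK/NLMSG_NEXT, looking for RTM_NEWROUTE and RTM_DELROUTE.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Open a NETLINK_ROUTE socket that receives IPv4 route change notifications.
 * Returns the descriptor, or -1 with errno set by socket() or bind(). */
static int open_ipv4_route_monitor(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    sa.nl_pid = 0;                      /* let the kernel assign our port ID */
    sa.nl_groups = RTMGRP_IPV4_ROUTE;   /* bitmask form of RTNLGRP_IPV4_ROUTE */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        int saved = errno;              /* keep bind()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```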

Exercise 3: Thinking

Question: When designing a custom Netlink protocol, the number of standard Netlink protocols is limited to 32 (MAX_LINKS). To overcome this limitation and support communication for more dynamic kernel subsystems, Linux introduced Generic Netlink (genl). Briefly describe how Generic Netlink uses multiplexing techniques to break through this quantity limit, and provide an example of an actual subsystem that uses it.

Answer and Analysis

Answer: Principle: Generic Netlink itself occupies only one standard Netlink protocol number (NETLINK_GENERIC) and acts as a multiplexer. It introduces the concept of a genl_family, allowing different subsystems (like nl80211) to register as different families. Each message carries its family ID in the 16-bit nlmsg_type field, so messages from different subsystems can share the same protocol number; the number of families is bounded by the family ID range (GENL_MAX_ID is 1023) rather than by the 32 protocol slots. Example: nl80211 (used for wireless subsystem configuration, utilized by the iw tool).

Analysis: This question tests a deep understanding of the Netlink architecture. Standard Netlink is similar to a flat channel with a limited number of slots (32). Generic Netlink dedicates one of these channels to act as a "head manager" (multiplexer), supporting a large number of communication types within that channel (bounded only by the family ID range) through a "sub-channel" approach. This is the core design philosophy behind the Generic Netlink protocol discussed in this chapter, and it's the reason why modern subsystems like wireless drivers and task statistics tend to use it.


Key Takeaways

Netlink sockets are the core mechanism for communication between the modern Linux kernel and user space, largely replacing the outdated ioctl approach. By supporting bidirectional asynchronous communication and multicast mechanisms, Netlink not only allows user space to actively query kernel state (such as routing tables and network interface info) but, more critically, enables the kernel to proactively "push" notifications to subscribed user processes when events occur (such as network interfaces going up or down, or routing changes). This event-driven model is the foundation for the efficient operation of modern network management tools like iproute2.

Developing Netlink applications in user space typically relies on the libnl or libmnl libraries to avoid dealing directly with tedious system calls and buffer operations. The foundation of this communication is the address structure sockaddr_nl, whose nl_pid field plays the role of a port number. In a destination address, nl_pid = 0 means the kernel. For your own socket, binding with nl_pid = 0 lets the kernel assign a unique port ID, while filling it in yourself (classically with getpid()) is where multi-threaded programs get into trouble: every Netlink socket in the process needs a distinct nl_pid, and a second socket bound to an nl_pid that is already in use fails with EADDRINUSE.
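The safe pattern is easy to demonstrate: bind each socket with nl_pid = 0 and ask the kernel, via getsockname(), which port ID it assigned. open_autobound() is a hypothetical helper, and NETLINK_ROUTE is just a convenient protocol for the demonstration; two sockets opened this way in the same process come back with different IDs. Messages addressed to the kernel still use a destination sockaddr_nl whose nl_pid is 0.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Bind a NETLINK_ROUTE socket with nl_pid = 0 (autobind) and report the port ID
 * the kernel assigned. Returns the descriptor, or -1 on error. */
static int open_autobound(unsigned int *portid)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;          /* nl_pid stays 0: the kernel picks a unique ID */
    socklen_t len = sizeof(sa);

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    *portid = sa.nl_pid;                /* the port ID the kernel actually assigned */
    return fd;
}
```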

The core of kernel Netlink message handling lies in the message dispatch and callback registration mechanism. The input callback function specified when creating a socket via netlink_kernel_create (like rtnetlink's rtnetlink_rcv) serves as the main entry point, while specific message types (like RTM_NEWLINK) register their handler functions into the kernel's multi-dimensional lookup table via rtnl_register(). When a message arrives, the kernel dispatches it to the corresponding doit (for single operations) or dumpit (for bulk dumps) function based on the protocol family carried in the message and the message type.

The Netlink protocol uses a "header + TLV attributes" binary format to ensure extreme extensibility. The message header nlmsghdr contains the length, type, and sequence number, followed by a payload using TLV (Type-Length-Value) encoding that supports nesting to build complex data structures. The kernel strictly validates incoming attributes for type and length against a predefined array of struct nla_policy entries, and only requests that pass validation are executed, thereby ensuring kernel security.
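To see what "validate against a policy" means mechanically, here is a user-space sketch that walks TLV attributes with the NLA_* macros from <linux/netlink.h> and checks each one's payload length. struct attr_policy and parse_attrs() are hypothetical and far simpler than the kernel's struct nla_policy and nla_parse(), which also check data types such as NLA_U32 or NLA_STRING. Like nla_parse(), though, the sketch skips attribute types it doesn't know and rejects a known attribute that is too short.

```c
#include <errno.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy entry: minimum payload length only. */
struct attr_policy {
    uint16_t min_len;
};

/*
 * Walk the attributes in buf[0..len), validate each known one against
 * policy[type], and remember it in tb[type] (the last occurrence wins).
 * Returns 0, or -EINVAL if a known attribute is too short.
 */
static int parse_attrs(const void *buf, int len, const struct attr_policy *policy,
                       int maxtype, const struct nlattr **tb)
{
    const struct nlattr *nla = buf;

    for (int i = 0; i <= maxtype; i++)
        tb[i] = NULL;

    /* Same bounds test as the kernel's nla_ok(). */
    while (len >= NLA_HDRLEN && nla->nla_len >= NLA_HDRLEN && nla->nla_len <= len) {
        int type = nla->nla_type & NLA_TYPE_MASK;   /* drop NLA_F_NESTED and friends */
        int payload = nla->nla_len - NLA_HDRLEN;

        if (type > 0 && type <= maxtype) {          /* unknown types are skipped */
            if (payload < policy[type].min_len)
                return -EINVAL;
            tb[type] = nla;
        }

        int step = NLA_ALIGN(nla->nla_len);         /* attributes are 4-byte aligned */
        len -= step;
        nla = (const struct nlattr *)((const char *)nla + step);
    }
    return 0;
}
```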

To address the scarcity of Netlink protocol number resources (only 32 available), the kernel introduced Generic Netlink as a universal multiplexing solution. It occupies only a single protocol number (NETLINK_GENERIC) and uses a "main service desk" called nlctrl to dynamically allocate IDs for, and manage, a large number of sub-families (like nl80211), bounded only by the family ID range, achieving runtime extension based on string naming. This mechanism allows developers to implement specific kernel module communication interfaces by registering custom genl_family and genl_ops without modifying the core kernel code.
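The name-based lookup goes through nlctrl itself, which lives at the fixed ID GENL_ID_CTRL. A user-space program sends it CTRL_CMD_GETFAMILY with a CTRL_ATTR_FAMILY_NAME attribute, and the reply carries the numeric ID in CTRL_ATTR_FAMILY_ID. This sketch only builds the request; build_getfamily() is a hypothetical helper, and the header version value is an assumption.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build [nlmsghdr type=GENL_ID_CTRL][genlmsghdr cmd=CTRL_CMD_GETFAMILY]
 * [CTRL_ATTR_FAMILY_NAME "name"]. Returns the message length, or 0 if the
 * buffer is too small. buf must be 4-byte aligned.
 */
static size_t build_getfamily(void *buf, size_t cap, const char *name, uint32_t seq)
{
    size_t name_len = strlen(name) + 1;                     /* include the NUL */
    size_t total = NLMSG_SPACE(GENL_HDRLEN) + NLA_ALIGN(NLA_HDRLEN + name_len);

    if (total > cap)
        return 0;
    memset(buf, 0, total);

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = GENL_ID_CTRL;                         /* nlctrl's fixed family ID */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    struct genlmsghdr *gh = NLMSG_DATA(nlh);
    gh->cmd = CTRL_CMD_GETFAMILY;
    gh->version = 1;                                        /* assumed header version */

    struct nlattr *nla = (struct nlattr *)((char *)gh + GENL_HDRLEN);
    nla->nla_type = CTRL_ATTR_FAMILY_NAME;
    nla->nla_len = NLA_HDRLEN + name_len;                   /* unpadded attribute length */
    memcpy((char *)nla + NLA_HDRLEN, name, name_len);
    return total;
}
```

For "nl80211" the request is 32 bytes: a 16-byte nlmsghdr, a 4-byte genlmsghdr, and a 12-byte attribute.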