2.4 NETLINK_ROUTE Messages: More Than Just Routing
In the previous section, we spent our time deep in the Generic Netlink mechanism, tearing apart message headers, TLVs, and validation policies. Now that the foundation is laid, it's time to shift our focus back to the networking subsystem itself and see how the "veteran" Netlink protocol families actually work.
The heavyweight among them is undoubtedly NETLINK_ROUTE.
Don't let the name fool you—although it's called Route Netlink, it handles far more than just routing tables. It acts as the "general manager" of the entire network configuration, overseeing network interfaces, addresses, neighbor tables, and traffic control.
Specifically, the NETLINK_ROUTE message families are divided into several branches by function:
- LINK: Manages network interfaces, such as bringing eth0 up or down, or renaming it.
- ADDR: Manages the addition and deletion of IP addresses.
- ROUTE: The actual routing messages, managing the FIB (Forwarding Information Base).
- NEIGH: Manages the neighbor subsystem, which means the ARP and ND (Neighbor Discovery) tables.
- RULE: Manages policy routing rules.
- QDISC, TCLASS, ACTION: This area relates to QoS (traffic control), covering queuing disciplines, traffic classification, and action handling.
- NEIGHTBL: Configuration of the neighbor table itself.
- ADDRLABEL: Address labels (typically used for IPv6).
Mapping CRUD to Message Types
Regardless of the family above, their message type design follows a very intuitive CRUD (Create, Read, Update, Delete) logic.
For most objects (such as routes, addresses, or neighbors), there are only three operations:
- Create: Corresponds to the RTM_NEWXXX message (e.g., RTM_NEWROUTE).
- Delete: Corresponds to the RTM_DELXXX message (e.g., RTM_DELROUTE).
- Query: Corresponds to the RTM_GETXXX message (e.g., RTM_GETROUTE).
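This mirrors standard database operations. There is no separate "update" message for these objects: a modification is normally expressed as an RTM_NEWXXX request carrying flags such as NLM_F_REPLACE.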
However, there is one exception: LINK (network interfaces). Configuring a network interface is sometimes more complex than simple "create, delete, query"—you might just want to modify a specific parameter (like the MTU) rather than tearing down and recreating the interface. Therefore, on top of the three standard messages, the Link family has an extra message dedicated to modification: RTM_SETLINK.
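The naming pattern is also visible in the numeric values. The sketch below assumes the numbering used in <linux/rtnetlink.h>, where RTM_BASE is 16 and each object family reserves a block of four consecutive values in the order NEW, DEL, GET, SET. The helper names (rtm_kind_of, rtm_family_index, rtm_kind_name) and the OPKIND_ constants are hypothetical and exist only for illustration.

```c
#include <linux/rtnetlink.h>

/* The four operation kinds, in the order the RTM_* constants use them. */
enum rtm_kind { OPKIND_NEW = 0, OPKIND_DEL = 1, OPKIND_GET = 2, OPKIND_SET = 3 };

/* The operation is encoded in the two low bits of (type - RTM_BASE). */
static inline enum rtm_kind rtm_kind_of(unsigned short type)
{
    return (enum rtm_kind)((type - RTM_BASE) & 3);
}

/* Index of the object family: 0 = LINK, 1 = ADDR, 2 = ROUTE, 3 = NEIGH, ... */
static inline unsigned int rtm_family_index(unsigned short type)
{
    return (unsigned int)(type - RTM_BASE) >> 2;
}

/* Human-readable name of an operation kind. */
static inline const char *rtm_kind_name(enum rtm_kind kind)
{
    static const char *const names[] = { "NEW", "DEL", "GET", "SET" };
    return names[kind];
}
```

With these helpers, RTM_DELROUTE decodes to family index 2 (ROUTE) with kind DEL, and RTM_SETLINK decodes to family index 0 (LINK) with kind SET.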
When Operations Fail: The nlmsgerr Structure
When writing userspace programs, what we fear most isn't rejection—it's silence. You send a request, the kernel doesn't respond, and you have no idea whether it never arrived or failed during processing.
The designers of the Netlink protocol clearly considered this. They designed a standard error reporting mechanism that works universally, whether you're communicating over standard Netlink or Generic Netlink.
All of this is wrapped up in an nlmsgerr structure:
struct nlmsgerr {
    int error;            /* negative errno value on failure, 0 for success (ACK) */
    struct nlmsghdr msg;  /* header of the original request that triggered this reply */
};
You can think of this structure as a "return receipt."
When the message you sent has a problem—for example, if nlmsg_type contains a value the kernel doesn't recognize—the kernel doesn't just sit there. It sends back a Netlink message. The message header type of this reply is NLMSG_ERROR, immediately followed by the nlmsgerr structure above.
At this point, an interesting behavior occurs:
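The msg field always carries a copy of the header of the request you sent. If the error field is not 0 (for example, -EOPNOTSUPP), the kernel goes further and appends the payload of the original request after that header, unless the socket has enabled the NETLINK_CAP_ACK option, which trims the reply back to the header alone.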
Why do this?
Consider a scenario where you're handling multi-threaded or asynchronous communication and have fired off several requests simultaneously. When one of them returns an error, how do you know which request failed?
The kernel thoughtfully attaches your original request header verbatim. Compare the sequence number (nlmsg_seq) in that header against your outstanding requests, and you know immediately which operation went wrong.
Figure 2-4 shows the memory layout of this error message:
[Figure 2-4: Netlink Error Message Layout]
- Netlink Header (type = NLMSG_ERROR)
- Error Code (int): e.g., -EINVAL or -EOPNOTSUPP (0 for an ACK)
- Original Request Header: the nlmsghdr of the original request (always present, as the msg field of nlmsgerr)
- Original Payload: the body of the original request (appended only when error != 0, and omitted if the socket has set NETLINK_CAP_ACK)
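The sequence-number matching described above can be sketched as follows. This is a minimal illustration, not code from the kernel or any library: parse_nlmsgerr and find_pending are hypothetical helpers, and the array of pending sequence numbers stands in for whatever bookkeeping your program actually uses.

```c
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Inspect one received Netlink message of `len` valid bytes.
 * Returns 1 and fills *error and *seq if it is a well-formed
 * NLMSG_ERROR reply, 0 otherwise.
 */
static int parse_nlmsgerr(const struct nlmsghdr *nlh, size_t len,
                          int *error, unsigned int *seq)
{
    const struct nlmsgerr *err;

    if (len < sizeof(*nlh) || nlh->nlmsg_len > len)
        return 0;   /* buffer too short for the claimed message */
    if (nlh->nlmsg_type != NLMSG_ERROR)
        return 0;   /* some other kind of reply */
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
        return 0;   /* truncated: no room for struct nlmsgerr */

    err = (const struct nlmsgerr *)NLMSG_DATA(nlh);
    *error = err->error;         /* 0 = ACK, negative errno = failure */
    *seq = err->msg.nlmsg_seq;   /* sequence number of the request */
    return 1;
}

/* Hypothetical bookkeeping: index of the pending request with this seq. */
static int find_pending(const unsigned int *pending, size_t n,
                        unsigned int seq)
{
    for (size_t i = 0; i < n; i++)
        if (pending[i] == seq)
            return (int)i;
    return -1;   /* no outstanding request has this sequence number */
}
```

A receive loop would call parse_nlmsgerr on each message it reads, then pass the returned sequence number to find_pending to decide which in-flight operation to complete or fail.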
The Nuances of the ACK Mechanism
Sometimes silence on success is not good enough: we want the kernel to confirm explicitly that a request was received and processed. This is where the ACK (Acknowledgment) mechanism comes in.
Requesting an ACK is simple: add NLM_F_ACK to the flags of the outgoing message.
If the kernel processes such a request without issues, it sends back an acknowledgment. But note, there's a counter-intuitive design here:
Even on success (ACK), the message type the kernel replies with is NLMSG_ERROR.
The difference lies in the error field of the nlmsgerr structure:
- If error is 0, this is not a "return receipt" but a "delivery confirmation."
- In this case, the msg field still carries the original request header, but the payload of the request is not appended to the reply. Because the request succeeded, the kernel sees no need to send your data back to you.
If you want to dig into the details, you can look at the netlink_ack() function in net/netlink/af_netlink.c. Those few lines of code spell out the logic above perfectly clearly.
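To tie the pieces together, here is a hedged sketch of a request that asks for an ACK. It builds an RTM_SETLINK message that changes an interface's MTU and sets NLM_F_ACK in the header flags. The struct setlink_mtu_req layout and the build_setlink_mtu helper are hypothetical names chosen for this example. The code only fills a buffer and does not send anything.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Request layout: Netlink header, ifinfomsg, one IFLA_MTU attribute. */
struct setlink_mtu_req {
    struct nlmsghdr  nlh;
    struct ifinfomsg ifi;
    char             attrs[RTA_SPACE(sizeof(unsigned int))];
};

/* Fill `req` with an RTM_SETLINK request that sets the MTU of `ifindex`. */
static void build_setlink_mtu(struct setlink_mtu_req *req, int ifindex,
                              unsigned int mtu, unsigned int seq)
{
    struct rtattr rta = {
        .rta_len  = RTA_LENGTH(sizeof(mtu)),
        .rta_type = IFLA_MTU,
    };

    memset(req, 0, sizeof(*req));

    req->nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(req->ifi)) + RTA_SPACE(sizeof(mtu));
    req->nlh.nlmsg_type  = RTM_SETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;   /* ask for an ACK */
    req->nlh.nlmsg_seq   = seq;   /* echoed back in the NLMSG_ERROR reply */

    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index  = ifindex;

    /* Attribute header first, then its 4-byte payload. */
    memcpy(req->attrs, &rta, sizeof(rta));
    memcpy(req->attrs + RTA_LENGTH(0), &mtu, sizeof(mtu));
}
```

In a real program the filled buffer would be written to a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), and the caller would then wait for an NLMSG_ERROR reply whose error field is 0. Changing the MTU requires the CAP_NET_ADMIN capability, so an unprivileged caller should expect a negative error code instead.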
Where We Are Now
We now have a list of NETLINK_ROUTE message types (LINK, ADDR, ROUTE...) in hand, and a mental model for error handling and ACKs.
But this is just the "manual." Next, we'll move into the practical portion and use these tools to manipulate the routing table (FIB) inside the kernel. We'll see exactly which TLV attributes need to be nested like Russian dolls inside the message body when we push a route using a single RTM_NEWROUTE message.