2.3 Netlink Message Header
In the previous section, we saw how the kernel places messages into the send queue — tracing the call chain from rtnetlink all the way to the generic netlink module.
But that's like knowing how a post office sorts mail without ever looking at the envelopes. Netlink communication is fundamentally about packets, and these packets aren't arbitrary — they must follow the rules laid out in RFC 3549 ("Linux Netlink as an IP Services Protocol").
In this section, instead of tracing function calls, we tear open a packet and examine the raw bytes.
Header Structure
Every Netlink packet must begin with a fixed 16-byte header. This header is defined in the kernel as struct nlmsghdr, located in include/uapi/linux/netlink.h:
struct nlmsghdr
{
    __u32 nlmsg_len;   /* Total message length, including the header */
    __u16 nlmsg_type;  /* Message type */
    __u16 nlmsg_flags; /* Flags */
    __u32 nlmsg_seq;   /* Sequence number */
    __u32 nlmsg_pid;   /* Port ID */
};
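One way to confirm this layout on your own machine is to let the compiler check it. The sketch below assumes a Linux system with the kernel UAPI headers installed and uses only <linux/netlink.h>; total_len() is a hypothetical wrapper, added for illustration, around the header's real NLMSG_LENGTH() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Compile-time checks: u32 + u16 + u16 + u32 + u32 keeps every field
 * naturally aligned, so the struct packs into 16 bytes with no padding. */
_Static_assert(sizeof(struct nlmsghdr) == 16, "nlmsghdr is 16 bytes");
_Static_assert(offsetof(struct nlmsghdr, nlmsg_len) == 0, "nlmsg_len at offset 0");
_Static_assert(offsetof(struct nlmsghdr, nlmsg_type) == 4, "nlmsg_type at offset 4");
_Static_assert(offsetof(struct nlmsghdr, nlmsg_flags) == 6, "nlmsg_flags at offset 6");
_Static_assert(offsetof(struct nlmsghdr, nlmsg_seq) == 8, "nlmsg_seq at offset 8");
_Static_assert(offsetof(struct nlmsghdr, nlmsg_pid) == 12, "nlmsg_pid at offset 12");

/* Hypothetical helper: the nlmsg_len value for a message carrying
 * `payload` bytes. NLMSG_LENGTH() adds the aligned 16-byte header. */
unsigned int total_len(unsigned int payload)
{
    assert(payload <= 0xFFFFFFFFu - NLMSG_HDRLEN); /* avoid overflow */
    return NLMSG_LENGTH(payload);
}
```

Because the field sizes run 4, 2, 2, 4, 4 bytes, every field is naturally aligned and the compiler inserts no padding, which is why user space and the kernel can share this struct definition directly.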
This is Netlink's "ID card." We can break these 16 bytes down into five fields:
- nlmsg_len: This is not just the data length; it is the total length of the message, including the header itself. A parser reads this value to know exactly how many bytes to consume before moving on to the next message.
- nlmsg_type: This determines the purpose of the packet. A few generic control types are reserved below NLMSG_MIN_TYPE (0x10):
  - NLMSG_NOOP: No-op; silently ignored on receipt.
  - NLMSG_ERROR: Reports the outcome of a request. The kernel sends it whenever a request fails; if the sender set NLM_F_ACK, it also sends one on success, with the error code set to 0, and that message is the acknowledgment.
  - NLMSG_DONE: Terminates a multi-part message. Receiving it signals the end of a large transfer such as a dump.
  - NLMSG_OVERRUN: Buffer overflow; data was lost.
  Above that range, each Netlink protocol family (such as the rtnetlink we met in the previous section) defines its own "dialect" in this field. NETLINK_ROUTE, for example, defines RTM_NEWLINK (create a network interface), RTM_DELLINK (delete a network interface), and RTM_NEWROUTE (create a route). The kernel relies on this field to dispatch each message to the corresponding handler.
- nlmsg_flags: These are the message's "behavior directives." The most commonly used ones fall into three groups:
  - Request and acknowledgment:
    - NLM_F_REQUEST: Marks the message as a request; every request sent to the kernel must set it.
    - NLM_F_ACK: Asks the receiver to send back a confirmation (an NLMSG_ERROR message with error code 0). It costs one extra message per request, so callers that do not need confirmation can leave it off; many tools set it anyway so they can tell whether a change succeeded.
  - Data dumping:
    - NLM_F_DUMP: This one is critical. Set it when you want "give me the entire routing table" or "give me information on all network interfaces."
    - NLM_F_MULTI: Used together with NLM_F_DUMP. The returned data is usually too large for a single message, so the kernel sends it in batches: every part carries NLM_F_MULTI, and the sequence ends with an NLMSG_DONE message.
  - Creation and modification (for CRUD operations):
    - NLM_F_CREATE: Create the object if it does not exist.
    - NLM_F_EXCL: Return an error if the object already exists. Combined with NLM_F_CREATE, it means "create, but fail if it is already there."
    - NLM_F_REPLACE: Overwrite an existing entry.
- nlmsg_seq: Sequence number. Unlike TCP sequence numbers, the Netlink layer does not enforce continuity; the kernel simply copies the request's value into its replies. Its main purpose is to let user-space programs match replies to requests: if I send sequence number 5 and the returning ACK also carries 5, I know it answers my message.
- nlmsg_pid: The "port number," which identifies a Netlink socket rather than necessarily a process.
  - The kernel itself uses port ID 0, so requests are addressed to port 0. Messages the kernel sends on its own initiative (such as multicast notifications) often carry 0 in this field, whereas replies carry the requester's port ID.
  - In a message from user space, it names the sending socket's port ID. For a process's first Netlink socket this defaults to the process ID (PID), which is why it is often simply called "the PID"; requests to the kernel may also leave it at 0.
  - This is how replies find their way back, with one refinement: the kernel addresses a reply to the port ID of the socket the request actually arrived on (tracked by the socket layer, not read from this field) and writes that same port ID into the reply's nlmsg_pid.
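To see the fields working together, here is a minimal sketch of a request asking rtnetlink to dump every network interface. It assumes the Linux UAPI headers; struct link_dump_req, build_link_dump(), and is_reply_to() are hypothetical names used only for illustration. nlmsg_pid is left at 0 because the kernel identifies the sender from the socket the request arrives on.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: the 16-byte Netlink header followed by
 * rtnetlink's family header for link messages. */
struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
};

/* Fill in a "dump every network interface" request. */
void build_link_dump(struct link_dump_req *req, unsigned int seq)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); /* header + payload */
    req->nlh.nlmsg_type = RTM_GETLINK;                 /* dispatched to rtnetlink's link handler */
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; /* a request for the whole table */
    req->nlh.nlmsg_seq = seq;                          /* echoed back in every reply */
    req->nlh.nlmsg_pid = 0;                            /* kernel identifies us by our socket */
    req->ifm.ifi_family = AF_UNSPEC;                   /* all address families */
}

/* A reply belongs to our request if it carries our sequence number. */
int is_reply_to(const struct nlmsghdr *reply, unsigned int seq)
{
    return reply->nlmsg_seq == seq;
}
```

In a real program, the filled-in struct would be sent with sendto() on a socket created by socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), addressed to a struct sockaddr_nl whose nl_pid is 0 (the kernel).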
(Figure 2-3. nlmsg header shows the memory layout of these 16 bytes)
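Because nlmsg_len gives the size of every message, one recv() buffer holding several replies can be walked with the NLMSG_OK() and NLMSG_NEXT() macros from <linux/netlink.h>. The sketch below, whose scan_replies() function and scan_result enum are hypothetical, also shows how the control types are told apart: NLMSG_DONE ends a dump, and an NLMSG_ERROR carrying error code 0 is an acknowledgment rather than a failure.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Outcome of scanning one recv() buffer of replies (hypothetical enum). */
enum scan_result { SCAN_MORE, SCAN_DONE, SCAN_ACK, SCAN_ERROR };

/* Walk every message in buf. NLMSG_NEXT() uses nlmsg_len to jump to the
 * next message; *items counts family-specific data messages. */
enum scan_result scan_replies(const void *buf, int len, int *items, int *err)
{
    const struct nlmsghdr *nlh = buf;

    assert(items != NULL && err != NULL);
    for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        switch (nlh->nlmsg_type) {
        case NLMSG_NOOP:
            continue;                          /* ignored by definition */
        case NLMSG_DONE:
            return SCAN_DONE;                  /* end of a multi-part reply */
        case NLMSG_ERROR: {
            const struct nlmsgerr *e = NLMSG_DATA(nlh);
            if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*e))) {
                *err = -EBADMSG;               /* truncated error message */
                return SCAN_ERROR;
            }
            *err = e->error;                   /* 0 = ACK, negative errno = failure */
            return e->error == 0 ? SCAN_ACK : SCAN_ERROR;
        }
        default:
            (*items)++;                        /* e.g. one RTM_NEWLINK per interface */
            break;
        }
    }
    return SCAN_MORE;                          /* buffer exhausted: call recv() again */
}
```

A dump loop would keep calling recv() and scan_replies() until it sees SCAN_DONE, checking nlmsg_seq along the way to make sure each message belongs to its own request.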
Payload and TLV Encoding
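Following the header is the payload. In many families it begins with a small family-specific header (struct ifinfomsg for rtnetlink link messages, for example, or struct genlmsghdr in Generic Netlink), followed by a sequence of attributes.
Those attributes can't just be raw data shoved in back to back; that would make parsing a nightmare for the kernel. Instead, Netlink uses a classic encoding: TLV (Type-Length-Value).
This pattern is widespread in network protocols (IPv6 Hop-by-Hop and Destination Options, for example), and its core idea is self-description. Want to pass an IP address? No problem. A string? No problem either. As long as each item states its type and length up front, the parser can interpret it, or at least skip over it.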
Preceding every Netlink attribute is a small header defined by struct nlattr:
struct nlattr {
    __u16 nla_len;  /* Total attribute length, including this header */
    __u16 nla_type; /* Attribute type */
};
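- nla_len: The attribute's length in bytes, including the 4-byte header but not any trailing padding. The parser rounds it up to a multiple of 4 to find where the next attribute starts.
- nla_type: Identifies what the attribute is. On the wire this is an attribute ID defined by the protocol family, such as IFLA_IFNAME or IFLA_MTU in rtnetlink, not a data type. The data type of the value is implied by that ID and declared on the kernel side (in the validation policy covered below), using types such as:
  - NLA_U32: a 32-bit unsigned integer.
  - NLA_STRING: a string.
  - NLA_NESTED: the value itself contains another series of TLV structures, i.e., nested attributes. This allows us to build complex, tree-like data structures. Senders can mark such a container by setting the NLA_F_NESTED bit in nla_type, and strict kernel validation requires it.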
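The sketch below builds attributes by hand to make these on-wire rules concrete. It assumes the Linux UAPI headers; add_attr(), nest_start(), and nest_end() are hypothetical helpers, similar in spirit to libmnl's mnl_attr_put() and mnl_attr_nest_start()/mnl_attr_nest_end() but not the same API. The nla_type values written are family-defined IDs (IFLA_IFNAME, IFLA_MTU, IFLA_LINKINFO, IFLA_INFO_KIND), not NLA_STRING or NLA_U32.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Hypothetical helper: append one attribute at the aligned tail of the message.
 * nla_len covers header + value but NOT the pad; nlmsg_len is advanced past
 * the pad so the next attribute starts on a 4-byte boundary.
 * Returns the new attribute, or NULL if `cap` bytes are not enough. */
struct nlattr *add_attr(struct nlmsghdr *nlh, size_t cap, unsigned short type,
                        const void *data, unsigned short len)
{
    assert(nlh != NULL && (len == 0 || data != NULL));
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
    size_t need = NLA_ALIGN(NLA_HDRLEN + len);

    if (off + need > cap)
        return NULL;
    struct nlattr *nla = (struct nlattr *)((char *)nlh + off);
    nla->nla_type = type;
    nla->nla_len = NLA_HDRLEN + len;
    if (len)
        memcpy((char *)nla + NLA_HDRLEN, data, len);
    memset((char *)nla + nla->nla_len, 0, need - nla->nla_len); /* zero the pad */
    nlh->nlmsg_len = off + need;
    return nla;
}

/* Hypothetical nesting helpers: open an empty container with NLA_F_NESTED
 * set, append the inner attributes, then patch the container's length. */
struct nlattr *nest_start(struct nlmsghdr *nlh, size_t cap, unsigned short type)
{
    return add_attr(nlh, cap, type | NLA_F_NESTED, NULL, 0);
}

void nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
{
    nest->nla_len = (char *)nlh + nlh->nlmsg_len - (char *)nest;
}
```

For an IFLA_IFNAME of "eth0" (five bytes including the terminating \0), nla_len is 4 + 5 = 9, while the attribute occupies NLA_ALIGN(9) = 12 bytes in the message.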
⚠️ Warning: Alignment Issues
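Although the struct definition looks straightforward, the memory layout carries a rule: every Netlink attribute must start on a 4-byte boundary (NLA_ALIGNTO), just as whole messages are aligned to NLMSG_ALIGNTO. When a value's length is not a multiple of 4, the sender appends padding bytes (zeroed by convention), and nla_len does not count them. If you construct packets by hand and forget the padding, every attribute after the short one shifts: the kernel reads the next header from the wrong offset and, depending on how strictly the family validates, either rejects the message with an error or discards the remaining bytes with nothing more than a "bytes leftover" warning in the kernel log.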
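The effect of a missing pad is easy to reproduce in user space. The sketch below walks attributes the same way the kernel's iterator does, advancing by NLA_ALIGN(nla_len) and stopping when a header no longer fits; count_attrs() is a hypothetical helper, not a kernel or library function.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: count attributes in buf[0..rem) the way the kernel's
 * iterator does. An attribute is accepted only if its header fits and
 * nla_len lies within the remaining bytes; the cursor then advances by the
 * ALIGNED length. *leftover receives the bytes that could not be parsed. */
int count_attrs(const unsigned char *buf, int rem, int *leftover)
{
    struct nlattr nla;
    int n = 0;

    assert(buf != NULL && leftover != NULL);
    while (rem >= NLA_HDRLEN) {
        memcpy(&nla, buf, sizeof nla);       /* buf may be unaligned */
        if (nla.nla_len < NLA_HDRLEN || nla.nla_len > rem)
            break;                           /* not a plausible header */
        n++;
        int step = NLA_ALIGN(nla.nla_len);   /* e.g. nla_len 9 -> step 12 */
        if (step >= rem) {                   /* last attribute, pad may be absent */
            rem = 0;
            break;
        }
        buf += step;
        rem -= step;
    }
    *leftover = rem;
    return n;
}
```

With padding, a 9-byte string attribute followed by an 8-byte u32 attribute parses as two attributes in 20 bytes. Without the 3 pad bytes, the walker jumps to offset 12, lands in the middle of the second attribute, reads a nonsensical length, and stops with 5 bytes left over.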
Attribute Validation Policy
The kernel doesn't accept just anything. After receiving a Netlink message, it must verify that the attributes are valid.
Each protocol family defines an attribute validation policy: an array of struct nla_policy entries. In its simplest form (the layout used by older kernels; recent kernels add further validation fields), the structure is almost identical to struct nlattr:
struct nla_policy {
    u16 type; /* Expected type, e.g. NLA_U32 */
    u16 len;  /* Expected length limit */
};
This array is indexed by attribute type. When the kernel calls nlmsg_parse() to parse a message, it also calls validate_nla() (in lib/nlattr.c) to validate each attribute against this policy table.
The validation rules are quite detailed:
- For fixed-length types (like NLA_U32), the len in the policy is usually ignored, since the length is implied by the type.
- For strings (NLA_STRING), len is the maximum allowed length, not counting a terminating \0. Longer strings are rejected.
- For flags (NLA_FLAG), len has no use at all. The attribute's presence means true and its absence means false; it carries no payload, and the kernel rejects a flag attribute that has one.
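To make these three rules concrete, here is a small user-space model of them. NLA_U32 and its siblings live in the kernel-internal header include/net/netlink.h and are not available to user space, so the sketch uses hypothetical stand-ins (the DEMO_* constants, struct demo_policy, and check_attr()); it mirrors the rules described above rather than reproducing lib/nlattr.c.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical stand-ins for the kernel-internal NLA_* data types. */
enum demo_type { DEMO_UNSPEC, DEMO_U32, DEMO_STRING, DEMO_FLAG };

/* Hypothetical stand-in for the simplified struct nla_policy. */
struct demo_policy {
    unsigned short type;   /* expected data type */
    unsigned short len;    /* meaning depends on type */
};

/* Return 0 if the attribute satisfies its policy entry, -1 otherwise. */
int check_attr(const struct demo_policy *pol, const struct nlattr *nla)
{
    assert(pol != NULL && nla != NULL);
    int payload = nla->nla_len - NLA_HDRLEN;
    const char *data = (const char *)nla + NLA_HDRLEN;

    if (payload < 0)
        return -1;                      /* shorter than its own header */
    switch (pol->type) {
    case DEMO_U32:                      /* fixed size: pol->len is ignored */
        return payload >= 4 ? 0 : -1;   /* strict validation expects exactly 4 */
    case DEMO_STRING:                   /* pol->len = max length, \0 not counted */
        if (payload > 0 && data[payload - 1] == '\0')
            payload--;
        return (pol->len && payload > pol->len) ? -1 : 0;
    case DEMO_FLAG:                     /* presence is the value: no payload */
        return payload == 0 ? 0 : -1;
    default:
        return 0;                       /* no rule for this type: accept */
    }
}
```

With a string limit of 4, "eth0" passes because its trailing \0 is not counted, while "eth10" is rejected.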
Here's a gotcha: under the traditional (liberal) parsing mode, an attribute whose type exceeds the maxtype of the policy array is silently ignored. This is for backward compatibility: when an older kernel receives extended attributes from newer user space, it doesn't throw an error but simply skips the parts it doesn't understand. Families and operations that opt into strict validation (the default for newly written Generic Netlink operations in recent kernels) reject such attributes with an error instead. Either way, if a newly added attribute seems to have no effect, first check whether you filled in the wrong nla_type and it was discarded as an "unknown extension."
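The difference between the two modes can be modelled with a short loop that, like nlmsg_parse(), fills a table tb[] indexed by attribute type. parse_attrs() is a hypothetical user-space sketch; its strict flag stands in for the kernel's strict validation options and does not reproduce every check they perform.

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical model of nla_parse()-style indexing: tb[type] points at the
 * attribute. Types outside 1..maxtype are skipped in liberal mode and
 * rejected in strict mode. Returns 0 on success, -1 on rejection. */
int parse_attrs(const struct nlattr **tb, int maxtype,
                const unsigned char *buf, int rem, int strict)
{
    assert(tb != NULL && buf != NULL && maxtype >= 0);
    memset(tb, 0, sizeof(*tb) * (maxtype + 1));
    while (rem >= NLA_HDRLEN) {
        const struct nlattr *nla = (const struct nlattr *)buf;
        if (nla->nla_len < NLA_HDRLEN || nla->nla_len > rem)
            break;                              /* not a valid header: stop */
        int type = nla->nla_type & NLA_TYPE_MASK; /* strip NLA_F_* bits */
        if (type == 0 || type > maxtype) {
            if (strict)
                return -1;                      /* strict: unknown type is an error */
            /* liberal: skip it silently */
        } else {
            tb[type] = nla;                     /* a later duplicate overwrites */
        }
        int step = NLA_ALIGN(nla->nla_len);
        if (step >= rem) {                      /* last attribute */
            rem = 0;
            break;
        }
        buf += step;
        rem -= step;
    }
    return (strict && rem > 0) ? -1 : 0;        /* strict: trailing bytes are an error */
}
```

The handler then simply checks tb[IFLA_MTU] (or whichever ID it needs) for NULL, which is why an attribute sent with the wrong type number looks exactly like one that was never sent.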
Kernel Receive Path
Finally, let's piece these fragments together and see how the kernel processes a Generic Netlink message (the entry point is in genl_rcv_msg()):
- Is it a dump? First, check whether NLM_F_DUMP is set in nlmsg_flags.
  - If so, call netlink_dump_start(). This hands the request to the dump machinery, which calls the family's registered dumpit() callback repeatedly; each call packs as many entries as fit into one buffer and sends them back to user space, until the callback reports that nothing is left.
- Not a dump? Then parse:
  - Call nlmsg_parse() (or an internal variant of it). This validates every attribute against the nla_policy we just discussed.
  - If validation fails, processing stops immediately and an error code is returned, which reaches user space as an NLMSG_ERROR message.
- Execute the operation:
  - Only after validation passes does the flow continue to the next step: calling the doit() callback you registered in genl_ops.
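The order of these checks fits in a few lines. The sketch below is a user-space model, not kernel code: struct demo_ops, demo_rcv_msg(), and the sample callbacks are hypothetical, with validate standing in for nlmsg_parse() plus the policy table. One detail it does mirror from the Generic Netlink code: NLM_F_DUMP is two bits (NLM_F_ROOT | NLM_F_MATCH), so the masked flags are compared against NLM_F_DUMP rather than tested for any single bit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical operation table, loosely modelled on genl_ops. */
struct demo_ops {
    int (*doit)(const struct nlmsghdr *nlh);
    int (*dumpit)(const struct nlmsghdr *nlh);
    int (*validate)(const struct nlmsghdr *nlh); /* stands in for nlmsg_parse() + policy */
};

/* Same order as the steps above: dump check, then validation, then doit(). */
int demo_rcv_msg(const struct demo_ops *ops, const struct nlmsghdr *nlh)
{
    assert(ops != NULL && nlh != NULL);
    /* NLM_F_DUMP is NLM_F_ROOT | NLM_F_MATCH: require both bits.
     * (The kernel would call netlink_dump_start(), which runs dumpit()
     * repeatedly; this model calls it once.) */
    if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP)
        return ops->dumpit ? ops->dumpit(nlh) : -EOPNOTSUPP;

    if (!ops->doit)
        return -EOPNOTSUPP;                 /* no handler for this command */
    int err = ops->validate ? ops->validate(nlh) : 0;
    if (err)
        return err;                         /* abort: doit() never runs */
    return ops->doit(nlh);
}

/* Example callbacks that only record which path was taken. */
static int doit_calls, dumpit_calls;
static int sample_doit(const struct nlmsghdr *nlh)   { (void)nlh; doit_calls++; return 0; }
static int sample_dumpit(const struct nlmsghdr *nlh) { (void)nlh; dumpit_calls++; return 0; }
static int reject_all(const struct nlmsghdr *nlh)    { (void)nlh; return -EINVAL; }
```

A request carrying only NLM_F_ROOT, half of NLM_F_DUMP, is therefore treated as an ordinary doit() request in this model.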
It's like a strict customs checkpoint: first they check your visa (the header), then they inspect your luggage (attribute validation). Only when both pass are you allowed to enter (execute the callback).
The Netlink message header and TLV mechanism form the cornerstone of the entire IPC protocol. Now that the foundation is laid, in the next section we'll start building upward — looking at what the most commonly used NETLINK_ROUTE messages in the networking subsystem actually look like, and how they control the routing table and network interface states.