
Chapter 2: When Userspace Meets the Kernel

Imagine you are writing an application that needs to monitor network traffic changes. Whenever the kernel's routing table changes or a new network interface comes online, your program needs to know immediately.

Twenty years ago, you would probably have opened a character device file and used the archaic ioctl mechanism to poll the kernel. Those were dark days. ioctl was like a tin can telephone that only works in one direction: you shout, the kernel answers, but the kernel can never call you first, so if you don't shout, you never know what's happening on the other side.

But modern Linux doesn't work that way. What we need is a two-way channel in which either side can start the conversation, ideally one that can also broadcast to many listeners at once (multicast). That is exactly why Netlink Sockets exist.

Our mission in this chapter is to thoroughly understand this modern kernel communication mechanism: why we abandoned ioctl, how to hand-craft Netlink messages, and how the kernel code processes those requests.

This is harder than it looks, because Netlink is a complete protocol rather than a simple function call. We'll start with the tools we should use in userspace to strike up this conversation.


If you really want to hand-write every line of code from scratch, using raw socket() system calls to handle Netlink communication, you certainly can. But it's like writing a web server in assembly language—sure, you're a guru, but maintaining it will make you want to cry.

Standing on the shoulders of giants is always wise. When developing Netlink applications in userspace, there are two ready-made libraries worth your attention: libnl and libmnl.

First up is libnl. This is currently the most mainstream and feature-complete Netlink user library.

You can think of it as the "standard library" of the Netlink world. It's not just a thin wrapper around low-level system calls; it provides a complete, object-oriented API for handling Netlink communication. Wireless tools such as iw are built on it. (The iproute2 package, which powers the ip command you use every day, is a notable exception: it uses its own small internal helper library, libnetlink, rather than libnl.)

This library was developed by Thomas Graf, and its structure is highly modular. In addition to the core libnl library (which handles basic socket operations, message sending/receiving, and cache management), it provides specialized sub-libraries for different Netlink protocol families, mainly including:

  • libnl-genl: For handling Generic Netlink (we'll cover this later—it's the ultimate solution for running out of protocol numbers).
  • libnl-route: Specifically for handling network-related Netlink messages like routing and links (corresponding to NETLINK_ROUTE).
  • libnl-nf: For handling Netfilter-related messages (such as connection tracking, packet logging, and packet queueing).

If your project needs to handle complex network configurations—like dynamically modifying routes, managing VLANs, or interacting with the wireless subsystem—libnl is the top choice. It handles many nasty details for you, such as message assembly, attribute parsing (that complex TLV format), and receiving asynchronous notifications.
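To get a feel for the attribute parsing that libnl hides, here is a minimal sketch of walking the TLV attributes of an RTM_NEWLINK message by hand. It uses only the RTA_* and IFLA_* helper macros from the kernel's UAPI headers. The function link_name() is a hypothetical helper of our own and is not part of libnl.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

/* Walk the TLV attributes of an RTM_NEWLINK message and return the
 * interface name carried in IFLA_IFNAME, or NULL if it is missing or
 * not NUL-terminated. link_name() is a hypothetical helper. */
static const char *link_name(const struct nlmsghdr *nh)
{
    /* The fixed family header (struct ifinfomsg) follows nlmsghdr. */
    const struct ifinfomsg *ifi = NLMSG_DATA(nh);
    /* Attributes start after the aligned ifinfomsg; len is their total size. */
    const struct rtattr *rta = IFLA_RTA(ifi);
    int len = IFLA_PAYLOAD(nh);

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if (rta->rta_type != IFLA_IFNAME)
            continue;                      /* skip attributes we don't need */
        const char *name = RTA_DATA(rta);
        /* Never trust a string attribute to be terminated. */
        if (memchr(name, '\0', RTA_PAYLOAD(rta)) == NULL)
            return NULL;
        return name;
    }
    return NULL;
}
```

Every length and alignment step here (RTA_OK, RTA_NEXT, RTA_PAYLOAD) is a place where hand-written code tends to go wrong. That is the kind of detail libnl takes off your hands.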

libmnl: The Minimalist's Choice

But libnl has its "problem"—it's too heavy. Sometimes you just want to send a simple Netlink message without linking against a bunch of libraries or dealing with dozens of objects at every turn.

This is where libmnl comes to the rescue.

libmnl was written by Pablo Neira Ayuso (who also contributed heavily to Netfilter's core), and its design philosophy is minimalism. It is a deliberately tiny library that does exactly one thing: it lets you write as little boilerplate code as possible while retaining full control over your Netlink messages.

It doesn't do caching, it doesn't build complex object trees—it just sends buffers and receives buffers. For embedded development or tools that only need to interact with the kernel a few times, libmnl is actually easier to get started with than libnl because its behavior is very direct—what you see is what you get.
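The "just buffers" idea is easy to see without any library at all. The sketch below lays out an RTM_GETLINK dump request, roughly what `ip link show` asks for, as a Netlink header followed by struct ifinfomsg. With libmnl you would typically reserve the header with mnl_nlmsg_put_header() and the family header with mnl_nlmsg_put_extra_header() instead of declaring a struct. The names link_dump_req and build_link_dump() are our own.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* A request is just bytes: an nlmsghdr followed by a family-specific
 * header. link_dump_req is a hypothetical name for this layout. */
struct link_dump_req {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;
};

/* Fill 'req' with an RTM_GETLINK dump request and return its length in
 * bytes. build_link_dump() is a hypothetical helper. */
static size_t build_link_dump(struct link_dump_req *req, unsigned int seq)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type  = RTM_GETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;  /* "give me all of them" */
    req->nh.nlmsg_seq   = seq;        /* lets us match replies to this request */
    req->nh.nlmsg_pid   = 0;          /* the kernel identifies us by the socket */
    req->ifi.ifi_family = AF_UNSPEC;  /* links of every address family */
    return req->nh.nlmsg_len;
}
```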

Selection Advice:

  • Building a complex network management tool that juggles routes, links, and wireless settings? Choose libnl.
  • Writing a dedicated small tool to configure a specific kernel feature, where size matters? Choose libmnl.

The sockaddr_nl Structure

Once we've chosen a library, and before diving into specific APIs, we need to take a look at what a Netlink "phone number" looks like.

In TCP/IP, we use sockaddr_in to represent an address (IP + port). In the Netlink world, the corresponding structure is sockaddr_nl. It's defined in the kernel header file include/uapi/linux/netlink.h and looks very plain:

```c
struct sockaddr_nl {
    __kernel_sa_family_t nl_family;  /* AF_NETLINK */
    unsigned short       nl_pad;     /* zero */
    __u32                nl_pid;     /* port ID */
    __u32                nl_groups;  /* multicast groups mask */
};
```
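Before we walk through the fields one at a time, here is a small sketch of how a sockaddr_nl is typically filled in. Zeroing the whole structure first takes care of nl_pad and of any field you don't set explicitly. make_nl_addr() is a hypothetical helper of our own.

```c
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>

/* Build a Netlink address. pid == 0 means "the kernel" when used as a
 * destination, or "pick a port ID for me" when passed to bind().
 * make_nl_addr() is a hypothetical helper. */
static struct sockaddr_nl make_nl_addr(__u32 pid, __u32 groups)
{
    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));   /* also zeroes nl_pad */
    sa.nl_family = AF_NETLINK;
    sa.nl_pid    = pid;
    sa.nl_groups = groups;        /* 0 = unicast only */
    return sa;
}
```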

There are a few fields here worth breaking down, because they determine whether a "letter" can actually reach its destination.

1. nl_family

No hesitation here—fill in AF_NETLINK. This tells the kernel, "I'm part of the Netlink family, don't treat me as TCP/IP traffic."

2. nl_pad

Padding field. Must be 0. This is for alignment—don't worry about why, just set it to 0.

3. nl_pid (Port ID)

The name of this field is a bit misleading. Although it's called pid (Process ID), it actually represents the address of the Netlink Socket, i.e., the "Port ID".

  • For the kernel: The kernel's port ID is 0. When sending a message from userspace to the kernel, we set the destination address's nl_pid to 0.
  • For userspace:
    • The simplest approach: Set it to the current process's PID (getpid()). This makes it obvious at a glance who is sending the message.
    • The laziest approach (but also the most common): Just set it to 0 and call bind(). You can even skip bind() entirely, in which case the kernel binds the socket automatically on the first send.

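What happens then? The kernel calls an internal function named netlink_autobind(). This function first tries to use the caller's process ID as the port ID. Strictly speaking, it uses the thread-group ID, so every thread in the process produces the same candidate. If another socket of the same Netlink protocol (in the same network namespace) already holds that value, the kernel falls back to a unique negative number instead.

⚠️ Pitfall Warning There's a trap here that many people stumble into the first time they open several Netlink sockets in one process: port IDs must be unique within a Netlink protocol. If you bind two NETLINK_ROUTE sockets and explicitly set nl_pid to getpid() for both, the second bind() fails with EADDRINUSE.

The kernel insists on uniqueness because it delivers unicast replies by port ID alone. If two sockets shared one ID, the kernel would have no way of knowing which of them should receive a response.

The fix: If you open multiple Netlink sockets in a single program, either assign each one a distinct nl_pid yourself or leave nl_pid at 0 and let netlink_autobind() choose. If you need to know the port ID a socket actually ended up with, ask for it with getsockname().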
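A quick way to watch netlink_autobind() at work is to open two NETLINK_ROUTE sockets in the same process, bind both with nl_pid = 0, and read back their port IDs with getsockname(). This is a sketch; open_autobound() and port_id_of() are hypothetical helper names of our own.

```c
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a NETLINK_ROUTE socket and bind it with nl_pid = 0 so the kernel
 * assigns the port ID. Returns the fd or -1. (Hypothetical helper.) */
static int open_autobound(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;
    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;          /* nl_pid = 0: let the kernel choose */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Ask the kernel which port ID the socket actually received.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
static int port_id_of(int fd, __u32 *out)
{
    struct sockaddr_nl sa;
    socklen_t len = sizeof(sa);
    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    *out = sa.nl_pid;
    return 0;
}
```

On the kernels we're familiar with, the first socket usually reports the process ID and the second a large number (a negative port ID printed as unsigned). The only thing you should rely on, though, is that the two values differ.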

Netlink doesn't just serve networking. Subsystems like SELinux, the audit system, and device hotplug all use Netlink. But what we care about most is rtnetlink (Route Netlink), which is specifically designed to handle network core messages like routing, neighbor tables (ARP), and link states.

4. nl_groups

Multicast group mask. One of Netlink's strengths is its support for multicast—the kernel can broadcast events to a group of interested sockets (e.g., "a network interface went down!"). nl_groups is a bitmask used to subscribe to the event groups you're interested in. If you only want unicast communication, set this to 0.
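Tying this back to the monitoring scenario from the start of the chapter, here is a minimal sketch of a socket that subscribes to link and IPv4 route notifications through nl_groups. RTMGRP_LINK and RTMGRP_IPV4_ROUTE are the legacy group-mask constants from <linux/rtnetlink.h>. open_route_monitor() is a hypothetical name of our own.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open an rtnetlink socket that receives a message whenever an
 * interface or an IPv4 route changes. Returns the fd or -1.
 * open_route_monitor() is a hypothetical helper. */
static int open_route_monitor(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;
    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    /* One bit per multicast group we want to join. */
    sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_ROUTE;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each recv() on this socket then returns RTM_NEWLINK/RTM_DELLINK or RTM_NEWROUTE/RTM_DELROUTE messages as changes happen. Because nl_groups is only 32 bits wide, groups numbered above 32 have to be joined with setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &group, sizeof(group)), where group is an RTNLGRP_* number rather than a mask.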


Userspace Packages for Controlling TCP/IP Networking

Having covered the low-level protocols and libraries, let's come back down to earth and see how the commands we type in the terminal actually map to these concepts.

Currently, there are two mainstream toolsets in the Linux world: the aging but still widely used net-tools, and the modern standard iproute2.

iproute2: The Modern Standard

This is the toolset installed by default on most major distributions, and it is built almost entirely on Netlink Sockets. When you type ip addr add or ip route add, this is what happens behind the scenes: the command opens a Netlink socket, constructs an RTM_NEWADDR or RTM_NEWROUTE message, sends it to the kernel, and waits for the kernel's acknowledgment.
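Adding an address requires CAP_NET_ADMIN, so here is a read-only sketch of the same open, build, send, and receive cycle: an RTM_GETLINK dump, roughly what ip link show requests. Note the destination address with nl_pid = 0, which is how userspace addresses the kernel. count_links() is a hypothetical name. Real tools also verify sequence numbers and decode NLMSG_ERROR payloads in more detail.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Dump all network interfaces and return how many RTM_NEWLINK replies
 * arrived before NLMSG_DONE, or -1 on error. (Hypothetical helper.) */
static int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct { struct nlmsghdr nh; struct ifinfomsg ifi; } req;
    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len   = NLMSG_LENGTH(sizeof(req.ifi));
    req.nh.nlmsg_type  = RTM_GETLINK;
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nh.nlmsg_seq   = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    struct sockaddr_nl kernel;
    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK;     /* nl_pid = 0 addresses the kernel */

    if (sendto(fd, &req, req.nh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }

    /* Comfortably larger than one dump chunk; aligned for nlmsghdr. */
    char buf[16384] __attribute__((aligned(__alignof__(struct nlmsghdr))));
    int count = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) { close(fd); return -1; }
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, n);
             nh = NLMSG_NEXT(nh, n)) {
            if (nh->nlmsg_type == NLMSG_DONE)  { close(fd); return count; }
            if (nh->nlmsg_type == NLMSG_ERROR) { close(fd); return -1; }
            if (nh->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }
}
```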

iproute2 includes these commands you probably use every day:

  • ip: Manages routing tables, network interfaces, addresses, etc. (the all-rounder).
  • ss: Dumps socket statistics (the replacement for netstat, and much faster).
  • tc: Traffic control, used for configuring QoS and traffic shaping (essential for network performance work).
  • bridge: Manages network bridges.
  • lnstat: Views network statistics.

Although iproute2 mostly uses Netlink, there are exceptions, the best known being ip tuntap. This command adds or deletes TUN/TAP virtual devices, and on the kernel side that path was never migrated to Netlink: it still goes through IOCTL (TUNSETIFF on /dev/net/tun). If you dig into the TUN/TAP driver code in the kernel (drivers/net/tun.c), you'll find device creation handled by ioctl handlers. The driver does register rtnetlink hooks, but they only report device details and explicitly refuse to create a device. This is a legacy issue, and we'll mention it again later when we discuss the kernel implementation.

net-tools: Tears of a Bygone Era

If you see someone still using ifconfig, route, arp, or netstat, they are using net-tools.

This toolset is based on IOCTL. Its functionality is much weaker than iproute2's, and it doesn't support many newer network features (like namespace-related operations) at all. It is now basically in "maintenance mode" and has even been marked as deprecated on some distributions.

Here's a very intuitive comparison. To report the state of sockets, netstat reads and parses text files under /proc (such as /proc/net/tcp), which gets slow on machines with many sockets. ss instead asks the kernel directly through Netlink's INET_DIAG (sock_diag) mechanism. It receives compact binary records and can even have the kernel filter them (for example, by TCP state) before they are sent.
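Here is a minimal sketch of that INET_DIAG conversation. It sends a SOCK_DIAG_BY_FAMILY dump request for IPv4 TCP sockets and has the kernel filter on the LISTEN state, much like ss -4 -t -l, instead of parsing /proc/net/tcp. count_tcp4_listeners() is a hypothetical name. A real tool would decode the struct inet_diag_msg in each reply rather than just count them.

```c
#define _GNU_SOURCE
#include <linux/inet_diag.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Count IPv4 TCP sockets in LISTEN state via sock_diag.
 * Returns the count or -1 on error. (Hypothetical helper.) */
static int count_tcp4_listeners(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
    if (fd < 0)
        return -1;

    struct { struct nlmsghdr nh; struct inet_diag_req_v2 r; } req;
    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len     = NLMSG_LENGTH(sizeof(req.r));
    req.nh.nlmsg_type    = SOCK_DIAG_BY_FAMILY;
    req.nh.nlmsg_flags   = NLM_F_REQUEST | NLM_F_DUMP;
    req.r.sdiag_family   = AF_INET;
    req.r.sdiag_protocol = IPPROTO_TCP;
    req.r.idiag_states   = 1U << TCP_LISTEN;  /* kernel-side state filter */

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &req, req.nh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }

    char buf[16384] __attribute__((aligned(__alignof__(struct nlmsghdr))));
    int count = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) { close(fd); return -1; }
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, n);
             nh = NLMSG_NEXT(nh, n)) {
            if (nh->nlmsg_type == NLMSG_DONE)  { close(fd); return count; }
            if (nh->nlmsg_type == NLMSG_ERROR) { close(fd); return -1; }
            count++;   /* each reply carries one struct inet_diag_msg */
        }
    }
}
```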

Later in this chapter, we'll dedicate a specific section ("Adding and deleting a routing entry") to demonstrate how to manually add and delete a routing table entry via Netlink, just like iproute2 does. By then, you'll realize just how much dirty work goes on behind the scenes of that single ip route add command.

Alright, we've now taken stock of the userspace tools and libraries. Next, we'll plunge our scalpel deep into the kernel to examine Kernel Netlink Sockets—how this engine processes requests from userspace and how it packages messages to send back. Understanding this step is what truly crosses the threshold into kernel development.