
1.3 Linux Kernel Network Development Model

The networking subsystem is incredibly complex, and it evolves at breakneck speed—fast enough that if you blink, you might miss an API change.

In the previous section, we discussed devices, buffers, and various subsystems—that was the "static" anatomy. Now it's time to throw all of that into the "dynamic" fray. Linux kernel development is nothing like writing typical business logic. It follows a collaborative model that may seem archaic, even cumbersome, but is exceptionally efficient in practice.

For anyone looking to dive deep into kernel networking, understanding this model is just as important as understanding the code. Perhaps even more so.

Why? Because when you try to fix a tricky bug or adapt your code to the latest kernel, you'll find that the code itself is just the tip of the iceberg. Beneath the surface lie Git trees, mailing lists, patch conventions, and the discerning eyes of maintainers. If you don't understand these rules, your code could be flawless and still never make it into mainline.

Pre-Development Gear Check

Before we begin this game, we need to take inventory of our gear. There are no GUI IDE wizards here—just the command line and a text editor.

In the world of kernel development, Git isn't just a tool; it's the lingua franca.

Whatever your goal—whether it's fixing a bug that crashes a network card, submitting a patch that optimizes the sk_buff path, or simply understanding the history of a specific code snippet—you must master Git. Linus Torvalds originally created Git to manage kernel code, so in this domain, there is no more native or powerful tool.

Often, you'll need to track down exactly which version introduced a bug, which requires knowing how to use Git bisect. Or you might need to backport a feature to an older kernel version, which requires knowing how to revert and merge patches. Alternatively, if you want to experiment with the latest "bleeding-edge" features, you need to know which Git tree to pull from.
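To make bisecting concrete, here is a self-contained rehearsal you can run anywhere: it builds a toy repository of ten commits (the file names, messages, and the "bug" are all invented for the demo) and lets `git bisect run` hunt down the offending commit automatically, exactly the way you would hunt a regression in a real kernel tree.

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name dev
# Ten commits; commit 7 silently introduces the "bug" (the BUG marker).
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $i" >> history.txt
  if [ "$i" -ge 7 ]; then echo BUG > status.txt; else echo OK > status.txt; fi
  git add -A
  git commit -qm "commit $i"
done
# Mark the endpoints: HEAD is bad, the very first commit is known good.
first_good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first_good" >/dev/null
# Automate the search: the test command exits nonzero on a bad commit.
git bisect run sh -c '! grep -q BUG status.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"
git bisect reset >/dev/null
```

On a real kernel, the test command would be "build, boot, check the NIC" instead of a `grep`, but the binary-search mechanics are identical: roughly log2(N) builds instead of N.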

Here is a checklist of basic survival skills:

  • How to apply a patch: Received a .patch file from someone—how do you apply it to your codebase?
  • How to read a patch: Can you glance at code modifications from others and immediately grasp the intent and potential risks?
  • How to locate a problematic patch: The system suddenly hangs—how do you figure out which commit caused it?
  • How to revert a patch: Once you've identified the problem, how do you precisely undo that change without messing up other modifications?
  • How to clone a Git tree: How do you get your hands on the two core repositories, net and net-next?
  • How to rebase: How do you keep your local changes in sync with mainline's rapid pace?

If you aren't familiar with Git yet, I highly recommend Scott Chacon's Pro Git (it's free, just search online). It's not just a tool manual; it's your passport to this community.

Submitting Patches: A Game of Etiquette

Alright, let's say you've written a cool patch that fixes packet loss when a network card is under full load. Now you want to submit it to mainline. This is where the real challenge begins.

The kernel community has a strict, almost rigid set of etiquette. This isn't to make your life difficult, but to maintain efficiency amidst hundreds of emails every day.

First, your code must pass the kernel coding style checks. This isn't a "suggestion." If your code uses 4 spaces for indentation instead of tabs, or if your braces are in the wrong place, maintainers won't even look at it—they'll bounce it back immediately.

Second, you need to test. And not just once on your PC. You have to consider different architectures and different configurations.

Then there's the submission method. Although a tiny minority of people send patches using Gmail's web interface (which is genuinely not recommended), the professional approach is to properly configure git send-email. This ensures your patches appear on the mailing list in the correct format (inline, with proper encoding), making it easy for others to quote and reply directly.
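A minimal `git send-email` setup looks like the sketch below. The SMTP server, port, and user are placeholders for your own mail provider's values (real use needs a live SMTP server, so the actual send command is only shown as a comment), and the config is written repo-locally here; in practice you would use `--global` so it lands in `~/.gitconfig` once.

```shell
set -e
dir=$(mktemp -d); cd "$dir" && git init -q .
# SMTP settings -- placeholder values, substitute your provider's:
git config sendemail.smtpserver smtp.example.com
git config sendemail.smtpserverport 587
git config sendemail.smtpencryption tls
git config sendemail.smtpuser you@example.com
# A typical invocation (shown only; it needs a reachable SMTP server):
#   git send-email --to=netdev@vger.kernel.org 0001-my-fix.patch
git config sendemail.smtpserver   # confirm what got stored
```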

During this process, there are a few scripts you must know—they are the community's security checkpoints:

  • scripts/checkpatch.pl: This is a Perl script. Run it once before sending your patch. It acts like a strict disciplinarian checking your code style: are there trailing spaces? Do the comments conform to the standard? Even if there's just one extra space, it will throw an error. Don't find it annoying—it saves you from a lot of public embarrassment.

  • scripts/get_maintainer.pl: This script is absolutely crucial. The kernel is so massive that you likely have no idea who maintains your network card driver. Run this script, and it will analyze the file paths you modified and tell you: "The primary maintainer for this file is A, and the Cc list includes B, C, and D." If you send your patch to the wrong place, having it sink into oblivion is the best-case scenario; the worst-case is being called out on the mailing list.
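Both scripts live in the kernel source tree and take a patch file (or, for checkpatch, a source file with `-f`). The sketch below assumes a kernel checkout at `$HOME/src/linux` and a hypothetical patch name; adjust both to your setup.

```shell
# Assumed kernel tree location -- point KSRC at your own checkout.
KSRC=${KSRC:-$HOME/src/linux}
PATCH=0001-net-fix-tx-stall.patch       # hypothetical patch file name
if [ -x "$KSRC/scripts/checkpatch.pl" ]; then
  "$KSRC/scripts/checkpatch.pl" "$PATCH"        # the style gate
  "$KSRC/scripts/get_maintainer.pl" "$PATCH"    # who goes in To:/Cc:
else
  echo "no kernel tree at $KSRC -- clone one first"
fi
```

Run checkpatch until it is silent before you even think about sending; feed get_maintainer's output straight into `git send-email --to=... --cc=...`.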

Remember, submitting patches requires patience. Even a five-line fix might take several days to be accepted. This is perfectly normal.

Two Worlds: net and net-next

Linux networking subsystem development operates in a dual-track world. Understanding this is your first step toward participating.

All network development, whether it's patches or discussions (RFCs), primarily converges on a single mailing list:

netdev mailing list: netdev@vger.kernel.org. This is a high-traffic list: hundreds of emails a day, mostly patches, code reviews, and debates over new features.

Behind this list sit two crucial Git repositories:

  1. net (http://git.kernel.org/?p=linux/kernel/git/davem/net.git): This is the world of fixes. It holds fixes for kernel code that has already been merged into mainline. If you find that a network card doesn't work on the current 5.x kernel and you fix it, your patch will eventually be merged into this branch and ultimately flow into the current stable kernels.

  2. net-next (http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git): This is the world of the future. The code here is prepared for the next merge window. It's filled with new features, experimental protocol rewrites, and architectural adjustments. If you're developing a brand-new network protocol or refactoring the entire TCP stack, this is your target.

These two repositories are managed by the networking subsystem maintainer, David Miller. Periodically, he sends the changes from both trees to Linus via Pull Requests to be merged into mainline.

There is a very critical timing detail here where many newcomers stumble: the merge window closure period.

When Linus starts merging content from net-next into mainline, the net-next repository is frozen. At this point, you cannot submit new patches to net-next. The maintainer will post an announcement on the netdev mailing list: "Merge window is open/closed." If you send a patch during the closure period, it will either be ignored or returned—because the mainline is undergoing drastic changes, and your patch could instantly create conflicts.

Exceptions and Branches

Although netdev is the home base for network development, not all networking subsystems live here.

Some subsystems are too large, too specialized, or are maintained by specific companies, so they have their own Git repositories and mailing lists:

  • Wireless: Has its own mailing list and Git tree. However, its final Pull Request still goes to netdev.
  • Bluetooth: Same deal—independently maintained, but ultimately folded into the networking ecosystem.
  • IPsec: No independent mailing list; discussed directly on netdev.
  • IEEE 802.15.4 (6LoWPAN): Also lacks an independent list.

So, when you want to modify a specific module, the first thing you should do is use scripts/get_maintainer.pl to confirm exactly who maintains it.

About This Book's Environment

Finally, let's have a practical talk about the code environment used in this book.

One of the worst things that can happen in software development is when the code in the book doesn't match the code on your machine. APIs change, struct fields get renamed, and following along feels like reading gibberish.

To solve this problem, all code snippets and examples in this book, unless otherwise stated, are based on kernel version 3.9.

This version was released in 2013. It's old enough that many classic mechanisms had already solidified, yet new enough to include most of the core concepts of the modern network stack (although things like eBPF, which are more modern, hadn't been born yet—but that's a story for later).

  • Getting the source code: You can download the tarball from kernel.org, or directly clone the net or net-next trees mentioned earlier using Git, and then run git checkout v3.9.
  • Browsing the code: If you just want to look up a struct definition or function call, I highly recommend LXR (Linux Cross Reference). You can go to lxr.free-electrons.com (or nowadays, lxr.linux.no, etc.) to browse the code online. Click on a variable name, and it takes you to every place it's referenced—this is an absolute lifesaver when reading complex structures like sk_buff.
  • Local LXR: If you've modified the kernel and want to build your own cross-reference index, you can even install an LXR server on your local machine.
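The clone-and-checkout workflow can be rehearsed without touching the network. The sketch below uses a tiny local repository with a `v3.9` tag as a stand-in for git.kernel.org (all names here are invented for the demo); the commented command is what you would actually run.

```shell
set -e
sandbox=$(mktemp -d); cd "$sandbox"
# Local stand-in for the remote tree; the real command would be:
#   git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
git init -q upstream
cd upstream
git config user.email rel@example.com
git config user.name rel
echo "Linux 3.9" > Makefile
git add Makefile && git commit -qm "Linux 3.9"
git tag v3.9                                  # the release tag
echo "3.10 work" >> Makefile && git commit -aqm "start 3.10 cycle"
cd ..
git clone -q upstream linux
cd linux
git checkout -q v3.9        # pin the working tree to the book's version
cat Makefile
```

`git checkout v3.9` leaves you on a detached HEAD at the tag, which is exactly what you want for reading along; create a branch from it if you plan to make changes.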

Alright, the gear check is complete, and we have our map. It's time to dive deep into the code.


Chapter 1 Echoes: Building the Foundations of Network Understanding

This chapter was just a primer, but we covered quite a bit of territory. Let's piece these fragments back together into a complete map and see what we've actually built.

On the surface, we were learning the architecture of the Linux networking subsystem. But in essence, we were building an intuition for "layering" and "data flow".

We first saw the concrete projection of the OSI model in Linux—the harsh reality of seven abstract layers translated into C code. We saw how net_device, the "hardware spokesperson," manages network cards, and how sk_buff, the "universal shipping label," is passed between the various checkpoints in the kernel.

If you ask me what the most important takeaway from this chapter is, I'd say: the kernel network stack isn't a monolith; it's an assembly line made up of countless hooks.

  • Packets are picked up by the hardware driver, starting with NAPI's mix of interrupts and polling.
  • They are filtered through the firewall rules of Netfilter.
  • They pass through the intersections of the routing subsystem.
  • They go through the neighbor subsystem (ARP/NDISC) to find the MAC address of the next hop.
  • Finally, they land in a Socket, to be caught by a user-space program.

This chapter pointed out the major stations on this assembly line.

Don't forget the two core structures we mentioned at the end of the previous section—net_device and sk_buff. They are the currency of this assembly line. Appendix A lists every line of their definitions in detail; consider it your "dictionary" for when you write drivers later—whenever you encounter an unfamiliar field, just look it up in that chapter.

Remember those advanced features we mentioned at the end of the previous section? RDMA bypasses the CPU to move memory directly, and Namespaces turn one machine into a thousand virtual networks. These features might seem flashy, but they are still built on top of the foundational assembly line we established in this chapter. Even RDMA needs to register a network card first; even a Namespace needs its own loopback device.

In the journey ahead, we will no longer just "look" at this assembly line. We're going to start getting our hands dirty.

In Chapter 2, we'll start with Netlink Sockets. Why Netlink? Because it's the two-way radio between user space and kernel space. If you want to create virtual network cards at runtime, modify the routing table, or configure a VPN, you have to go through it. It's the master switch for many advanced network features.

Are you ready? It's time to start debugging.


Exercises

Exercise 1: Understanding

Question: The sk_buff structure is the core data structure used to manage network packets in the Linux kernel network stack. As a packet is passed from L2 (the network device driver layer) to L3 (the network layer, such as IPv4), the driver typically calls the eth_type_trans() method. After this method completes successfully, which part of the packet does the data pointer in the sk_buff structure point to? What is the purpose of this design?

Answer and Analysis

Answer: It points to the L3 (network layer) header, such as the IP header. The purpose is to ensure that skb->data always points to the header currently being processed by that protocol layer, making it convenient for upper-layer protocols to read data directly.

Analysis: Based on the text's description of The Socket Buffer, when a packet is at L2, skb->data points to the Ethernet header. The eth_type_trans() method moves the data pointer forward by 14 bytes (ETH_HLEN, the size of the Ethernet header) by calling skb_pull_inline(). This causes the pointer to skip the L2 header and point directly to the L3 header. This design pattern allows the various protocol layers of the kernel network stack (L3, L4) to access their respective headers through a unified interface, without having to manually calculate offsets every time.

Exercise 2: Application

Question: Suppose you are developing a driver for an embedded device company. You need to enable promiscuous mode on a network card to support packet capture analysis with tcpdump. However, after running two packet capture programs (tcpdump and wireshark) simultaneously, you manually close one of them and find that the network card is still in promiscuous mode. Based on the text's description of the Promiscuity counter, explain why the network card didn't automatically exit promiscuous mode.

Answer and Analysis

Answer: Because the promiscuity counter is a counter, not a boolean switch. When multiple packet capture programs are enabled, the counter is greater than 1. Closing one program only decrements the counter by 1; it does not reset it to zero.

Analysis: The text explicitly states that the Linux network stack uses a promiscuity counter rather than a simple boolean value to manage promiscuous mode. Every time a sniffer like tcpdump or wireshark starts, the counter increments by 1; when closed, it decrements by 1. The network card only truly exits promiscuous mode when the counter drops to 0. In this scenario, when both programs are running, the counter becomes 2. After closing one, the counter is 1, so the network card remains in promiscuous mode to serve the other program that is still running. This mechanism supports the need for multiple sniffers to run concurrently.

Exercise 3: Thinking

Question: NAPI (New API) is a hybrid interrupt/polling mechanism designed to improve network performance under high load. Compare NAPI with the traditional "interrupt-driven" mode and analyze why NAPI can deliver better performance when packet throughput is extremely high.

Answer and Analysis

Answer: Under high load, the traditional interrupt mode leads to an "interrupt storm," consuming massive amounts of CPU resources on context switching rather than efficiently processing data. NAPI switches to a polling mode to process packets in batches, significantly reducing the overhead of interrupt handling and context switching.

Analysis: Based on the description of New API (NAPI), older network drivers triggered an interrupt for every single packet received. In high-traffic scenarios, this leads to frequent interrupt requests (an "interrupt storm"), where the CPU spends all its time saving and restoring contexts (context switching), causing throughput to plummet. NAPI's solution is: under high load, the driver disables its receive interrupts, and the kernel instead polls the driver to retrieve packets in bulk until the queue drains. While this approach sacrifices a tiny bit of latency, it drastically reduces the CPU's interrupt handling overhead, thereby improving overall data processing capacity.


Key Takeaways

The core of the Linux kernel networking subsystem lies in the "iron triangle" processing from L2 to L4, which decouples the physical layer from the application layer and focuses on the high-speed flow, verification, and forwarding of packets. This process is not a simple linear transmission; it is full of complex logic such as routing decisions, firewall filtering, NAT translation, and fragmentation/reassembly. The kernel is essentially a precision factory that repeatedly inspects and modifies packets across the various layers of the protocol stack.

The kernel abstracts physical network cards into software objects through the net_device structure, cleverly managing multi-process shared resources using internal mechanisms like the promiscuity counter. At the same time, paired with the NAPI (New API) mechanism, it dynamically switches between interrupt-driven and polling modes based on network load, effectively solving the CPU livelock problem caused by frequent context switching in traditional pure interrupt modes under high-concurrency, small-packet scenarios.

sk_buff (SKB) is the sole carrier for packets within the kernel. By maintaining pointers like head, data, and tail, it flexibly handles the stripping and adding of protocol headers. This design allows packets to efficiently "peel off" layer by layer via functions like skb_pull as they are passed from the link layer to the transport layer, ensuring zero-copy high performance when transitioning between different protocol layers.

Developing and debugging network code requires a deep understanding of the Git dual-track system (net and net-next) and the mailing list collaboration culture. The net branch is responsible for fixing stable kernel issues, while net-next carries new features targeting the next merge window. Mastering script tools like scripts/checkpatch.pl and get_maintainer.pl, and following strict coding style and submission etiquette, are key prerequisites for your code to be accepted by the community and merged into mainline.