14.3 Network Namespaces Implementation

In the previous section, we covered the UTS namespace—that easy target that only manages the hostname. It served as a warm-up, showing us how the kernel transforms the bad habit of "global variables" into "namespace-private data."

But the UTS namespace is too thin an example to show what real isolation involves.

If we throw you into a truly isolated environment, what good is simply changing a hostname? You need independent network interfaces, independent routing tables, independent iptables rules, and even your own complete set of TCP/IP stack parameters. That is the tough nut we need to crack in this section: Network Namespaces.

You can think of a network namespace as a fully independent virtual router or virtual machine. It isn't just a handful of isolated settings; each namespace gets its own complete instance of the network stack.


Logically, a network namespace is another complete copy of the network stack. This means it has its own independent everything:

  • Network devices
  • Routing tables
  • Neighbour tables
  • Netfilter tables (iptables rules)
  • Network sockets
  • Network-related files under /proc
  • Network-related files under /sys

There is a highly practical feature here that easily creates an "illusion of magic." Let's clear it up right away.

If you create a network namespace named ns1, a special lookup appears to apply: when a network application running inside ns1 reads a configuration file (such as /etc/hosts), it sees the copy under /etc/netns/ns1/ if one exists; otherwise it sees the global file under /etc/.

It's as if you are preparing a dedicated "drawer" for each namespace.

But the "drawer" analogy breaks down in one respect: the kernel doesn't actually modify how applications read files. The feature is implemented with bind mounts set up by the ip netns tool (ip netns exec mounts the files from /etc/netns/ns1/ over their counterparts in /etc, inside a private mount namespace), so it only works for named namespaces created with ip netns add. If you create a namespace directly with the unshare() or clone() system calls, this configuration-file magic won't take effect automatically; you have to set up the mounts yourself.
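
To make the fallback rule concrete, here is a minimal userspace sketch of the result an application ends up seeing. It is a toy model, not kernel or iproute2 code: the real mechanism is a bind mount, and the function name effective_config_path and its parameters are hypothetical. The override directory plays the role of /etc/netns/ns1 and the fallback directory the role of /etc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Toy model: write into out the path of the file an application would
 * effectively see. Returns 1 if the per-namespace copy is used, 0 if the
 * global file is used, and -1 on a bad argument or a too-small buffer.
 */
int effective_config_path(const char *override_dir, const char *fallback_dir,
                          const char *file, char *out, size_t len)
{
    int n;

    assert(out != NULL && len > 0);
    if (strlen(file) == 0)
        return -1;                      /* nothing to look up */

    n = snprintf(out, len, "%s/%s", override_dir, file);
    if (n < 0 || (size_t)n >= len)
        return -1;
    if (access(out, F_OK) == 0)
        return 1;                       /* per-namespace copy exists */

    n = snprintf(out, len, "%s/%s", fallback_dir, file);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;                           /* fall back to the global file */
}
```

Calling effective_config_path("/etc/netns/ns1", "/etc", "hosts", buf, sizeof buf) on a machine without /etc/netns/ns1/hosts leaves "/etc/hosts" in buf and returns 0.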

The Network Namespace Object (struct net)

Now let's tear away the surface and go straight to the heart.

The core data structure of a network namespace is struct net. You can think of it as the god object of that "independent world," with all network-related state hanging off of it.

The code is a bit long, but we need to chew through it piece by piece. This is the foundation for understanding everything:

struct net {
    . . .
    struct user_namespace *user_ns; /* Owning user namespace */
    unsigned int proc_inum;
    struct proc_dir_entry *proc_net;
    struct proc_dir_entry *proc_net_stat;
    . . .
    struct list_head dev_base_head;
    struct hlist_head *dev_name_head;
    struct hlist_head *dev_index_head;
    . . .
    int ifindex;
    . . .
    struct net_device *loopback_dev; /* The loopback */
    . . .
    atomic_t count; /* To decided when the network
                     * namespace should be shut down.
                     */
    struct netns_ipv4 ipv4;
#if IS_ENABLED(CONFIG_IPV6)
    struct netns_ipv6 ipv6;
#endif
#if defined(CONFIG_IP_SCTP) || defined(CONFIG_IP_SCTP_MODULE)
    struct netns_sctp sctp;
#endif
    . . .
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
    struct netns_ct ct;
#endif
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
    struct netns_nf_frag nf_frag;
#endif
    . . .
    struct net_generic __rcu *gen;
#ifdef CONFIG_XFRM
    struct netns_xfrm xfrm;
#endif
    . . .
};

(include/net/net_namespace.h)

Don't be intimidated by this pile of pointers. Let's break down a few key fields to see how they actually implement "isolation."

user_ns: This is the "owner" of the network namespace. It is the user namespace that created this network namespace. For the initial network namespace object (init_net), its owner is naturally the initial user namespace (init_user_ns). This is assigned in the setup_net() method.

proc_inum: Remember the inode number from the previous section's UTS namespace that uniquely identifies a namespace? It's the same here. Each network namespace has a unique inode number in the /proc filesystem. It is allocated by proc_alloc_inum() (during the net_ns_net_init() call) and released via proc_free_inum() during net_ns_net_exit() cleanup.
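
From userspace, that inode number is visible as the target of the /proc/<pid>/ns/net symbolic link. The following is a small sketch, assuming a Linux system with /proc mounted and a kernel recent enough to expose the entry as a symlink; the function name current_netns_id is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the target of /proc/self/ns/net into buf. The text has the form
 * "net:[<inode number>]". Returns the length of the text, or -1 on failure
 * (for example when /proc is not mounted).
 */
ssize_t current_netns_id(char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL && len > 0);
    n = readlink("/proc/self/ns/net", buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';                      /* readlink() does not terminate */
    return n;
}
```

Two processes that get the same text back from this function are in the same network namespace.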

proc_net and proc_net_stat: Because each namespace needs its own independent /proc/net and /proc/net/stat views, there are two pointers here pointing to these procfs directory entries.

Device Management Triad:

  • dev_base_head: Points to the list head of all network devices within this namespace.
  • dev_name_head: This is a hash table where the key is the device name (e.g., eth0), used for fast lookups.
  • dev_index_head: This is also a hash table where the key is the device index (ifindex).

ifindex: This is the last allocated device index in this namespace. Note that indices are also virtualized: in namespace A and namespace B, the lo device can have index 1 in both, and each namespace's eth0 can likewise carry the same index as the other's. They do not interfere with each other.
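
The per-namespace bookkeeping can be pictured with a toy userspace model. Everything named toy_ below is hypothetical and heavily simplified: a fixed array stands in for dev_base_head, a linear scan replaces the dev_name_head hash table, and the index is simply incremented, which is a simplification of what the kernel does.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DEVS 8
#define TOY_IFNAMSIZ 16

struct toy_dev {
    char name[TOY_IFNAMSIZ];
    int ifindex;
};

/* Toy stand-in for the device-related fields of struct net. */
struct toy_devnet {
    struct toy_dev devs[TOY_MAX_DEVS];  /* plays the role of dev_base_head */
    int ndevs;
    int ifindex;                        /* last index handed out here */
};

/* Register a device; returns its new ifindex, or -1 on error. */
int toy_register_dev(struct toy_devnet *net, const char *name)
{
    struct toy_dev *dev;

    assert(net != NULL && name != NULL);
    if (net->ndevs == TOY_MAX_DEVS || strlen(name) >= TOY_IFNAMSIZ)
        return -1;
    for (int i = 0; i < net->ndevs; i++)
        if (strcmp(net->devs[i].name, name) == 0)
            return -1;                  /* names are unique per namespace */

    dev = &net->devs[net->ndevs++];
    strcpy(dev->name, name);            /* length was checked above */
    dev->ifindex = ++net->ifindex;
    return dev->ifindex;
}

/* Look a device up by name within one namespace; -1 if absent. */
int toy_ifindex_of(const struct toy_devnet *net, const char *name)
{
    for (int i = 0; i < net->ndevs; i++)
        if (strcmp(net->devs[i].name, name) == 0)
            return net->devs[i].ifindex;
    return -1;
}
```

Registering "lo" and then "eth0" in two zero-initialized toy_devnet objects yields indices 1 and 2 in each of them, because the counter lives inside the namespace object rather than in a global variable.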

loopback_dev: The loopback device. This is the only network device that exists by default in every newly created network namespace. It is assigned in the loopback_net_init() method (located in drivers/net/loopback.c). Remember, you cannot move the lo device from one namespace to another; it is nailed down there.

count: This is the reference count. It is initialized to 1 when the network namespace is created. get_net() increments it, and put_net() decrements it. If it drops to 0 in put_net(), it triggers __put_net(), which puts the namespace on a global cleanup list (cleanup_list) to be completely destroyed later.
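
The life cycle described above can be sketched as a toy model. The toy_ names are hypothetical, a plain int replaces the kernel's atomic_t (so the sketch is single-threaded only), and a boolean flag stands in for being queued on cleanup_list.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_refnet {
    int count;          /* stands in for the atomic_t count field */
    bool on_cleanup;    /* set once the namespace is queued for teardown */
};

void toy_net_create(struct toy_refnet *net)
{
    net->count = 1;     /* the creator holds the first reference */
    net->on_cleanup = false;
}

/* Take an extra reference, as get_net() does. */
struct toy_refnet *toy_get_net(struct toy_refnet *net)
{
    assert(net->count > 0);     /* a dying namespace must not be revived */
    net->count++;
    return net;
}

/* Drop a reference, as put_net() does; the last drop queues the cleanup. */
void toy_put_net(struct toy_refnet *net)
{
    assert(net->count > 0);
    if (--net->count == 0)
        net->on_cleanup = true; /* where the kernel would call __put_net() */
}
```

The point of deferring the teardown is that the final put can happen in a context where the heavy cleanup work should not run directly.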

Protocol Stack Private Data: The next few fields are the "private turf" of the major protocol stacks:

  • ipv4 (struct netns_ipv4): All private data for IPv4 lives here, such as routing tables, sysctl parameters, etc.
  • ipv6 (struct netns_ipv6): The corresponding structure for IPv6.
  • sctp, ct (netfilter connection tracking), xfrm (IPsec): and so on.

gen (struct net_generic): There is an engineering trade-off here. If every subsystem wanted to stuff private data into struct net, this structure would bloat beyond recognition. To prevent struct net from turning into a junkyard, the kernel introduced net_generic. It is a generic pointer array that allows optional subsystems (like the sit module) to request an ID here and store their own private data without directly modifying the definition of struct net.
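
The idea behind gen can be sketched in a few lines of userspace C. This is a toy model under stated assumptions: the toy_ names are hypothetical, the array has a fixed size, and IDs start at 0, none of which is meant to mirror the kernel's actual layout. What it does show is the core trick: one ID per subsystem, one slot per namespace.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_GEN 4

/* Toy stand-in for struct net: only the generic pointer array. */
struct toy_gennet {
    void *gen[TOY_MAX_GEN];
};

static int toy_next_id;     /* ID counter shared by all namespaces */

/* A subsystem calls this once to obtain its slot; returns -1 when full. */
int toy_alloc_gen_id(void)
{
    if (toy_next_id >= TOY_MAX_GEN)
        return -1;
    return toy_next_id++;
}

/* Store a subsystem's private data for one namespace. */
void toy_net_assign(struct toy_gennet *net, int id, void *data)
{
    assert(id >= 0 && id < toy_next_id);
    net->gen[id] = data;
}

/* Fetch a subsystem's private data for one namespace. */
void *toy_net_generic(const struct toy_gennet *net, int id)
{
    assert(id >= 0 && id < toy_next_id);
    return net->gen[id];
}
```

A subsystem keeps only its ID in a global variable; the data it gets back depends on which namespace object it passes in, so the structure definition never has to change when a new optional module appears.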


To give you a more intuitive feel for the depth of this "isolation," let's dive into netns_ipv4. This is far more than just changing an IP address; it's a clone of the entire IPv4 worldview:

struct netns_ipv4 {
    . . .
#ifdef CONFIG_IP_MULTIPLE_TABLES
    struct fib_rules_ops *rules_ops;
    bool fib_has_custom_rules;
    struct fib_table *fib_local;
    struct fib_table *fib_main;
    struct fib_table *fib_default;
#endif
    . . .
    struct hlist_head *fib_table_hash;
    struct sock *fibnl;
    struct sock **icmp_sk;
    . . .
#ifdef CONFIG_NETFILTER
    struct xt_table *iptable_filter;
    struct xt_table *iptable_mangle;
    struct xt_table *iptable_raw;
    struct xt_table *arptable_filter;
#ifdef CONFIG_SECURITY
    struct xt_table *iptable_security;
#endif
    struct xt_table *nat_table;
#endif
    int sysctl_icmp_echo_ignore_all;
    int sysctl_icmp_echo_ignore_broadcasts;
    . . .
    int sysctl_tcp_ecn;
    . . .
    long sysctl_tcp_mem[3];
    . . .
#ifdef CONFIG_IP_MROUTE
#ifndef CONFIG_IP_MROUTE_MULTIPLE_TABLES
    struct mr_table *mrt;
#else
    struct list_head mr_tables;
    . . .
#endif
#endif
};

(include/net/netns/ipv4.h)

See? There are FIB routing tables (fib_local, fib_main, and so on), Netfilter tables (iptable_filter, nat_table, and so on), multicast routing tables (mrt), and even sysctl parameters (such as sysctl_tcp_mem).

Returning to our "virtual router" analogy: you should now be able to see that the fields in netns_ipv4 are the routing rule configuration panel, firewall configuration panel, and TCP parameter tuning knobs inside that virtual router.

Device and Socket Ownership

Since the world is now divided into so many copies, how do network devices and sockets know which copy they belong to?

For network devices (struct net_device): The kernel adds a member called nd_net, which is a pointer to struct net.

  • Use dev_net_set() to set ownership.
  • Use dev_net() to query ownership.

The rule is simple: a network device can only belong to one namespace at any given time. When you register a device, or move a device from one namespace to another, this pointer gets updated.

Let's look at a real-world example—the scenario of registering a VLAN device:

static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
{
    struct net_device *new_dev;

Suppose we want to create a VLAN device. Which namespace should this new device belong to? The answer is: inherited from the underlying physical device. Here, real_dev is the physical NIC. We first get its namespace:

    struct net *net = dev_net(real_dev);
    . . .
    new_dev = alloc_netdev(sizeof(struct vlan_dev_priv), name, vlan_setup);

    if (new_dev == NULL)
        return -ENOBUFS;

The device is allocated. Now, through dev_net_set(), we register its "residency" in the namespace we just looked up:

    dev_net_set(new_dev, net);
    . . .
}

For sockets (struct sock): The logic is the same. The kernel adds a sk_net pointer to struct sock.

  • Use sock_net_set() to set it.
  • Use sock_net() to query it.

The same restriction applies: a socket can only belong to one namespace. When you create a socket in a certain namespace, it stays bound to that namespace. Traffic between namespaces has to cross an explicit link, such as a veth pair with one end in each namespace, rather than reaching into the other stack directly.
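
Both ownership pointers can be modelled in a few lines. The sketch below is a toy: the toy_ types and functions are hypothetical stand-ins for struct net_device, struct sock, and their accessors, and the final check is only meant to illustrate the principle that a lookup should match sockets from the same namespace as the receiving device.

```c
#include <assert.h>
#include <stddef.h>

struct toy_own_net { int id; };                 /* namespace identity */

struct toy_netdev { struct toy_own_net *nd_net; };
struct toy_sock   { struct toy_own_net *sk_net; };

void toy_dev_net_set(struct toy_netdev *dev, struct toy_own_net *net)
{
    assert(dev != NULL && net != NULL);
    dev->nd_net = net;
}

struct toy_own_net *toy_dev_net(const struct toy_netdev *dev)
{
    return dev->nd_net;
}

void toy_sock_net_set(struct toy_sock *sk, struct toy_own_net *net)
{
    assert(sk != NULL && net != NULL);
    sk->sk_net = net;
}

/* A stacked device (such as a VLAN) inherits its lower device's namespace. */
void toy_stack_on(struct toy_netdev *upper, const struct toy_netdev *lower)
{
    toy_dev_net_set(upper, toy_dev_net(lower));
}

/* Nonzero only when the socket and the device share a namespace. */
int toy_sock_may_receive(const struct toy_sock *sk,
                         const struct toy_netdev *dev)
{
    return sk->sk_net == dev->nd_net;
}
```

Because ownership is a single pointer, "which namespace is this in?" is a pointer comparison, which keeps the check cheap enough to sit on packet-processing paths.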

The Life and Death of Namespaces: pernet_operations

At system boot, the kernel creates a default network namespace: init_net. In the early moments after power-on, all physical NICs, all sockets, and even the lo device belong to it.

As the system runs, we create new namespaces. At this point, the kernel faces a problem: some subsystems (like the IPv4 one we just mentioned, or certain driver modules) need to do some initialization work (like creating files under the new /proc/net) when each new namespace is created, and do some cleanup work when the namespace is destroyed.

To solve this requirement, the kernel introduced struct pernet_operations. This is a callback mechanism:

struct pernet_operations {
    . . .
    int (*init)(struct net *net);
    void (*exit)(struct net *net);
    . . .
    int *id;
    size_t size;
};

(include/net/net_namespace.h)

If you are writing a network device driver or subsystem, you need to define a pernet_operations object, implement the init and exit callbacks, and then call:

  • register_pernet_device(): for devices.
  • register_pernet_subsys(): for subsystems.

Let's look at a practical example: the PPPoE module.

The PPPoE module needs to export session information to /proc/net/pppoe. Because it needs to support namespaces, it must create this proc file in every new namespace when that namespace is created.

It defines its own pernet_operations:

static struct pernet_operations pppoe_net_ops = {
    .init = pppoe_init_net,
    .exit = pppoe_exit_net,
    .id = &pppoe_net_id,
    .size = sizeof(struct pppoe_net),
};

(drivers/net/ppp/pppoe.c)

In the init callback pppoe_init_net(), it does one thing: creates the pppoe file under the current namespace's proc_net.

static __net_init int pppoe_init_net(struct net *net)
{
    struct pppoe_net *pn = pppoe_pernet(net);
    struct proc_dir_entry *pde;

    rwlock_init(&pn->hash_lock);

    pde = proc_create("pppoe", S_IRUGO, net->proc_net, &pppoe_seq_fops);
#ifdef CONFIG_PROC_FS
    if (!pde)
        return -ENOMEM;
#endif
    return 0;
}

And in the exit callback pppoe_exit_net(), it is responsible for deleting the file:

static __net_exit void pppoe_exit_net(struct net *net)
{
    remove_proc_entry("pppoe", net->proc_net);
}
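
To see how such callbacks are driven, here is a toy userspace model of the registration mechanism. It is a sketch, not the kernel's implementation: all toy_ and demo_ names are hypothetical, the registry is a fixed array, and it leaves out the step where registering new operations also runs init for namespaces that already exist.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OPS 8

struct toy_pernet_net { int live_subsystems; };

struct toy_pernet_ops {
    int  (*init)(struct toy_pernet_net *net);
    void (*exit)(struct toy_pernet_net *net);
};

static const struct toy_pernet_ops *toy_ops[TOY_MAX_OPS];
static int toy_nops;

int toy_register_pernet(const struct toy_pernet_ops *ops)
{
    assert(ops != NULL);
    if (toy_nops == TOY_MAX_OPS)
        return -1;
    toy_ops[toy_nops++] = ops;
    return 0;
}

/* Run every init callback; on failure undo the ones that already ran. */
int toy_setup_net(struct toy_pernet_net *net)
{
    for (int i = 0; i < toy_nops; i++) {
        int err = toy_ops[i]->init ? toy_ops[i]->init(net) : 0;

        if (err) {
            while (--i >= 0)
                if (toy_ops[i]->exit)
                    toy_ops[i]->exit(net);
            return err;
        }
    }
    return 0;
}

/* Run the exit callbacks in reverse registration order. */
void toy_cleanup_net(struct toy_pernet_net *net)
{
    for (int i = toy_nops - 1; i >= 0; i--)
        if (toy_ops[i]->exit)
            toy_ops[i]->exit(net);
}

/* Hypothetical subsystem that just counts itself in and out. */
int demo_init(struct toy_pernet_net *net) { net->live_subsystems++; return 0; }
void demo_exit(struct toy_pernet_net *net) { net->live_subsystems--; }
const struct toy_pernet_ops demo_ops = { .init = demo_init, .exit = demo_exit };
```

Tearing down in reverse order means a subsystem that depends on an earlier one is cleaned up before the thing it depends on.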

How does the network namespace module register itself?

The namespace module itself is also a subsystem. It registers net_ns_ops during the boot phase:

static struct pernet_operations __net_initdata net_ns_ops = {
    .init = net_ns_net_init,
    .exit = net_ns_net_exit,
};

static int __init net_ns_init(void)
{
    . . .
    register_pernet_subsys(&net_ns_ops);
    . . .
}

(net/core/net_namespace.c)

Whenever a new namespace is created, net_ns_net_init is called. It does only one thing: allocates that unique proc inode number (proc_inum):

static __net_init int net_ns_net_init(struct net *net)
{
    return proc_alloc_inum(&net->proc_inum);
}

Upon destruction, it calls net_ns_net_exit to release it:

static __net_exit void net_ns_net_exit(struct net *net)
{
    proc_free_inum(net->proc_inum);
}

How to Create a Network Namespace?

Alright, enough theory. When you actually want to build a new world, you have three paths to choose from:

  1. The Hardcore Way: Write a C program using the clone() or unshare() system calls with the CLONE_NEWNET flag. This is like building a house with your own bare hands.
  2. The Tooling Way: Use the ip netns command from the iproute2 package. This is the most common approach, and we'll see it shortly.
  3. The Lazy Way: Use the unshare command-line tool provided by util-linux, with the --net parameter.

No matter which path you choose, the destination is the same: a new struct net is allocated, a new lo device is created, and you are the god of this new world.
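
As a taste of the hardcore way, here is a minimal sketch using unshare() with CLONE_NEWNET. It assumes Linux with glibc or a compatible C library; the helper names enter_new_netns and count_interfaces are made up for this example. The call normally needs CAP_SYS_ADMIN, so an unprivileged run is expected to get a negative errno value back.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <sched.h>

/*
 * Move the calling process into a fresh network namespace.
 * Returns 0 on success or a negative errno value (typically -EPERM
 * when the caller lacks the required privilege).
 */
int enter_new_netns(void)
{
    if (unshare(CLONE_NEWNET) == -1)
        return -errno;
    return 0;
}

/* Count the interfaces visible in the current namespace; -1 on error. */
int count_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    int n = 0;

    if (list == NULL)
        return -1;
    while (list[n].if_index != 0) {     /* a zero index ends the array */
        assert(list[n].if_name != NULL);
        n++;
    }
    if_freenameindex(list);
    return n;
}
```

Calling count_interfaces() before and after a successful enter_new_netns() shows the host's devices disappear from view while the new namespace's own lo remains. A namespace created this way has no name under ip netns and gets none of the /etc/netns handling described earlier.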