
9.3 Connection Tracking Initialization: Injecting "State" into the Network Stack

The previous section discussed how to attach our own functions to Netfilter's checkpoints, but it left a loose end: hooks by themselves are useless. Someone actually has to do the work.

In this section, we look at the largest and most complex set of hooks in the Linux kernel—connection tracking—and how it spreads itself across the entire network stack. It is the foundation of NAT and the prerequisite for a firewall to understand "state."


Registration: From Arrays to Checkpoints

Connection tracking requires more than just one hook. To intercept packets along different paths, it defines a set of nf_hook_ops, all packed into the ipv4_conntrack_ops array:

static struct nf_hook_ops ipv4_conntrack_ops[] __read_mostly = {
    {
        .hook     = ipv4_conntrack_in,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_PRE_ROUTING,
        .priority = NF_IP_PRI_CONNTRACK,
    },
    {
        .hook     = ipv4_conntrack_local,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_OUT,
        .priority = NF_IP_PRI_CONNTRACK,
    },
    {
        .hook     = ipv4_helper,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_POST_ROUTING,
        .priority = NF_IP_PRI_CONNTRACK_HELPER,
    },
    {
        .hook     = ipv4_confirm,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_POST_ROUTING,
        .priority = NF_IP_PRI_CONNTRACK_CONFIRM,
    },
    {
        .hook     = ipv4_helper,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = NF_IP_PRI_CONNTRACK_HELPER,
    },
    {
        .hook     = ipv4_confirm,
        .owner    = THIS_MODULE,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = NF_IP_PRI_CONNTRACK_CONFIRM,
    },
};

Two of these play the leading roles. Their priority, NF_IP_PRI_CONNTRACK, is extremely high (numerically very low, at -200), meaning they must run before any of the familiar firewall rules:

  1. ipv4_conntrack_in: Attached to NF_INET_PRE_ROUTING.
  2. ipv4_conntrack_local: Attached to NF_INET_LOCAL_OUT.

Why these two points?

  • PRE_ROUTING: All incoming packets from the outside, whether destined for the local machine or being forwarded, pass through here. If we don't intercept here, the state of forwarded packets will be lost.
  • LOCAL_OUT: Packets originating locally. If we don't intercept here, locally sent packets will only have an "outbound" leg with no "return," leaving the connection tracking table half-dead.

As for the remaining ipv4_helper and ipv4_confirm, their priorities are lower. They handle complex edge cases (like FTP data connection negotiation) and final confirmations, which we'll cover later.
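To see where -200 actually sits, here is an excerpt of the IPv4 hook priority enum from include/uapi/linux/netfilter_ipv4.h (individual values can shift slightly between kernel versions, but the ordering is the point): connection tracking runs after defragmentation and the raw table but before the mangle, NAT, and filter tables, while the helper and confirm steps run last.

enum nf_ip_hook_priorities {
    /* excerpt: only the entries relevant here */
    NF_IP_PRI_CONNTRACK_DEFRAG  = -400,     /* reassemble fragments first */
    NF_IP_PRI_RAW               = -300,     /* raw table (NOTRACK lives here) */
    NF_IP_PRI_CONNTRACK         = -200,     /* ipv4_conntrack_in / ipv4_conntrack_local */
    NF_IP_PRI_MANGLE            = -150,
    NF_IP_PRI_NAT_DST           = -100,
    NF_IP_PRI_FILTER            =    0,
    NF_IP_PRI_NAT_SRC           =  100,
    NF_IP_PRI_CONNTRACK_HELPER  =  300,     /* ipv4_helper */
    NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,  /* ipv4_confirm: the very last step */
};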


Entry Point: nf_conntrack_in()

Regardless of whether a packet comes in through ipv4_conntrack_in or ipv4_conntrack_local, it ultimately converges on a single core function: nf_conntrack_in().

This function resides in the core layer of connection tracking. It is protocol-agnostic, completely independent of whether we're dealing with IPv4 or IPv6. Its signature looks roughly like this:

unsigned int nf_conntrack_in(struct net *net,
                             u_int8_t pf,
                             unsigned int hooknum,
                             struct sk_buff *skb);
  • pf: Protocol family, telling it whether this is PF_INET (IPv4) or PF_INET6 (IPv6).
  • hooknum: Which hook the packet came in from (PRE_ROUTING or LOCAL_OUT).

Its task is simple: look at the packet and ask itself—"Have I seen this guy before?"

  • If yes, find the corresponding connection entry (struct nf_conn) and update its state.
  • If no, create a new entry and stuff it into the hash table.

This is the essence of connection tracking: turning a stateless network stack into a stateful session table.
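In pseudo-C, that decision looks roughly like the sketch below. The helper names (build_tuple(), lookup_ct(), create_ct(), attach_ct_to_skb(), update_state()) are placeholders, not kernel APIs; in the real nf_conntrack_core.c most of this work hides inside resolve_normal_ct() and the per-protocol packet handlers.

/* Conceptual sketch of nf_conntrack_in(); the helper names are placeholders. */
unsigned int conntrack_in_sketch(struct net *net, u_int8_t pf,
                                 unsigned int hooknum, struct sk_buff *skb)
{
    struct nf_conntrack_tuple tuple;
    struct nf_conn *ct;

    if (!build_tuple(skb, pf, &tuple))     /* parse the L3/L4 headers */
        return NF_ACCEPT;                  /* untrackable packet: let it pass */

    ct = lookup_ct(net, &tuple);           /* hash-table lookup */
    if (ct == NULL)
        ct = create_ct(net, &tuple);       /* new, not-yet-confirmed entry */

    attach_ct_to_skb(skb, ct);             /* the skb now carries its state */
    update_state(ct, skb, hooknum);        /* e.g. drive the TCP state machine */

    return NF_ACCEPT;
}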

⚠️ Performance Warning

As long as you enable CONFIG_NF_CONNTRACK when compiling the kernel, this set of hooks gets attached—even if you haven't written a single iptables rule.

This comes at a cost. Every single packet must pass through connection tracking, query the hash table, and calculate the hash value. If your device is a pure forwarding node that doesn't need state at all and isn't running NAT, this overhead is completely wasted. In such scenarios, consider compiling connection tracking as a module and simply not loading it—you'll save a significant amount of CPU.


The Registration Action: nf_register_hooks()

After defining the array, we eventually need to tell the kernel about it. This is done in nf_conntrack_l3proto_ipv4_init():

static int __init nf_conntrack_l3proto_ipv4_init(void)
{
    int ret;

    ...
    ret = nf_register_hooks(ipv4_conntrack_ops,
                            ARRAY_SIZE(ipv4_conntrack_ops));
    ...
}

A single nf_register_hooks() call registers all six hooks from the ipv4_conntrack_ops array in one shot.
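There is no magic inside nf_register_hooks() itself. In kernels of this era it is just a loop over the array, registering each entry and unwinding the ones already added if anything fails (simplified from net/netfilter/core.c):

int nf_register_hooks(struct nf_hook_ops *reg, unsigned int n)
{
    unsigned int i;
    int err = 0;

    for (i = 0; i < n; i++) {
        err = nf_register_hook(&reg[i]);   /* add one entry to its hook list */
        if (err)
            goto err;
    }
    return err;

err:
    if (i > 0)
        nf_unregister_hooks(reg, i);       /* roll back the ones that succeeded */
    return err;
}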


Visualization: Where the Hooks Are

To keep you from getting lost in the code maze, let's pause and draw a diagram. Figure 9-1 below shows the exact positions of these callback functions within the network stack. You can think of it as a toll booth map for a highway network.

Figure 9-1. Connection Tracking hooks (IPv4)

(This presents the logic of the original figure: a packet comes in from Netdev, passes through ipv4_conntrack_in (PRE_ROUTING), and then diverges to either forwarding or local delivery. Locally originated packets pass through ipv4_conntrack_local (LOCAL_OUT). Finally, Helper and Confirm logic are handled at POST_ROUTING and LOCAL_IN.)

(Note: To keep the diagram from turning into a tangled mess, I've omitted complex scenarios like IPsec, fragmentation, and multicast, and I didn't draw details like ip_queue_xmit for locally sent packets. What we have here is the main highway.)


Identifying the Subject: nf_conntrack_tuple

How does connection tracking know that two packets belong to the same connection?

It can't rely solely on IP addresses, because NAT changes them; nor can it rely solely on ports, because ports get reused. It needs something that uniquely identifies a "unidirectional flow"—and that is the nf_conntrack_tuple.

struct nf_conntrack_tuple {
    struct nf_conntrack_man src;

    /* These are the parts of the tuple which are fixed. */
    struct {
        union nf_inet_addr u3;
        union {
            /* Add other protocols here. */
            __be16 all;

            struct {
                __be16 port;
            } tcp;
            struct {
                __be16 port;
            } udp;
            struct {
                u_int8_t type, code;
            } icmp;
            struct {
                __be16 port;
            } dccp;
            struct {
                __be16 port;
            } sctp;
            struct {
                __be16 key;
            } gre;
        } u;

        /* The protocol. */
        u_int8_t protonum;

        /* The direction (for tuplehash) */
        u_int8_t dir;
    } dst;
};

This structure is somewhat like a "one-way ticket." It only describes a flow in one direction:

  • src (source): Contains the source IP and source port.
  • dst (destination): Contains the destination IP, destination port, and protocol number (TCP/UDP/ICMP, etc.).

Notice that union. TCP and UDP identify a flow by ports, ICMP by Type and Code, and GRE by its key, so a union is used here to accommodate the different L4 protocols. Each protocol has its own connection tracking module (such as nf_conntrack_proto_tcp.c) that knows how to extract this information from the packet and fill it in.
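For example, the TCP tracker's tuple-filling callback is tiny. Roughly (simplified from nf_conntrack_proto_tcp.c; the exact signature varies by kernel version), it just copies the two ports out of the TCP header:

static bool tcp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
                             struct nf_conntrack_tuple *tuple)
{
    const struct tcphdr *hp;
    struct tcphdr _hdr;

    /* Only the first 8 bytes of the header are read;
     * the two ports sit in the first 4. */
    hp = skb_header_pointer(skb, dataoff, 8, &_hdr);
    if (hp == NULL)
        return false;

    tuple->src.u.tcp.port = hp->source;
    tuple->dst.u.tcp.port = hp->dest;
    return true;
}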

You can think of a tuple as the fingerprint of connection tracking.

As long as two packets have the same tuple (or are exact source/destination reversals of each other), the kernel considers them part of the same connection. The tuple is the key used to look up the connection hash table, and since that lookup happens for every packet, it has to be extremely fast.
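To make that reversal concrete, here is a conceptual sketch. The kernel's real helper is nf_ct_invert_tuple(), which delegates the swapping to the per-protocol modules; the version below only shows the idea.

/* Conceptual sketch only, not the kernel's code: the reply-direction
 * tuple is the original tuple with source and destination swapped. */
static void invert_tuple_sketch(struct nf_conntrack_tuple *reply,
                                const struct nf_conntrack_tuple *orig)
{
    reply->src.u3       = orig->dst.u3;     /* reply source = original destination */
    reply->src.u.all    = orig->dst.u.all;
    reply->dst.u3       = orig->src.u3;     /* reply destination = original source */
    reply->dst.u.all    = orig->src.u.all;
    reply->dst.protonum = orig->dst.protonum;
    reply->dst.dir      = !orig->dst.dir;   /* IP_CT_DIR_ORIGINAL <-> IP_CT_DIR_REPLY */
}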


At this point, we've attached the hooks and know how to use tuples to identify packets. But once a connection is recorded, what does it actually look like? What exactly is hidden inside that mysterious struct nf_conn? That's what we'll dig into next.