9.4 Connection Tracking Entries

In the previous section, we discussed nf_conntrack_tuple—that "one-way ticket." Now the question arises: with this ticket in hand, where exactly does the kernel look for the corresponding "passenger record"? What does the structure that holds the connection state, known as struct nf_conn, actually look like?

This is the heart of connection tracking.

Anatomy of a Connection Tracking Entry: struct nf_conn

This structure is fairly large, so let's break it down. You can find its definition in include/net/netfilter/nf_conntrack.h.

struct nf_conn {
    /* Usage count in here is 1 for hash table/destruct timer, 1 per skb,
     * plus 1 for any connection(s) we are `master' for */
    struct nf_conntrack ct_general;

    spinlock_t lock;

    /* XXX should I move this to the tail ? - Y.K */
    /* These are my tuples; original and reply */
    struct nf_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX];

    /* Have we seen traffic both ways yet? (bitset) */
    unsigned long status;

    /* If we were expected by an expectation, this will be it */
    struct nf_conn *master;

    /* Timer function; drops refcnt when it goes off. */
    struct timer_list timeout;

    ...

    /* Extensions */
    struct nf_ct_ext *ext;
#ifdef CONFIG_NET_NS
    struct net *ct_net;
#endif

    /* Storage reserved for other modules, must be the last member */
    union nf_conntrack_proto proto;
};

Several fields here are critical and worth a closer look.

1. tuplehash[IP_CT_DIR_MAX]: Bidirectional Fingerprints

Remember the "one-way ticket" analogy from the last section? If you're only holding an outbound ticket, you can't get back. A complete connection consists of two directions: the original direction and the reply direction.

This array stores the tuples for both directions:

  • tuplehash[0] (or IP_CT_DIR_ORIGINAL): The flow in the original direction (e.g., the SYN packet you send to the server).
  • tuplehash[1] (or IP_CT_DIR_REPLY): The flow in the reply direction (the server's SYN+ACK back to you).

Both elements are inserted into the hash table. No matter which direction a packet comes from, the kernel can calculate the hash value and find this nf_conn structure.

2. status: The Connection State Machine

This is a bitmap of IPS_* flags, defined by the ip_conntrack_status enumeration in include/uapi/linux/netfilter/nf_conntrack_common.h. A freshly created connection has almost none of these bits set; once both sides have successfully communicated (e.g., the TCP handshake is complete, or a UDP reply has been received), IPS_SEEN_REPLY_BIT is set, and once the entry lands in the hash table, IPS_CONFIRMED_BIT.

Don't confuse this with the per-packet states such as IP_CT_NEW and IP_CT_ESTABLISHED: those come from the ip_conntrack_info enumeration in the same header, are attached to each skb, and are derived from these status bits.

3. master: The Master-Slave Relationship

This pointer points to the "master connection." What does this mean? Consider the FTP protocol: it runs a control connection on the well-known port 21, but data transfers use separate ports. The kernel knows that the connection on port 21 is the "boss," and the dynamically opened data connection is the "minion." master is the pointer that allows the minion to find its boss.

This is typically set inside init_conntrack(), when that function discovers that the new packet matches an "expected connection" registered by a helper such as the FTP module.

4. timeout: The Ticking Time Bomb

Connections don't exist forever. Every nf_conn has a timer attached to it. If no traffic is seen for a period of time, the timer expires, the connection is destroyed, and its memory is reclaimed.

  • For UDP: If no reply is received, the timeout is short; if both sides are exchanging packets, the timeout is longer.
  • When we call __nf_conntrack_alloc() to allocate this object, the timer is set to death_by_timeout()—which sounds pretty hardcore.

5. ext and proto: Extensibility

  • ext: Points to an extension area. Some modules need to store private data in the connection tracking entry (e.g., timestamps, billing information). Instead of modifying the main structure, they can just hang it here.
  • proto: This is a union reserved for protocol layers (TCP/UDP/ICMP) to store their own private data. Placing it at the end avoids wasting memory space due to padding.

The Entry Point for Connection Tracking: nf_conntrack_in()

Now that we've seen the structure, let's look at how the kernel actually uses it. When a packet enters the Netfilter framework and connection tracking is enabled, it ultimately calls nf_conntrack_in().

This function is essentially the central nervous system of the entire conntrack subsystem.

unsigned int nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
                             struct sk_buff *skb)
{
    struct nf_conn *ct, *tmpl = NULL;
    enum ip_conntrack_info ctinfo;
    struct nf_conntrack_l3proto *l3proto;
    struct nf_conntrack_l4proto *l4proto;
    unsigned int *timeouts;
    unsigned int dataoff;
    u_int8_t protonum;
    int set_reply = 0;
    int ret;

Step 1: Have We Seen This Packet Before?

First, it checks whether the packet has already been processed (e.g., it looped through a loopback device, or was marked as untracked).

    if (skb->nfct) {
        /* Previously seen (loopback or untracked)? Ignore. */
        tmpl = (struct nf_conn *)skb->nfct;
        if (!nf_ct_is_template(tmpl)) {
            NF_CT_STAT_INC_ATOMIC(net, ignore);
            return NF_ACCEPT;
        }
        skb->nfct = NULL;
    }

If it's a template connection, it will be temporarily saved and potentially reattached later (this is used in certain special TCP handling scenarios).

Step 2: Confirming Identity (L3 and L4 Protocols)

Next, the kernel confirms which Layer 3 protocol this is. For IPv4, it looks for nf_conntrack_l3proto_ipv4.

    l3proto = __nf_ct_l3proto_find(pf);

Then comes the most important part: extracting the Layer 4 protocol number. Only by knowing whether the packet is TCP or UDP can the kernel find the corresponding handling logic.

    ret = l3proto->get_l4proto(skb, skb_network_offset(skb),
                               &dataoff, &protonum);
    if (ret <= 0) {
        ...
        ret = -ret;
        goto out;
    }

    l4proto = __nf_ct_l4proto_find(pf, protonum);

Here, ipv4_get_l4proto() parses the IP header and reads its Protocol field (6 for TCP, 17 for UDP). Once we have protonum, we immediately locate the corresponding L4 protocol object, l4proto.

Step 3: Error Checking

We're not done yet. The L4 protocol might have its own quirks. Before actually starting tracking, the protocol module checks whether the packet is malformed.

  • udp_error(): Verifies the checksum and packet length.
  • tcp_error(): Checks whether the flags are valid and the sequence numbers make sense.

    if (l4proto->error != NULL) {
        ret = l4proto->error(net, tmpl, skb, dataoff, &ctinfo,
                             pf, hooknum);
        if (ret <= 0) {
            NF_CT_STAT_INC_ATOMIC(net, error);
            NF_CT_STAT_INC_ATOMIC(net, invalid);
            ret = -ret;
            goto out;
        }
        /* ICMP[v6] protocol trackers may assign one conntrack. */
        if (skb->nfct)
            goto out;
    }

If an error is detected here, the error and invalid counters are incremented, and the function returns early with the verdict the protocol handler encoded in its negative return value: NF_ACCEPT to let the packet pass untracked, or NF_DROP to discard it.

Step 4: The Core Operation—Lookup or Create

This is the most critical step. resolve_normal_ct() does the following:

  1. Calculate the hash: Calls hash_conntrack_raw() to compute a hash value based on the packet's 5-tuple.
  2. Lookup: Calls __nf_conntrack_find_get() with that hash value to see whether an existing record is already in the hash table.
  3. Create: If not found, this is a new connection. It calls init_conntrack() to allocate a new nf_conn structure.
  4. Add to the unconfirmed list: Note that at this point, the new connection doesn't go directly into the hash table. Instead, it is thrown into the "unconfirmed list."

Why is there an "unconfirmed list"? Because the packet still has to pass through subsequent NAT or filter modules. If it gets DROPped along the way, this connection entry shouldn't exist. Only when the packet successfully passes all checks does __nf_conntrack_confirm() get called to move it into the official hash table. This list hangs off the network namespace's netns_ct.

    ct = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum,
                           l3proto, l4proto, &set_reply, &ctinfo);
    if (!ct) {
        /* Not valid part of a connection */
        NF_CT_STAT_INC_ATOMIC(net, invalid);
        ret = NF_ACCEPT;
        goto out;
    }
    if (IS_ERR(ct)) {
        /* Too stressed to deal. */
        NF_CT_STAT_INC_ATOMIC(net, drop);
        ret = NF_DROP;
        goto out;
    }

If IS_ERR(ct) is true, it means the kernel is too busy (e.g., the hash table is full), and it directly returns NF_DROP.

At the same time, resolve_normal_ct() attaches the found (or newly created) nf_conn pointer to the SKB's nfct member, and stuffs the connection state (e.g., IP_CT_NEW) into nfctinfo. This way, other modules downstream can glance at the SKB and immediately know who this connection belongs to and what state it's in.

Step 5: Handling Protocol State and Timeouts

Now that we have the connection object, we need to update its state.

First is the timeout policy. For UDP, if the connection is unidirectional (no reply has been seen), the timeout might be 30 seconds; if it's bidirectional, it might be 180 seconds. nf_ct_timeout_lookup() is responsible for deciding this number.

    /* Decide what timeout policy we want to apply to this flow. */
    timeouts = nf_ct_timeout_lookup(net, ct, l4proto);

Then, it calls the specific protocol handler. For example, UDP calls udp_packet(), and TCP calls tcp_packet().

    ret = l4proto->packet(ct, skb, dataoff, ctinfo, pf, hooknum, timeouts);
    if (ret <= 0) {
        /* Invalid: inverse of the return code tells
         * the netfilter core what to do */
        pr_debug("nf_conntrack_in: Can't track with proto module\n");
        nf_conntrack_put(skb->nfct);
        skb->nfct = NULL;
        NF_CT_STAT_INC_ATOMIC(net, invalid);
        if (ret == -NF_DROP)
            NF_CT_STAT_INC_ATOMIC(net, drop);
        ret = -ret;
        goto out;
    }

Inside udp_packet(), it calls nf_ct_refresh_acct() to refresh the timer. If this is the first time a reply has been seen, it marks the connection as "replied."

Step 6: Event Triggering and Cleanup

If the set_reply flag is set (indicating this is the first packet in the reverse direction), an event also needs to be triggered to tell other subsystems, "Hey, this connection is now fully established."

    if (set_reply && !test_and_set_bit(IPS_SEEN_REPLY_BIT, &ct->status))
        nf_conntrack_event_cache(IPCT_REPLY, ct);

Finally, it handles any previous template tmpl and drops the reference count.

out:
    if (tmpl) {
        /* Special case: we have to repeat this hook, assign the
         * template again to this packet. We assume that this packet
         * has no conntrack assigned. This is used by nf_ct_tcp. */
        if (ret == NF_REPEAT)
            skb->nfct = (struct nf_conntrack *)tmpl;
        else
            nf_ct_put(tmpl);
    }

    return ret;
}

The Final Piece of the Puzzle: Confirmation

We mentioned earlier that newly created connections are first thrown into the "unconfirmed list." How do they become official?

This happens in the ipv4_confirm() function, which is hooked to NF_INET_POST_ROUTING and NF_INET_LOCAL_IN.

When a packet passes through without a hitch and is about to leave the protocol stack (or be handed off to a local process), ipv4_confirm() calls __nf_conntrack_confirm(). This function does two things:

  1. Removes the connection from the unconfirmed list.
  2. Officially inserts it into the global hash table.

Only now is this connection truly "on the job." If the packet gets DROPped by any rule along the way, this confirm function is never called, and the unconfirmed entry is eventually destroyed—ensuring that the hash table only contains genuinely live connections.