
9.8 The Dance Between NAT Hook Callbacks and Conntrack Hook Callbacks

In the previous section, we registered the nf_nat_ipv4_ops hook array into the kernel, much like setting up checkpoints on a highway that every packet must pass through. But if you look closely at these checkpoints, you'll notice something interesting: at some of them, conntrack is checking IDs while NAT is modifying addresses. They crowd the same hook point, and the order in which they execute is a detail that can make or break a connection.

The Priority Game at Hook Points

Take the NF_INET_PRE_ROUTING hook point as an example. This is the first checkpoint all inbound packets reach.

At this point, two key callbacks are registered:

  1. Conntrack callback: ipv4_conntrack_in(), with a priority of NF_IP_PRI_CONNTRACK (-200).
  2. NAT callback: nf_nat_ipv4_in(), with a priority of NF_IP_PRI_NAT_DST (-100).

Under Netfilter's rules, a lower priority value means earlier execution. This means conntrack at -200 will execute before NAT at -100.

Why is this order so important? Because NAT relies heavily on conntrack.

Look at the code in nf_nat_ipv4_fn()—it's one of the core logic paths for the NAT callback (during the PRE_ROUTING phase, nf_nat_ipv4_in calls it directly):

static unsigned int nf_nat_ipv4_fn(unsigned int hooknum,
                                   struct sk_buff *skb,
                                   const struct net_device *in,
                                   const struct net_device *out,
                                   int (*okfn)(struct sk_buff *))
{
        struct nf_conn *ct;
        ...
        /* Don't try to NAT if this packet is not conntracked */
        if (nf_ct_is_untracked(ct))
                return NF_ACCEPT;
        ...
}

(net/ipv4/netfilter/iptable_nat.c)

Note

The nf_nat_ipv4_fn() method isn't just called here; it's also the core implementation of the NAT POST_ROUTING callback nf_nat_ipv4_out() that we'll discuss later.

This line of code is key: if (nf_ct_is_untracked(ct)). The essence of NAT is a table lookup and rewrite, and that table is the conntrack table. If conntrack hasn't built awareness of this packet yet (it can't find ct), NAT can't perform address translation and must let it pass (NF_ACCEPT). If the order were reversed and NAT ran first, it would query the conntrack table, find nothing, and the DNAT rule would fail.


Now look at the NF_INET_POST_ROUTING hook point, where things get a bit more complex. Three callbacks are registered here:

  1. NAT callback: nf_nat_ipv4_out(), priority NF_IP_PRI_NAT_SRC (100).
  2. Helper callback: ipv4_helper(), priority NF_IP_PRI_CONNTRACK_HELPER (300).
  3. Confirm callback: ipv4_confirm(), priority NF_IP_PRI_CONNTRACK_CONFIRM (INT_MAX, the maximum value).

In ascending order, the execution sequence is: NAT (100) → Helper (300) → Confirm (INT_MAX).

This also makes perfect sense:

  • NAT modifies the address first: It changes the source address to a public IP (SNAT).
  • Helper inspects the data next: Some protocols (like FTP) embed IP address information inside the packet payload. The helper needs to modify the payload contents based on the new address only after NAT has finished its changes.
  • Confirm finalizes last: After the dust settles, it officially confirms this conntrack entry into the hash table.

Practical Walkthrough: A DNAT Packet's Journey

Just reading the code can be abstract, so let's walk through a real-world scenario. This is a classic DNAT (Destination NAT) configuration.

Scenario setup: The machine in the middle (an AMD server) runs Linux and acts as a gateway. We want to forward incoming UDP traffic destined for its own port 9999 to the laptop on the left (192.168.1.8).

Rule:

iptables -t nat -A PREROUTING -p udp --dport 9999 -j DNAT --to-destination 192.168.1.8

(Rule: For any incoming UDP packet with a destination port of 9999, change the destination IP to 192.168.1.8)

Packet flow: The Desktop on the right sends a UDP packet to 192.168.1.9:9999.

We can use a diagram (the concept from Figure 9-4) to see this packet's complete lifecycle in the kernel:

  1. PREROUTING Hook (Netfilter entry):

    • ipv4_conntrack_in() (priority -200): The kernel identifies the packet first. "Who is this? Never seen it before, create a new connection entry."
    • nf_nat_ipv4_in() -> nf_nat_ipv4_fn() (priority -100):
      • NAT queries the conntrack entry just created.
      • It discovers the rule is DNAT, so it changes the destination IP in the IP header from 192.168.1.9 to 192.168.1.8.
      • It also updates the tuple information in the conntrack entry, recording "this packet was originally destined for 1.9, but has now been redirected to 1.8."
  2. Routing Decision:

    • The kernel checks the routing table: the destination IP is 192.168.1.8, which isn't local but is on the LAN to the left. So it decides to FORWARD the packet.
  3. FORWARD Hook:

    • This is mainly for filtering (the filter table). Assuming no rules are configured, the packet passes straight through.
  4. POST_ROUTING Hook (before leaving the local machine):

    • nf_nat_ipv4_out() (priority 100): This is the SNAT point, but in this DNAT forwarding example, this step might just do some validation or processing (if it were MASQUERADE, the source address would be modified here).
    • ipv4_helper() (priority 300): UDP typically doesn't need a helper; if this were FTP, there would be heavy lifting here.
    • ipv4_confirm() (INT_MAX):
      • A crucial step. The conntrack entry created earlier in PREROUTING has been in an "unconfirmed" state all this time.
      • Here, the kernel calls __nf_conntrack_confirm() to officially insert this entry into the global hash table. From now on, subsequent return packets for this connection can be correctly identified.

Through this flow, you should understand: conntrack is the eyes of NAT, and hook priorities ensure the eyes work before the hands and feet.


Deep Dive: How the Kernel Silently Rewrites Headers

Above we covered the flow; now let's peel back the surface and see how the kernel physically modifies the packet.

The core of NAT's implementation lies in net/netfilter/nf_nat_core.c. To support both IPv4 and IPv6, the kernel abstracts two key structures:

  • nf_nat_l3proto: Handles layer 3 (IP layer) protocol logic.
  • nf_nat_l4proto: Handles layer 4 (TCP/UDP, etc.) protocol logic.

Both of these structures contain a crucial function pointer: manip_pkt() (Manipulate Packet). This is the place that literally wields the scalpel to modify IP and TCP/UDP headers.

Let's look at the TCP implementation of manip_pkt(), found in net/netfilter/nf_nat_proto_tcp.c:

static bool tcp_manip_pkt(struct sk_buff *skb,
                          const struct nf_nat_l3proto *l3proto,
                          unsigned int iphdroff, unsigned int hdroff,
                          const struct nf_conntrack_tuple *tuple,
                          enum nf_nat_manip_type maniptype)
{
        struct tcphdr *hdr;
        __be16 *portptr, newport, oldport;
        /* TCP connection tracking guarantees at least 8 bytes of header */
        int hdrsize = 8;

        /* If this is a TCP fragment embedded in an ICMP error packet,
         * the header may be incomplete. Start from the minimum, and if
         * the packet carries a full header, correct hdrsize. */
        if (skb->len >= hdroff + sizeof(struct tcphdr))
                hdrsize = sizeof(struct tcphdr);

        /* Preparation before cutting: make sure this memory is writable.
         * If it is read-only (e.g., the skb has been cloned), the kernel
         * does a copy-on-write here, which is an expensive operation. */
        if (!skb_make_writable(skb, hdroff + hdrsize))
                return false;

        hdr = (struct tcphdr *)(skb->data + hdroff);

Next is determining which port to change. maniptype tells us whether this is Source NAT (SNAT) or Destination NAT (DNAT):

        /* Based on maniptype, modify either the source or the
         * destination port */
        if (maniptype == NF_NAT_MANIP_SRC) {
                /* Rewrite the source port: take the new port from the
                 * tuple's src */
                newport = tuple->src.u.tcp.port;
                portptr = &hdr->source;
        } else {
                /* Rewrite the destination port: take the new port from
                 * the tuple's dst */
                newport = tuple->dst.u.tcp.port;
                portptr = &hdr->dest;
        }

Port modification seems simple—just one assignment—but its side effect is checksum invalidation. The TCP header has a checksum, and the IP header has one too (although only the TCP logic is shown here, the IP checksum is handled by l3proto). We must recalculate them.

To calculate the checksum, we need to keep the old port:

        oldport = *portptr;
        *portptr = newport;

        /* Incomplete header (ICMP-embedded fragment): no checksum
         * field to fix up, so we are done. */
        if (hdrsize < sizeof(*hdr))
                return true;

        /* Recompute the checksums:
         * l3proto->csum_update handles the IP-layer pseudo-header
         * checksum update.
         * inet_proto_csum_replace2 handles the TCP/UDP checksum field.
         * The "2" is because we replaced a 2-byte (16-bit) port. */
        l3proto->csum_update(skb, iphdroff, &hdr->check, tuple, maniptype);
        inet_proto_csum_replace2(&hdr->check, skb, oldport, newport, 0);
        return true;
}

These three lines of code—oldport saving the old value, *portptr assigning the new one, and csum_replace2 invoking the update—are the micro-level embodiment of NAT technology.

In this section, we saw how the kernel uses carefully orchestrated hook priorities to make conntrack and NAT interlock like gears, and how manip_pkt precisely performs a sleight of hand within the bitstream. While all this is happening, the user process is still sleeping in recv(), completely unaware that its packet has just been secretly rewritten by the kernel.