4.2 Receiving IPv4 Packets
In the previous section, we covered the "registration" process of IPv4 — the kernel hangs ip_packet_type on the global protocol list and sets ip_rcv() as the callback function. When the NIC delivers a packet and the Ethernet type is 0x0800, the kernel heads straight for this function.
There is a very intuitive assumption here: since this is a "receive function," it should be responsible for unpacking the packet, processing it, and passing it up the stack, right?
Exactly the opposite.
The true identity of ip_rcv() is more like a gatekeeper. It doesn't care whether the payload is TCP or UDP; it cares about exactly one thing: is this a valid IPv4 packet? If yes, let it through; if no, drop it. The real "processing" work is actually handed off to the next runner in the relay, ip_rcv_finish(). And sandwiched between these two runners is an extremely critical checkpoint — Netfilter hooks.
Entering the Reception Hall
Let's look at the specific workflow of this "gatekeeper."
First, the function signature:
int ip_rcv(struct sk_buff *skb, struct net_device *dev,
	   struct packet_type *pt, struct net_device *orig_dev)
{
	struct iphdr *iph;
When ip_rcv() gets the skb, the link layer has just stripped off the Ethernet header. Now, skb->data points to the IPv4 header. First, we need to extract the header and see if it looks "normal."
Step 1: Header Format Sanity Checks
The IPv4 header has a strict format. The RFC specifies that the header must be at least 20 bytes, and the version number must be 4.
In the kernel, the ihl field of struct iphdr represents the header length, but there's a catch: its unit is not bytes but 32-bit words (4 bytes each).
So, for a standard 20-byte header, ihl should be 5 (5 x 4 = 20).
A packet whose ihl is less than 5 cannot even hold a basic header, and a packet whose version is not 4 does not belong here at all (for example, an IPv6 packet that ended up in the IPv4 slot).
For these malformed packets, the kernel's attitude is straightforward: drop them immediately and update the IPSTATS_MIB_INHDRERRORS statistic.
	/* Grab the IP header */
	iph = ip_hdr(skb);
	/* Sanity check: is the header long enough, and is the version right? */
	if (iph->ihl < 5 || iph->version != 4)
		goto inhdr_error;
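To make the nibble arithmetic concrete, here is a minimal user-space sketch of the same test on a raw byte buffer. The helper name ipv4_header_looks_sane() and the extra buffer-length checks are assumptions added for illustration; they are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): returns 1 if buf starts with
 * a plausible IPv4 header, 0 otherwise. Only version and ihl are examined. */
static int ipv4_header_looks_sane(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 20)                       /* cannot hold a minimal header */
        return 0;

    unsigned version = buf[0] >> 4;     /* high nibble of the first byte */
    unsigned ihl     = buf[0] & 0x0f;   /* low nibble, in 32-bit words   */

    if (version != 4 || ihl < 5)
        return 0;

    /* The declared header length (ihl * 4 bytes) must fit in the buffer. */
    return (size_t)ihl * 4 <= len;
}
```

A first byte of 0x45 is the common case: version 4, ihl 5, so a 20-byte header with no options.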
Step 2: Checksum
After passing the format check, the next step is the "anti-counterfeit mark": the checksum. Per RFC 1122, a host must verify the IP header checksum on every received datagram and silently discard any datagram that fails the check. There is no need to send an error message back, because it would be useless anyway: the damage most likely happened on the link, and once the header is known to be corrupted, its source address cannot be trusted either.
The kernel uses ip_fast_csum() to do this. Note that this calculation applies only to the IP header, not the subsequent data payload.
	/* Checksum verification: a non-zero result means the header is corrupted */
	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
		goto inhdr_error;
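The reason a valid header yields 0 is a property of the one's-complement sum: when the stored checksum is included in the sum, the folded result is 0xffff, and its complement is 0. The following is a plain, unoptimized user-space sketch of that calculation. The name ipv4_header_checksum() is made up for this example; the kernel's ip_fast_csum() is an architecture-optimized routine, not this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's-complement checksum over an IPv4 header.
 * hdr is in network byte order, ihl is the header length in 32-bit words.
 * Returns 0 when run over a header whose checksum field is correct. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, unsigned ihl)
{
    assert(hdr != NULL);

    uint32_t sum = 0;

    /* Add the header up as a sequence of big-endian 16-bit words. */
    for (size_t i = 0; i < (size_t)ihl * 4; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The same function can generate a checksum: zero the two checksum bytes (offsets 10 and 11), run it, and store the result there.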
Step 3: Let It Through or Intercept It?
If the packet is healthy, ip_rcv()'s job is basically done. Next, it calls the NF_HOOK macro.
This line of code is the bridge connecting "security check" and "sorting":
return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
ip_rcv_finish);
Let's expand on this a bit, because NF_HOOK is everywhere in the network stack.
Netfilter Hooks: You can think of this as a "security checkpoint socket" inside the kernel. The kernel lets other code (such as iptables or nf_conntrack) register callbacks at five key points of a packet's journey.
We are at the NF_INET_PRE_ROUTING hook point here — meaning the packet has just entered the stack and no routing decision has been made yet.
The logic of NF_HOOK is simple:
- Check if anyone has registered a callback function at this point.
- If so, call them.
- If no one has registered, or if the registered callback returns "accept" (NF_ACCEPT), continue by calling the function specified by the last argument, which in this case is ip_rcv_finish().
If the callback function returns "drop" (NF_DROP), the packet's life ends here, and ip_rcv_finish() will never be called.
There is also a return value called NF_STOLEN, which means the hook function "stole" the packet: the hook has taken ownership of the skb, so the normal path stops here and ip_rcv_finish() is not called.
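The verdict logic described above can be modeled in user space with a few lines of C. This is only a toy under stated assumptions: struct toy_pkt, the TOY_* verdicts, the sample hooks, and toy_run_hooks() are all invented for illustration, and the return values are arbitrary. The real NF_HOOK machinery handles more verdicts, hook priorities, and per-namespace hook lists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for NF_DROP, NF_ACCEPT and NF_STOLEN. */
enum toy_verdict { TOY_DROP, TOY_ACCEPT, TOY_STOLEN };

/* Toy stand-in for struct sk_buff. */
struct toy_pkt { int delivered; int freed; };

typedef enum toy_verdict (*toy_hook_fn)(struct toy_pkt *);
typedef int (*toy_okfn)(struct toy_pkt *);

/* Sample hooks, one per verdict. */
static enum toy_verdict hook_accept(struct toy_pkt *p) { (void)p; return TOY_ACCEPT; }
static enum toy_verdict hook_drop(struct toy_pkt *p)   { (void)p; return TOY_DROP; }
static enum toy_verdict hook_steal(struct toy_pkt *p)  { (void)p; return TOY_STOLEN; }

/* Plays the role of ip_rcv_finish(): the continuation after the hooks. */
static int toy_finish(struct toy_pkt *p) { p->delivered = 1; return 1; }

/* Run every hook in order; call okfn only if all of them accept. */
static int toy_run_hooks(toy_hook_fn *hooks, size_t n,
                         struct toy_pkt *p, toy_okfn okfn)
{
    assert(okfn != NULL);

    for (size_t i = 0; i < n; i++) {
        switch (hooks[i](p)) {
        case TOY_ACCEPT:
            continue;           /* ask the next hook in the chain */
        case TOY_DROP:
            p->freed = 1;       /* the stack frees the packet */
            return -1;
        case TOY_STOLEN:
            return 0;           /* the hook owns the packet now; do not free it */
        }
    }
    return okfn(p);             /* no hooks, or every hook accepted */
}
```

Note the difference between the last two cases: on a drop the stack releases the packet, while on a steal it must not touch the packet again.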
For the sake of a smooth explanation, let's temporarily assume there are no Netfilter rules interfering, the packet gets a green light all the way, and goes straight into ip_rcv_finish().
The Real Work Begins: ip_rcv_finish()
ip_rcv only "looks," while ip_rcv_finish actually "does."
The core task of this function is to determine the fate of this packet: keep it for ourselves, or forward it for someone else?
The sole basis for this answer is the routing table. Before the routing lookup, all we have is an SKB; after the lookup, the SKB will have a crucial new attachment — dst_entry.
The code logic looks like this:
static int ip_rcv_finish(struct sk_buff *skb)
{
	struct iphdr *iph = ip_hdr(skb);
	struct rtable *rt;	/* routing table entry; used by the omitted statistics code */

	/* If no routing result is attached to the SKB yet, look one up */
	if (!skb_dst(skb)) {
		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
					       iph->tos, skb->dev);
		if (unlikely(err)) {
			/* Special case: rejected by the Reverse Path Filter (RPF) */
			if (err == -EXDEV)
				NET_INC_STATS_BH(dev_net(skb->dev), LINUX_MIB_IPRPFILTER);
			goto drop;
		}
	}
	/* ... statistics updates omitted ... */
	return dst_input(skb);

drop:
	kfree_skb(skb);
	return NET_RX_DROP;
}
Routing Lookup: Determining Fate
ip_route_input_noref() is the core function for routing queries.
It takes three key pieces of information to ask the routing subsystem:
- Destination address (daddr)
- Source address (saddr)
- Type of Service (tos)
After consulting the table, the routing subsystem binds a dst object to this SKB. This dst object is extremely critical; it not only contains next-hop information, but more importantly, it carries an action directive.
This directive is hidden in the dst->input callback function pointer:
- If it's destined for the local machine: The lookup result sets dst->input to ip_local_deliver(). This means: "This one is ours, send it up to the transport layer."
- If it needs to be forwarded: The lookup result sets dst->input to ip_forward(). This means: "This is just passing through, help send it next door."
- If it's multicast (special scenario): Under certain conditions, it will be set to ip_mr_input(), entering the multicast handling logic.
The final dst_input(skb) actually just executes that function pointer:
static inline int dst_input(struct sk_buff *skb)
{
return skb_dst(skb)->input(skb);
}
This is a classic use of C polymorphism — the routing table queries for "data," but what it returns is "code."
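The "lookup returns code" idea is easy to reproduce outside the kernel. In the sketch below, every name (toy_pkt, toy_dst, toy_route_input(), and so on) is hypothetical, and the "routing table" is a single address comparison. The point is only that the lookup stores a function pointer and the caller invokes it blindly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pkt;

/* Toy stand-in for dst_entry: all it carries is the input callback. */
struct toy_dst { int (*input)(struct toy_pkt *); };

/* Toy stand-in for sk_buff. */
struct toy_pkt { uint32_t daddr; struct toy_dst *dst; };

enum { TOY_LOCAL = 1, TOY_FORWARD = 2 };

static int toy_local_deliver(struct toy_pkt *p) { (void)p; return TOY_LOCAL; }
static int toy_forward(struct toy_pkt *p)       { (void)p; return TOY_FORWARD; }

static struct toy_dst dst_local   = { toy_local_deliver };
static struct toy_dst dst_forward = { toy_forward };

/* "Routing lookup": attach the dst whose callback matches the packet's fate. */
static void toy_route_input(struct toy_pkt *p, uint32_t my_addr)
{
    p->dst = (p->daddr == my_addr) ? &dst_local : &dst_forward;
}

/* Same shape as dst_input(): call whatever the lookup attached. */
static int toy_dst_input(struct toy_pkt *p)
{
    assert(p->dst != NULL && p->dst->input != NULL);
    return p->dst->input(p);
}
```

The caller of toy_dst_input() never branches on "local or forward"; that decision was made once, at lookup time, and is carried along with the packet.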
That Weird -EXDEV (RPF)
In the code above, there is a check for err == -EXDEV. This is actually an error from a security mechanism called Reverse Path Filter (RPF).
Its logic goes like this:
"This packet claims it was sent from IP A and came in via interface eth0. But if I check my own routing table, to send a packet to IP A, I should be going out through eth1. Since the incoming path and the return path don't match, there's something wrong with this packet (it might be a spoofed-source attack), so drop it!"
If you have RPF enabled (usually through the rp_filter sysctl, exposed under /proc/sys/net/ipv4/conf/), the kernel returns -EXDEV from this table lookup when the check fails, and the code above increments the LINUX_MIB_IPRPFILTER counter.
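The reasoning in the quote above boils down to a reverse lookup: route the source address as if it were a destination and compare the resulting interface with the one the packet arrived on. Here is a minimal sketch of that strict check, assuming a toy routing table with host-byte-order prefixes and contiguous masks. The struct and function names are invented; the kernel's actual source validation is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy routing entry: prefix/mask in host byte order, plus egress interface. */
struct toy_route { uint32_t prefix; uint32_t mask; int ifindex; };

/* Longest-prefix match. Returns the egress ifindex, or -1 if no route matches. */
static int toy_lookup(const struct toy_route *tbl, size_t n, uint32_t addr)
{
    int best = -1;
    uint32_t best_mask = 0;

    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if ((addr & tbl[i].mask) == tbl[i].prefix &&
            (best < 0 || tbl[i].mask > best_mask)) {
            best = tbl[i].ifindex;
            best_mask = tbl[i].mask;
        }
    }
    return best;
}

/* Strict reverse-path check: the interface we would use to reach saddr
 * must be the interface the packet actually came in on. */
static int toy_rpf_strict_ok(const struct toy_route *tbl, size_t n,
                             uint32_t saddr, int in_ifindex)
{
    return toy_lookup(tbl, n, saddr) == in_ifindex;
}
```

With routes for 10.0.0.0/8 on interface 1 and 192.168.0.0/16 on interface 2, a packet claiming source 10.1.2.3 passes on interface 1 and fails on interface 2.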
Handling IP Options
Inside ip_rcv_finish, there is also an easily overlooked piece of logic that handles the options in the IPv4 header.
Remember the ihl field in the header? Normally it's 5 (20 bytes). If it's greater than 5, it means there are additional option fields following it (such as Record Route, Timestamp, etc.).
Although IP options are rarely used in modern networks (because they cause router performance degradation), the kernel still has to support them:
	/* A header length greater than 5 means options are present and must be parsed */
	if (iph->ihl > 5 && ip_rcv_options(skb))
		goto drop;
ip_rcv_options() reads out these options and performs necessary processing (such as updating routing records). If the option format is wrong, it will return non-zero here, causing the packet to be dropped.
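To show what "wrong option format" can mean, the sketch below walks the option area of a raw header. It relies only on the standard option layout: type 0 (End of Option List) and type 1 (No Operation) are single bytes, and every other option is a type byte, a length byte that counts the whole option, and then data. The function toy_walk_ip_options() is hypothetical and only validates lengths; it does not act on any option the way the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the options (other than NOP/EOL padding) in an
 * IPv4 header, or returns -1 if the option area is malformed. */
static int toy_walk_ip_options(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);

    if (len < 20)
        return -1;

    size_t hlen = (size_t)(hdr[0] & 0x0f) * 4;   /* ihl in bytes */
    if (hlen < 20 || hlen > len)
        return -1;

    int count = 0;
    size_t i = 20;                               /* options start after the fixed header */

    while (i < hlen) {
        uint8_t type = hdr[i];

        if (type == 0)                           /* End of Option List */
            break;
        if (type == 1) {                         /* No Operation: one byte of padding */
            i++;
            continue;
        }
        if (i + 1 >= hlen)                       /* no room for the length byte */
            return -1;

        uint8_t optlen = hdr[i + 1];
        if (optlen < 2 || i + optlen > hlen)     /* too short, or overruns the header */
            return -1;

        count++;
        i += optlen;
    }
    return count;
}
```

For example, a header with ihl = 7 carries 8 option bytes, which is exactly enough for one NOP followed by a 7-byte Record Route option (type 7).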
The Dust Settles
After going through all the checks above:
- The header format is fine.
- The checksum passed.
- The route was found (deciding whether to receive locally or forward).
- The IP options (if any) have been processed.
At this point, the packet is finally handed off to the next stop.
- If it's local delivery: It enters ip_local_deliver(), where fragment reassembly is handled (if applicable), and finally the TCP or UDP data is handed to the transport layer.
- If it's forwarding: It enters ip_forward(), the TTL is decremented by 1, the checksum is updated to match, and the packet is sent out through the egress NIC that the route lookup selected.
- If it's multicast: It enters ip_mr_input(), handed over to the Multicast Forwarding Cache (MFC) for processing.
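As a small preview of the forwarding case, the TTL decrement does not require recomputing the checksum from scratch. The TTL is the high byte of one 16-bit header word, so lowering it by 1 lowers the header sum by 0x0100, and the stored checksum (the complement of that sum) rises by 0x0100 with end-around carry. The sketch below applies that update to a raw header; toy_decrease_ttl() is a made-up name, and refusing packets whose TTL is 1 or less is modeled here as a plain -1 return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: hdr points at an IPv4 header in network byte order
 * (TTL at offset 8, checksum at offsets 10 and 11).
 * Returns the new TTL, or -1 if the packet must not be forwarded. */
static int toy_decrease_ttl(uint8_t *hdr)
{
    assert(hdr != NULL);

    if (hdr[8] <= 1)                     /* TTL would expire: do not forward */
        return -1;

    uint32_t check = ((uint32_t)hdr[10] << 8) | hdr[11];

    check += 0x0100;                     /* compensate for TTL going down by 1 */
    check += (check >= 0xffff);          /* end-around carry; also maps 0xffff to 0 */

    hdr[10] = (uint8_t)(check >> 8);     /* the casts keep only the low 16 bits */
    hdr[11] = (uint8_t)check;

    return --hdr[8];
}
```

Running a full checksum over the modified header should again give 0, which is a convenient way to check an incremental update like this one.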
It looks like a lot of steps, but even behind a gigabit NIC this entire process runs in microseconds. Every packet is like a part on an assembly line, passing through every checkpoint in strict order. If any stage jumps to the drop label, the packet simply vanishes from the world, leaving behind only a cold counter increment in /proc/net/snmp.
In the next section, we'll look at how packets marked for forwarding go through the TTL decrement and checksum update, and are finally sent out of this host.