
6.4 IPv4 Multicast Rx Path

In the previous section, we worked through the IGMP protocol and the mechanism by which a host joins a multicast group, and we also saw the historical "maximum 20 groups" limitation.

Now, we need to shift our perspective from "host" to "router."

When a multicast packet arrives at a network interface, its journey is completely different if the machine is not just a host but a configured multicast router (a kernel built with CONFIG_IP_MROUTE, running a daemon such as pimd or mrouted).

In this section, we dive into the IPv4 multicast Rx path. We won't just skim the surface. We'll follow the function calls step by step to see exactly how the kernel decides where each packet goes: local delivery, the routing daemon, forwarding, or a queue of packets waiting for a route.


Entry Point: The First Turn in Route Lookup

Recall that in Chapter 4, in "Receiving IPv4 Multicast Packets," we briefly mentioned that multicast packets are handled by ip_route_input_mc(). At that time, we only cared about how the routing entry is found.

Now we need to add a bit of detail: when the kernel has multicast routing enabled, this function makes a crucial tweak.

When it allocates and initializes the rtable object (the routing cache entry), it does not just set flags. It also points the object's input callback, dst.input, at ip_mr_input(). This is the pointer that determines where the packet goes next.

It is a single assignment that is easy to overlook, but on a router this is the point where the packet's fate is handed over:

/* Logic in ip_route_input_mc() (net/ipv4/route.c) */
#ifdef CONFIG_IP_MROUTE
	if (!ipv4_is_local_multicast(daddr) && IN_DEV_MFORWARD(in_dev))
		rth->dst.input = ip_mr_input;
#endif

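What does this mean? Suppose the arrival interface has multicast forwarding enabled and the group is not link-local (224.0.0.0/24). Then the packet is no longer simply a matter of "receive and deliver locally." It is now a candidate for forwarding.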


Core Function: ip_mr_input()

Let's enter the world of ip_mr_input(). This function is the main entry point for a multicast router receiving packets.

First, it performs some basic checks.

int ip_mr_input(struct sk_buff *skb)
{
	struct mfc_cache *cache;
	struct net *net = dev_net(skb->dev);

It then records whether the packet is also destined for the local machine (the RTCF_LOCAL flag of the route). ip_mr_input() plays a dual role, handling both forwarding and local delivery, because the router itself may also be a member of the multicast group.

	int local = skb_rtable(skb)->rt_flags & RTCF_LOCAL;
	struct mr_table *mrt;

Next is a very important safety check, specifically targeting the forwarding logic.

	/* Packet is looped back after forward, it should not be
	 * forwarded second time, but still can be delivered locally.
	 */
	if (IPCB(skb)->flags & IPSKB_FORWARDED)
		goto dont_forward;

Translated into plain English: "I've already forwarded this packet once. Don't forward it again, or we'll end up in an infinite loop." After forwarding, a copy of the packet can be looped back into the host's own receive path. IPSKB_FORWARDED lives in the skb's control block, so it marks exactly such in-host copies. Without the mark, the kernel would see the looped copy, look up the table again, and forward it again, in an endless cycle. The marked copy can still be delivered locally through the dont_forward path.

Only packets that haven't been marked are eligible to enter the forwarding logic below.


Route Table Lookup: ipmr_rt_fib_lookup()

Now the kernel looks up the multicast routing table. As we saw in the previous section, the kernel can maintain several multicast routing tables (policy routing, with CONFIG_IP_MROUTE_MULTIPLE_TABLES). Without that option, ipmr_rt_fib_lookup() simply returns net->ipv4.mrt, the default multicast routing table.

	mrt = ipmr_rt_fib_lookup(net, skb);
	if (IS_ERR(mrt)) {
		kfree_skb(skb);
		return PTR_ERR(mrt);
	}
	if (!local) {

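If the table lookup fails, the packet is freed and the error code is returned. The if (!local) block that opens here applies only to packets not destined for the local machine, and it handles Router Alert and IGMP traffic, as described next. Locally destined packets skip it and go straight to the cache lookup; their local delivery is handled later.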


Router Alert Option: IGMP's "Express Lane"

This part of the code handles a detail of the IGMP protocol, but it is key to communication between the router and userspace daemons (pimd/mrouted).

IGMPv2 (RFC 2236) and IGMPv3 (RFC 3376) require hosts to set the IP Router Alert option (IPOPT_RA) in the IPv4 header of their messages, including the reports and leave messages used to join and leave groups. It's like stamping "Urgent" on an envelope, telling the kernel: "Don't just follow the normal path; show this to the router daemon."
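For reference, this is what the option looks like on the wire. RFC 2113 defines Router Alert as option type 148 (copy flag set, option number 20) with length 4. Its 2-byte value 0 means "routers shall examine this packet." The parser below is a hypothetical stand-alone sketch. The kernel itself records the option while parsing IP options (ip_options_compile() sets opt.router_alert), not with this function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_END   0	/* end of option list */
#define OPT_NOOP  1	/* single-byte padding */
#define OPT_RA    148	/* Router Alert (RFC 2113) */

/* Returns true if the IPv4 header in buf carries a Router Alert option. */
static bool ipv4_has_router_alert(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 20)
		return false;
	size_t ihl = (size_t)(buf[0] & 0x0f) * 4;	/* header length in bytes */
	if (ihl < 20 || ihl > len)
		return false;

	size_t i = 20;					/* options follow the fixed header */
	while (i < ihl) {
		uint8_t type = buf[i];
		if (type == OPT_END)
			break;
		if (type == OPT_NOOP) {
			i++;
			continue;
		}
		if (i + 1 >= ihl)
			return false;			/* truncated option */
		uint8_t olen = buf[i + 1];		/* includes type and length bytes */
		if (olen < 2 || i + olen > ihl)
			return false;			/* malformed length */
		if (type == OPT_RA && olen == 4)
			return true;
		i += olen;
	}
	return false;
}
```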

The kernel checks for this option:

		if (IPCB(skb)->opt.router_alert) {

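If it is set, ip_call_ra_chain() is called. This function walks ip_ra_chain, the list of raw sockets registered to receive Router Alert packets, and hands the packet to the matching sockets. Sockets join this list through the IP_ROUTER_ALERT socket option; the multicast routing daemon's socket is added automatically when the daemon issues MRT_INIT. If a socket took the packet, ip_mr_input() returns immediately. Otherwise processing falls through to the cache lookup.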

			if (ip_call_ra_chain(skb))
				return 0;

But the real world is harsh, and not all devices play by the rules.

There is a brilliant piece of commentary in the code that points directly to some historical "pitfalls":

		} else if (ip_hdr(skb)->protocol == IPPROTO_IGMP) {
			/* IGMPv1 (and broken IGMPv2 implementations sort of
			 * Cisco IOS <= 11.2(8)) do not put router alert
			 * option to IGMP packets destined to routable
			 * groups. It is very bad, because it means
			 * that we can forward NO IGMP messages.
			 */

Some older devices, such as Cisco IOS up to and including 11.2(8), and IGMPv1 hosts don't set the Router Alert option at all. This is a problem. Without the option, the packet never reaches the daemon through ip_call_ra_chain(), and the kernel treats these IGMP messages as ordinary multicast data. The routing daemon would never see them and could not maintain the multicast distribution tree.

So the kernel has to "smuggle" things in:

			struct sock *mroute_sk;

			mroute_sk = rcu_dereference(mrt->mroute_sk);
			if (mroute_sk) {
				nf_reset(skb);	/* clear Netfilter state to avoid interference */
				raw_rcv(mroute_sk, skb);
				return 0;
			}
		}
	}

It directly grabs mrt->mroute_sk, the pointer the kernel stored when the routing daemon registered its raw IGMP socket with MRT_INIT. It then delivers the packet straight to that socket's receive queue with raw_rcv().

This logic guarantees that, as long as a daemon has registered, it receives the IGMP messages. This holds no matter how old the sending device is or how poorly it implements the standard.
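The decision logic of this branch can be condensed into a pure function. This is a model, not kernel code. struct rx_meta and pre_cache_step() are hypothetical, and ra_consumed stands in for the return value of ip_call_ra_chain(). The model preserves one subtlety: a Router Alert packet that no socket consumed is not dropped but falls through to the cache lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PROTO_IGMP 2	/* value of IPPROTO_IGMP */

enum igmp_path { PATH_RA_CHAIN, PATH_MROUTE_SK, PATH_CACHE_LOOKUP };

struct rx_meta {
	bool local;		/* RTCF_LOCAL */
	bool router_alert;	/* IPCB(skb)->opt.router_alert */
	int protocol;		/* ip_hdr(skb)->protocol */
	bool ra_consumed;	/* would ip_call_ra_chain() have taken it? */
	bool have_mroute_sk;	/* has a daemon registered? */
};

/* Where the packet goes before the MFC lookup, per the !local branch. */
static enum igmp_path pre_cache_step(const struct rx_meta *m)
{
	assert(m != NULL);
	if (!m->local) {
		if (m->router_alert) {
			if (m->ra_consumed)
				return PATH_RA_CHAIN;
		} else if (m->protocol == PROTO_IGMP && m->have_mroute_sk) {
			return PATH_MROUTE_SK;	/* legacy IGMP without RA */
		}
	}
	return PATH_CACHE_LOOKUP;
}
```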


Cache Lookup: ipmr_cache_find()

With the IGMP messages handled, let's get back to the packet itself.

A packet has arrived, and we need to decide where to forward it. The core of multicast routing is a cache table (MFC, Multicast Forwarding Cache). This table tells the kernel: "Packets from source S to group G should be forwarded out of interfaces A, B, and C."

All we have at this point is the packet itself, so the kernel queries the table with its source and destination addresses:

	cache = ipmr_cache_find(mrt, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr);
	if (cache == NULL) {

The lookup is keyed on the source address plus the group address. In multicast routing this is the key combination: a multicast stream is uniquely identified by its (S, G) pair.
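At its core, a multicast forwarding cache keyed on (S, G) is a hash table with a two-part key. This sketch assumes a hypothetical hash function and bucket count (MFC_BUCKETS). mfc_insert() and mfc_find() are illustrative helpers. The kernel uses its own hash and RCU-protected lists, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MFC_BUCKETS 64	/* hypothetical; a power of two so we can mask */

struct mfc_entry {
	uint32_t origin;	/* S */
	uint32_t group;		/* G */
	int parent_vif;		/* expected incoming VIF */
	struct mfc_entry *next;	/* chain within one bucket */
};

struct mfc_table {
	struct mfc_entry *bucket[MFC_BUCKETS];
};

/* Hypothetical hash mixing both halves of the (S, G) key. */
static unsigned int mfc_hash(uint32_t group, uint32_t origin)
{
	return (group ^ origin ^ (origin >> 16)) & (MFC_BUCKETS - 1);
}

static void mfc_insert(struct mfc_table *t, struct mfc_entry *e)
{
	assert(t != NULL && e != NULL);
	unsigned int h = mfc_hash(e->group, e->origin);
	e->next = t->bucket[h];
	t->bucket[h] = e;
}

/* Exact (S, G) match only: both parts of the key must agree. */
static struct mfc_entry *mfc_find(const struct mfc_table *t,
				  uint32_t origin, uint32_t group)
{
	for (struct mfc_entry *e = t->bucket[mfc_hash(group, origin)]; e; e = e->next)
		if (e->origin == origin && e->group == group)
			return e;
	return NULL;
}
```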

If a match is found (cache != NULL), everyone is happy, and we jump straight to forwarding. But what if it isn't found?

This is the most complex and interesting part of multicast routing: Cache Miss handling.


Proxy Mechanism on a Miss

If an exact match isn't found, the kernel doesn't give up. It tries a wildcard match.

This is to support the Multicast Proxy feature (although the original book doesn't go into detail, the code logic is right here).

It first checks whether the incoming device (skb->dev) has a slot in the virtual interface table (vif_table):

		int vif = ipmr_find_vif(mrt, skb->dev);

		if (vif >= 0)
			cache = ipmr_cache_find_any(mrt, ip_hdr(skb)->daddr, vif);
	}

If the device is indeed a valid multicast virtual interface (VIF), ipmr_cache_find_any() looks for a wildcard-source entry. This is a (*, G) rule that matches any source sending to group G, provided it applies to this VIF.
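The two-stage lookup can be sketched as follows. This is a simplified model under stated assumptions. Rules live in a flat array, and find_vif() and lookup() are hypothetical helpers. A ttls value of 255 is taken to mean "VIF not in the outgoing list," following the MFC convention. The kernel's further (*, *) proxy fallback is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VIFS 32	/* MAXVIFS in <linux/mroute.h> */
#define ANY_SOURCE 0u	/* INADDR_ANY used as a wildcard origin */

struct mfc_rule {
	uint32_t origin, group;
	unsigned char ttls[MAX_VIFS];	/* 255 = VIF not in the outgoing list */
};

/* Hypothetical: map a device index to its VIF slot, -1 if none. */
static int find_vif(const int *vif_dev, int nvifs, int ifindex)
{
	for (int v = 0; v < nvifs; v++)
		if (vif_dev[v] == ifindex)
			return v;
	return -1;
}

static const struct mfc_rule *lookup(const struct mfc_rule *rules, size_t n,
				     uint32_t src, uint32_t grp, int vif)
{
	for (size_t i = 0; i < n; i++)		/* exact (S, G) first */
		if (rules[i].origin == src && rules[i].group == grp)
			return &rules[i];
	if (vif < 0)
		return NULL;			/* not a VIF: no wildcard fallback */
	assert(vif < MAX_VIFS);
	for (size_t i = 0; i < n; i++)		/* then a (*, G) that covers this VIF */
		if (rules[i].origin == ANY_SOURCE && rules[i].group == grp &&
		    rules[i].ttls[vif] < 255)
			return &rules[i];
	return NULL;
}
```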

If it still can't find a match after this:

	if (cache == NULL) {
		int vif;

At this point, the kernel has to face reality: I really don't know where to send this packet.


Handling Unknown Traffic: The Unresolved Cache Queue

When there is no matching cache entry, the kernel doesn't simply throw the packet away. It relies on the routing daemon in userspace (mrouted or pimd) to work out the route quickly.

But the packet has already arrived, and the daemon needs time to react.

The kernel's strategy is: take it in first, then ask for help.

First, if the packet is also destined for the local machine, it is delivered locally right away:

		if (local) {
			struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
			ip_local_deliver(skb);
			if (skb2 == NULL)
				return -ENOBUFS;
			skb = skb2;
		}

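skb_clone() is needed here because ip_local_deliver() consumes the original skb, while the packet still has to continue down the unresolved path. If the clone cannot be allocated, the function returns -ENOBUFS immediately. The local copy has already been delivered, but forwarding is abandoned.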

Next, the kernel starts preparing to create an "unresolved entry":

		read_lock(&mrt_lock);
		vif = ipmr_find_vif(mrt, skb->dev);
		if (vif >= 0) {

If the incoming interface is a valid VIF, ipmr_cache_unresolved() is called. Otherwise the packet cannot be forwarded at all, so the kernel frees it and returns -ENODEV.

This function does three major things:

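  1. Allocate an empty cache entry. ipmr_cache_unresolved() first searches the unresolved queue (mfc_unres_queue) for an existing entry with the same (S, G). Only if there is none does it call ipmr_cache_alloc_unres() to allocate a new mfc_cache structure.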

    Look at this allocation function:

    static struct mfc_cache *ipmr_cache_alloc_unres(void)
    {
    	struct mfc_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_ATOMIC);

    	if (c) {
    		skb_queue_head_init(&c->mfc_un.unres.unresolved);

    It initializes a queue. What is this queue for? To store the packets that have been detained because the kernel doesn't know the route.

    Then comes the deadline:

    		c->mfc_un.unres.expires = jiffies + 10*HZ;
    	}
    	return c;
    }

    10 seconds. This is the kernel's ultimatum to the userspace daemon. If the daemon doesn't install a route for this (S, G) via setsockopt() within 10 seconds, the kernel destroys the entry, together with any packets still queued on it.

    Periodic cleanup is driven by the ipmr_expire_timer timer. Its handler, ipmr_expire_process(), scans the mfc_unres_queue list and purges expired entries.

  2. Throw the packet into the holding cell. The kernel appends the packet that triggered the miss to the entry's unresolved queue. This queue can't grow indefinitely. If it already holds more than three packets (the check in ipmr_cache_unresolved() is qlen > 3), the newly arriving packet is dropped with -ENOBUFS, while the packets already queued are kept. The number of pending unresolved entries per table is also capped, at 10. Together these limits stop an attacker from exhausting memory by spraying packets at unknown (S, G) pairs.

  3. Alert userspace. This is the most critical step. When a new unresolved entry is created, the kernel calls ipmr_cache_report() to construct a special message: IGMPMSG_NOCACHE.

    This is not a real IGMP protocol message but a message type defined by the kernel (delivered as a struct igmpmsg). It is queued on the daemon's mroute_sk socket, the raw IGMP socket the daemon registered with MRT_INIT:

    "Hey, I see traffic from source S to group G, but I don't know how to forward it. Here's the packet header info. Look up your table and come back to fill the gap!"

    Upon receiving this message, the userspace daemon checks its own multicast routing table (for routes learned via DVMRP or PIM, for example). If it finds a match, it calls setsockopt() with the MRT_ADD_MFC command to fill this route back into the kernel.

    This brings us back to the configuration flow we discussed in the previous section.

    Once the kernel receives MRT_ADD_MFC, it calls ipmr_mfc_add(). This function installs a new, resolved entry for the (S, G) and removes the matching entry from the unresolved queue. It then calls ipmr_cache_resolve(), which forwards the detained packets right away.


Cache Hit: ip_mr_forward()

If ipmr_cache_find() or the wildcard lookup found a cache entry, the code arrives here. Packets parked on an unresolved entry do not return to this point. Once userspace fills the gap, ipmr_cache_resolve() forwards them directly.

	read_lock(&mrt_lock);
	ip_mr_forward(net, mrt, skb, cache, local);
	read_unlock(&mrt_lock);

ip_mr_forward() is the star of the next section. It is responsible for duplicating the packet based on the information in cache and sending the copies out on the corresponding virtual interfaces.

If this packet also requires local delivery (local is true), after forwarding is complete, it calls ip_local_deliver() to pass it up to the upper protocol stack:

	if (local)
		return ip_local_deliver(skb);

	return 0;

Summary: The Truth of the Flow

Stringing this entire flow together, you'll find that the kernel's multicast receive path is very much like an experienced but strict traffic cop:

  1. Check the blacklist: Has it already been forwarded once (IPSKB_FORWARDED)? If so, don't forward it again; at most, deliver it locally.
  2. Check credentials: Is it a non-local packet carrying a Router Alert, or an IGMP message from an old device without one? If so, send it down the express lane to the routing daemon.
  3. Check the map: Look up (S, G) in the MFC cache, falling back to a wildcard (*, G) entry.
  4. Not found?:
    • If it's for me, deliver a copy locally first.
    • If it needs forwarding, detain it, send an urgent telegram to the back office (userspace), and wait up to 10 seconds for them to fill the gap.
  5. Found: Hand it to ip_mr_forward() for distribution.

This design means the kernel never has to run complex routing protocols such as PIM itself. The heavy decision-making lives in userspace, and the kernel is responsible only for fast forwarding and cache management.

In the next section, we'll dive into ip_mr_forward() to see exactly how "distribution" is implemented, and how TTL thresholds keep multicast traffic from flooding the network.