6.5 The ip_mr_forward() Method
At the end of the previous section, we saw that once ip_mr_input() has done the hard thinking (table lookups, cache-miss handling, and notifying the user-space daemon), it passes the baton to ip_mr_forward(). If ip_mr_input() is the "brain" responsible for scheduling and decision-making, then ip_mr_forward() is the "muscle" that carries those decisions out. Its task is singular: following the instructions in the MFC cache entry, copy and forward the packet to every appropriate virtual interface (VIF).
This sounds simple, but in practice it is full of pitfalls: TTL checks, incoming-interface validation, special handling for (*,*), (*,G), and (S,G) entries, and even the awkward case where a packet sent by the router itself loops back.
Let's pop the hood on this function and see how it works.
Initial Checks and Statistics Update
Right off the bat, the function pulls out two key variables: true_vifi and vif.
static int ip_mr_forward(struct net *net, struct mr_table *mrt,
			 struct sk_buff *skb, struct mfc_cache *cache,
			 int local)
{
	int psend = -1;
	int vif, ct;
	int true_vifi = ipmr_find_vif(mrt, skb->dev);
	vif = cache->mfc_parent;
There is a subtle but important distinction here:
- true_vifi: the VIF the packet actually arrived on. It is obtained by looking up the packet's skb->dev in the VIF table.
- vif: the parent interface recorded for this entry in the MFC cache (mfc_parent). In other words, the routing entry's expectation of where this packet "should" have come from.
If these two values do not match, a series of checks is triggered later. Before that, the kernel updates the entry's packet and byte counters:
	cache->mfc_un.res.pkt++;
	cache->mfc_un.res.bytes += skb->len;
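To make the true_vifi/vif distinction concrete, here is a minimal user-space sketch. The toy_* types and functions are hypothetical stand-ins, not kernel APIs. The lookup follows the idea behind ipmr_find_vif() (scan the VIF table for the matching device pointer, -1 when the device is not a VIF), and the comparison has the same shape as the wrong-interface test that ip_mr_forward() performs later.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAXVIFS 32

/* Hypothetical stand-ins for struct net_device and struct vif_device. */
struct toy_dev { const char *name; };
struct toy_vif { const struct toy_dev *dev; };

struct toy_table {
	struct toy_vif vif_table[TOY_MAXVIFS];
	int maxvif;                 /* one past the highest VIF index in use */
};

/* Scan the VIF table for dev; returns its index, or -1 if dev is not a VIF. */
static int toy_find_vif(const struct toy_table *t, const struct toy_dev *dev)
{
	int ct;

	assert(t->maxvif >= 0 && t->maxvif <= TOY_MAXVIFS);
	for (ct = t->maxvif - 1; ct >= 0; ct--)
		if (t->vif_table[ct].dev == dev)
			break;
	return ct;                  /* the loop leaves ct == -1 when nothing matched */
}

/* True when the packet did not arrive on the entry's parent VIF. */
static bool toy_wrong_vif(const struct toy_table *t, int parent,
			  const struct toy_dev *in_dev)
{
	return t->vif_table[parent].dev != in_dev;
}
```

Keeping both values matters because a packet can arrive on a perfectly valid VIF (true_vifi >= 0) that is still not the one the entry expects.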
Handling the Special Case of (*, G)
Immediately after, the code performs a special preliminary check for (*, G) type multicast routes.
	if (cache->mfc_origin == htonl(INADDR_ANY) && true_vifi >= 0) {
		struct mfc_cache *cache_proxy;
		/* For an (*,G) entry, we only check that the incomming
		 * interface is part of the static tree.
		 */
		cache_proxy = ipmr_cache_find_any_parent(mrt, vif);
		if (cache_proxy &&
		    cache_proxy->mfc_un.res.ttls[true_vifi] < 255)
			goto forward;
	}
This code handles a specific (*, G) scenario. (*, G) means "I don't care who you are (any source address), as long as you're heading to group G, I'll forward you according to this rule."
The logic here is a bit tricky: it looks up a "proxy" cache entry (cache_proxy). If that entry exists and its TTL threshold for the actual incoming interface (true_vifi) is below 255, the code jumps straight to the forward label. A threshold of 255 means "never forward on this VIF", so any lower value marks the interface as part of the proxy entry's static tree, which is exactly what the kernel comment says it checks. This shortcut lets (*, G) shared-tree traffic be forwarded without going through the stricter incoming-interface check that follows.
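A hedged sketch of that shortcut follows. The toy_mfc structure, the helper names, and the use of 0 to stand for INADDR_ANY are illustrative assumptions; only the decision logic (wildcard origin, known incoming VIF, proxy threshold below 255) is taken from the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAXVIFS 32
#define TOY_TTL_NEVER 255           /* threshold meaning "not an outgoing VIF" */

/* Hypothetical, trimmed-down MFC entry. */
struct toy_mfc {
	uint32_t origin;                /* 0 stands for INADDR_ANY */
	int parent;
	unsigned char ttls[TOY_MAXVIFS];
};

/* Start with every VIF disabled (threshold 255). */
static void toy_mfc_init(struct toy_mfc *c, uint32_t origin, int parent)
{
	c->origin = origin;
	c->parent = parent;
	memset(c->ttls, TOY_TTL_NEVER, sizeof(c->ttls));
}

/* Shape of the (*,G) shortcut: accept the packet when the VIF it really
 * arrived on is an outgoing VIF of the proxy entry (proxy may be NULL). */
static bool toy_star_g_shortcut(const struct toy_mfc *entry,
				const struct toy_mfc *proxy, int true_vifi)
{
	if (entry->origin != 0 || true_vifi < 0)
		return false;
	assert(true_vifi < TOY_MAXVIFS);
	return proxy != NULL && proxy->ttls[true_vifi] < TOY_TTL_NEVER;
}
```

When the shortcut returns false, the real function simply falls through to the strict incoming-interface check rather than dropping the packet.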
Strict Incoming Interface Check (Wrong VIF)
If the special path above wasn't taken, we arrive at the most famous "identity verification" step in multicast routing.
	/*
	 * Wrong interface: drop packet and (maybe) send PIM assert.
	 */
	if (mrt->vif_table[vif].dev != skb->dev) {
Translated into plain English, this line says: The routing entry says you should have arrived on eth0, but you popped up on eth1 instead.
This is the so-called WRONGVIF condition. It is fairly common: it can result from a network topology change or from a misconfiguration. The kernel handles it carefully, distinguishing two cases:
Case 1: A Locally-Sent Packet Looped Back (The Loopback Nightmare)
		if (rt_is_output_route(skb_rtable(skb))) {
			/* It is our own packet, looped back.
			 * Very complicated situation...
			 *
			 * The best workaround until routing daemons will be
			 * fixed is not to redistribute packet, if it was
			 * send through wrong interface. It means, that
			 * multicast applications WILL NOT work for
			 * (S,G), which have default multicast route pointing
			 * to wrong oif. In any case, it is not a good
			 * idea to use multicasting applications on router.
			 */
			goto dont_forward;
		}
Note the tone of the comment: "Very complicated situation...". This is an edge case that even kernel developers find tricky: a multicast packet sent by the router itself loops back and shows up on an interface other than the one the routing entry expects, typically because of a routing misconfiguration.
It's like mailing a letter to yourself, only for the delivery route to go in a huge circle and shove it back in through your own mail slot. The kernel's attitude is: I'm not getting involved (goto dont_forward). The packet is simply not redistributed. The comment calls this the best workaround until the routing daemons are fixed, admits that multicast applications will not work for (S,G) flows whose default multicast route points to the wrong outgoing interface, and adds that running multicast applications on a router is not a good idea in the first place.
Case 2: Genuinely Received on the Wrong Interface (PIM Assert)
If it's not a local loopback, then the packet truly arrived at the wrong door. The kernel counts the error in wrong_if and then decides whether to "start an argument", that is, whether to notify the user-space daemon so that it can send a PIM Assert message.
		cache->mfc_un.res.wrong_if++;
		if (true_vifi >= 0 && mrt->mroute_do_assert &&
		    /* pimsm uses asserts, when switching from RPT to SPT,
		     * so that we cannot check that packet arrived on an oif.
		     * It is bad, but otherwise we would need to move pretty
		     * large chunk of pimd to kernel. Ough... --ANK
		     */
		    (mrt->mroute_do_pim ||
		     cache->mfc_un.res.ttls[true_vifi] < 255) &&
		    time_after(jiffies,
			       cache->mfc_un.res.last_assert + MFC_ASSERT_THRESH)) {
			cache->mfc_un.res.last_assert = jiffies;
			ipmr_cache_report(mrt, skb, true_vifi, IGMPMSG_WRONGVIF);
		}
		goto dont_forward;
	}
There is a series of conditional checks here, resembling a "three-question argument":
- Is the incoming interface a known VIF? (true_vifi >= 0)
- Are asserts enabled? (mrt->mroute_do_assert)
- Is PIM enabled, or is the incoming interface one of this entry's outgoing interfaces? (mrt->mroute_do_pim || cache->mfc_un.res.ttls[true_vifi] < 255) The long comment explains why: PIM-SM uses Asserts when switching from the RPT (the shared tree rooted at the rendezvous point) to the SPT (Shortest Path Tree), so the kernel cannot insist that the packet arrived on an outgoing interface. Enforcing this properly would mean moving a large chunk of pimd (the user-space PIM daemon) into the kernel, so a compromise check is made here instead.
If all three hold and more than MFC_ASSERT_THRESH has elapsed since the last report (last_assert), the kernel records the current time in last_assert and calls ipmr_cache_report().
It's as if the router shouts at the user-space daemon through the multicast routing socket: "Hey! A packet came in on the wrong interface! Deal with it!" The message type is IGMPMSG_WRONGVIF. Whether or not a report is sent, the packet itself is not forwarded: the code jumps to dont_forward.
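The rate limit relies on a wrap-safe time comparison. The sketch below reproduces the branch in user space; toy_time_after() uses the same signed-difference trick that the kernel's time_after() is built on, while the toy_* names, the explicit now/thresh parameters, and the flattened flags are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is after b": the unsigned difference is read as signed. */
#define toy_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical per-entry state, modeled on mfc_un.res. */
struct toy_assert_state {
	unsigned long last_assert;      /* tick of the last WRONGVIF report */
	unsigned long wrong_if;         /* wrong-interface counter */
};

/* Sketch of the non-loopback WRONGVIF branch. The packet is dropped either
 * way; the return value says whether user space should be notified. */
static bool toy_wrongvif_report(struct toy_assert_state *st, int true_vifi,
				bool do_assert, bool do_pim,
				unsigned char in_vif_threshold,
				unsigned long now, unsigned long thresh)
{
	assert(st != NULL);
	st->wrong_if++;
	if (true_vifi >= 0 && do_assert &&
	    (do_pim || in_vif_threshold < 255) &&
	    toy_time_after(now, st->last_assert + thresh)) {
		st->last_assert = now;  /* start a new quiet period */
		return true;
	}
	return false;
}
```

Because the comparison works on the signed difference, it keeps giving the right answer when the tick counter wraps around, which a plain `now > last_assert + thresh` would not.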
Forwarding Loop and TTL Thresholds
If the incoming interface check passes, or if we jumped here from the (*, G) special path, we arrive at the forward label. This is the core of the forwarding logic.
First, it updates the incoming interface statistics (pkt_in, bytes_in), and then gets to work.
forward:
	mrt->vif_table[vif].pkt_in++;
	mrt->vif_table[vif].bytes_in += skb->len;
	/*
	 *	Forward the frame
	 */
The following code has different processing logic for (*, *), (*, G), and (S, G). Let's look at the most special case first, (*, *):
	if (cache->mfc_origin == htonl(INADDR_ANY) &&
	    cache->mfc_mcastgrp == htonl(INADDR_ANY)) {
		if (true_vifi >= 0 &&
		    true_vifi != cache->mfc_parent &&
		    ip_hdr(skb)->ttl >
				cache->mfc_un.res.ttls[cache->mfc_parent]) {
			/* It's an (*,*) entry and the packet is not coming from
			 * the upstream: forward the packet to the upstream
			 * only.
			 */
			psend = cache->mfc_parent;
			goto last_forward;
		}
		goto dont_forward;
	}
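(*, *) is a wildcard entry, typically used to forward packets toward the upstream interface (mfc_parent). If the packet arrived on a known VIF other than the upstream one (true_vifi >= 0 && true_vifi != cache->mfc_parent) and its TTL exceeds the upstream interface's threshold, the kernel sends it upstream only: psend is set to the parent VIF and the code jumps straight to last_forward. In every other case, for example when the packet already came from upstream, it is not forwarded.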
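The (*, *) rule boils down to a single decision. This is a hypothetical user-space restatement (the function name and flattened parameters are assumptions, not kernel code) that returns either the upstream VIF or -1 for "drop".

```c
#include <assert.h>

/* Returns the VIF to send to (always the upstream parent) or -1 to drop.
 * ttls[i] is the per-VIF TTL threshold, as in mfc_un.res.ttls. */
static int toy_star_star_target(int true_vifi, int parent,
				unsigned char pkt_ttl,
				const unsigned char *ttls)
{
	assert(parent >= 0);
	if (true_vifi >= 0 && true_vifi != parent && pkt_ttl > ttls[parent])
		return parent;  /* from downstream: send it upstream only */
	return -1;              /* from upstream, unknown VIF, or TTL too low */
}
```

Note that the TTL is compared with the parent VIF's threshold, because the parent is the only interface this entry ever forwards to.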
Core Forwarding Loop: Iterating the VIF Table
For normal (S, G) or (*, G) entries, the kernel walks the range of VIF indexes spanned by the MFC entry (minvif to maxvif - 1) and checks each VIF's TTL threshold in the entry to decide which interfaces get a copy.
	for (ct = cache->mfc_un.res.maxvif - 1;
	     ct >= cache->mfc_un.res.minvif; ct--) {
		/* For (*,G) entry, don't forward to the incoming interface */
		if ((cache->mfc_origin != htonl(INADDR_ANY) ||
		     ct != true_vifi) &&
		    ip_hdr(skb)->ttl > cache->mfc_un.res.ttls[ct]) {
			if (psend != -1) {
				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
				if (skb2)
					ipmr_queue_xmit(net, mrt, skb2, cache,
							psend);
			}
			psend = ct;
		}
	}
This loop is the essence of multicast distribution. It counts down from maxvif - 1 to minvif, and for each candidate outgoing VIF ct it applies two checks that decide whether that interface gets a copy:
- Is it a U-turn? For a (*, G) entry, the packet must never be sent back out the interface it arrived on (ct != true_vifi); otherwise the router could create a forwarding loop. For an (S, G) entry this test is skipped, since by this point the packet is known to have arrived on the entry's parent VIF.
- Is the TTL high enough? This is the well-known TTL threshold check. Each interface has a threshold (ttls[ct]), and the packet may leave through that interface only if its TTL (Time To Live) is greater than the threshold. This is the classic way to limit multicast scope and keep traffic from flooding the entire network.
If both checks pass and an interface is already pending (psend != -1), more than one copy is needed. The kernel skb_clone()s the packet, hands the clone to ipmr_queue_xmit() for the previously pending interface, and then records the current interface (ct) in psend. The newest interface is always held back, to be sent on a later iteration or after the loop ends, so that the final copy can reuse the original skb instead of a clone.
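The deferred-send pattern is easier to see in isolation. The following is a user-space simulation, not kernel code: toy_forward() and the other toy_* names are hypothetical, and "cloning" here only counts copies instead of calling skb_clone(). It reproduces the loop's two checks and the last_forward tail so you can see which VIFs get copies and how many clones are made.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAXVIFS 32

/* Hypothetical record of what the loop would transmit. */
struct toy_result {
	int sent_to[TOY_MAXVIFS];   /* VIF indexes, in transmit order */
	int nsent;
	int clones;                 /* copies that would need skb_clone() */
	int original_used;          /* 1 if the original buffer itself was sent */
};

static void toy_result_reset(struct toy_result *r)
{
	memset(r, 0, sizeof(*r));
}

static void toy_xmit(struct toy_result *r, int vif, int is_clone)
{
	assert(r->nsent < TOY_MAXVIFS);
	r->sent_to[r->nsent++] = vif;
	if (is_clone)
		r->clones++;
	else
		r->original_used = 1;
}

/* Sketch of the VIF loop plus the last_forward tail.
 * star_g: the entry is (*,G); local: the caller keeps the original. */
static void toy_forward(struct toy_result *r, const unsigned char *ttls,
			int minvif, int maxvif, int true_vifi, int star_g,
			unsigned char pkt_ttl, int local)
{
	int psend = -1;
	int ct;

	assert(minvif >= 0 && maxvif <= TOY_MAXVIFS);
	for (ct = maxvif - 1; ct >= minvif; ct--) {
		if ((!star_g || ct != true_vifi) && pkt_ttl > ttls[ct]) {
			if (psend != -1)
				toy_xmit(r, psend, 1);  /* earlier VIF gets a clone */
			psend = ct;                     /* hold back the newest VIF */
		}
	}
	if (psend != -1)
		toy_xmit(r, psend, local);  /* clone only if the caller keeps the original */
}
```

With three qualifying VIFs and local == 0, the sketch makes two clones and sends the original out the last VIF; with local == 1 all three copies are clones, because the caller still needs the original.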
Final Wrap-up: Local Delivery and Destruction
After the loop ends, we might still be holding the index of the last unsent interface (psend), or we might not have any at all. At the last_forward label:
last_forward:
	if (psend != -1) {
		if (local) {
			struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
			if (skb2)
				ipmr_queue_xmit(net, mrt, skb2, cache, psend);
		} else {
			ipmr_queue_xmit(net, mrt, skb, cache, psend);
			return 0;
		}
	}
dont_forward:
	if (!local)
		kfree_skb(skb);
	return 0;
}
The logic here handles the "local delivery" flag.
- If psend != -1 (there is somewhere to send):
  - If local is true (the local host is also one of the receivers), the kernel must clone the packet and send the clone out the psend interface. The original skb still belongs to the caller: ip_mr_input() delivers it to the local stack after ip_mr_forward() returns, which is why no local-delivery code appears in this function.
  - If local is false (pure forwarding), the original skb can be handed directly to ipmr_queue_xmit(), saving a clone, and the function returns immediately.
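Finally, dont_forward is reached either when nothing was forwarded or when a clone was sent on behalf of a local receiver. If the local host does not want the packet (!local), the only thing left is kfree_skb(skb); its mission is over. If local is set, the skb is left for the caller to deliver.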
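The ownership rule can be checked with a small accounting model. Everything here is hypothetical (toy_pool and its helpers are not kernel structures); the point is the invariant: with local == 0 the tail always consumes the original buffer, by transmitting or freeing it, and with local == 1 it never does, leaving the original for the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer accounting: "live" counts buffers still owned by
 * this path or its caller; transmitting or freeing drops ownership. */
struct toy_pool {
	int live;
	int queued;
	int freed;
};

static void toy_clone(struct toy_pool *p) { p->live++; }
static void toy_queue(struct toy_pool *p) { p->live--; p->queued++; }
static void toy_free(struct toy_pool *p)  { p->live--; p->freed++; }

/* Mirrors the control flow from last_forward through dont_forward. */
static void toy_tail(struct toy_pool *p, int psend, bool local)
{
	assert(p->live >= 1);           /* the original buffer is still held */
	if (psend != -1) {
		if (local) {
			toy_clone(p);
			toy_queue(p);   /* the clone goes out psend */
		} else {
			toy_queue(p);   /* the original itself goes out */
			return;
		}
	}
	/* dont_forward: */
	if (!local)
		toy_free(p);
}
```

In every path the model ends with live == 0 when local is false and live == 1 when local is true, which is exactly the contract the caller relies on.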
With this, ip_mr_forward()'s task is complete. It acts as a cold yet efficient distribution center, copying each multicast packet to exactly the interfaces that pass its rules (incoming-interface matching and TTL thresholds).
So what does ipmr_queue_xmit() do? It is responsible for actually pushing the packet into the transmit queue, and handling any potential tunnel encapsulation. That is the topic of the next section.