6.2 Multicast Forwarding Cache (MFC)

In the previous section, we compared mr_table to a dispatch center—holding an interface table and an unresolved queue. But you might have noticed a detail: the actual "forwarding decision" logic hasn't appeared yet.

A dispatch center alone isn't enough; we also need a core decision-making mechanism. In multicast routing, this mechanism is the MFC (Multicast Forwarding Cache).

This is arguably the most frequently "touched" data structure in the entire multicast subsystem—every incoming multicast packet has to ask it for directions.


The Cache Array: mfc_cache_array

Let's look at the big picture first. In the kernel's eyes, the most critical part of a multicast routing table (mr_table) is an array:

  • mfc_cache_array: This is an array with a length of 64 (i.e., MFC_LINES).
  • Contents: It holds individual mfc_cache objects, which are the actual routing entries.

Why an array? Because speed matters. The array index is a hash value.

The kernel uses the MFC_HASH() macro to calculate the index. It takes only two parameters: the multicast group address and the source IP address. Feed those two values in, and it tells you which slot in mfc_cache_array the packet should land in.

There's a slightly confusing point here—the routing table itself.

By default, when we haven't touched any advanced policy routing, there is only one multicast routing table in the entire IPv4 network namespace (net).

  • Its reference in the kernel is net->ipv4.mrt.
  • It is created during ipmr_rules_init().
  • When you call ipmr_fib_lookup() to look up the table, it usually just hands this one table straight back.

But if you enable IP_MROUTE_MULTIPLE_TABLES (multicast multi-table support), things get complicated: multiple tables will exist, and ipmr_fib_lookup() must rely on policy rules (fib rules) to decide which table to use. However, whether it's one table or many, the place where the lookup ultimately lands is always mfc_cache_array.


The Core Structure: mfc_cache

Now let's zoom in and look at the mfc_cache structure itself. It is the smallest unit of a cache entry, defined in include/linux/mroute.h:

struct mfc_cache {
    struct list_head list;
    __be32 mfc_mcastgrp;    // multicast group address
    __be32 mfc_origin;      // source address
    vifi_t mfc_parent;      // incoming interface (which VIF the packet arrived on)
    int mfc_flags;          // flags

    union {
        // Case A: entry not yet resolved
        struct {
            unsigned long expires;            // expiration time
            struct sk_buff_head unresolved;   // unresolved queue (holds packets)
        } unres;

        // Case B: entry already resolved
        struct {
            unsigned long last_assert;
            int minvif;
            int maxvif;
            unsigned long bytes;          // stats: bytes forwarded
            unsigned long pkt;            // stats: packets forwarded
            unsigned long wrong_if;       // stats: wrong-interface count
            unsigned char ttls[MAXVIFS];  // TTL threshold table
        } res;
    } mfc_un;

    struct rcu_head rcu;
};

This structure has an interesting design, hiding two different faces.

Breaking Down the Key Members

First, let's look at the fields that exist regardless of the state:

  • mfc_mcastgrp & mfc_origin: These are the two values we used for hashing—the group address and the source address. These two keys uniquely identify a multicast flow.

  • mfc_parent: This is the incoming interface. Note that multicast routing is source-oriented, so we must know which virtual interface (VIF) the packet originally arrived from to prevent loops and duplicate forwarding.

  • mfc_flags: Flags marking special attributes of this entry. There are two common ones:

    • MFC_STATIC: This is a "static" entry. It means this wasn't dynamically learned by a routing daemon (like mrouted or pimd), but was manually hardcoded into the kernel by an administrator.
    • MFC_NOTIFY: This is a notification bit. If set, it means that when this route changes, the kernel needs to notify user space via Netlink. You can check the code in rt_fill_info() and ipmr_get_route(), where this bit is examined.

Two Faces: unres and res

The most critical part of the structure is that union. It acts as the state switch for the entire MFC mechanism.

You can think of it as Schrödinger's box:

  • If the route hasn't been resolved yet, it is unres (unresolved).
  • If the route has been determined, it is res (resolved).

State 1: Unresolved (unres)

When a packet arrives for the first time and the kernel hasn't found a corresponding forwarding rule in the cache, this entry is in the unres state.

  • expires: We can't wait forever; there must be an expiration time.
  • unresolved: This is a queue (sk_buff_head). The kernel temporarily stuffs those "lost" packets into this queue to wait for the routing daemon to rescue them.

State 2: Resolved (res)

Once the routing daemon paves the way, this entry instantly switches to the res state. At this point, it's filled with data ready for work:

  • Statistics: bytes, pkt, wrong_if. These tell the administrator "how well this route is doing its job."
  • ttls[MAXVIFS]: This is a very important small array. It records the TTL threshold for each virtual interface. As we'll see when discussing forwarding later, whether a packet can be sent out from a specific interface depends entirely on comparing the packet's TTL against the value in this array.

The Real-World Dilemma: When the Cache Misses

Theoretically, the perfect flow is: packet arrives -> look up MFC -> cache hit -> forward. But reality is often: packet arrives -> look up MFC -> cache miss.

What should the kernel do then? Simply dropping the packet won't do: the flow would then never get forwarded at all. This is where the ipmr_cache_unresolved() function comes in to handle the awkward moment.

Its logic is highly pragmatic, even a bit "humble":

  1. Find a slot: First, create a new (or find an existing) unresolved entry in mfc_cache_array.
  2. Queue up: Hang this lost packet on the entry's unresolved queue.
  3. Call for backup: Through the mroute_sk socket, send a message (IGMPMSG_NOCACHE) to the user-space daemon, meaning: "Boss, there's a packet here that doesn't know where to go, come take a look."

However, kernel memory is limited, so we can't just queue things up mindlessly.

⚠️ Pitfall Warning: Only a 4-Packet Buffer

Look at this code in net/ipv4/ipmr.c:

static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi, struct sk_buff *skb)
{
    ...
    // Check the length of the unresolved queue
    if (c->mfc_un.unres.unresolved.qlen > 3) {
        kfree_skb(skb);    // queue is full: drop the packet
        err = -ENOBUFS;    // return "No buffer space available"
    } else {
    ...

The logic here is strict: the check fires once more than 3 packets from the same flow are already waiting, so the queue holds at most 4 packets. The moment a 5th arrives, the kernel drops it immediately and returns -ENOBUFS.

Why so few? It's a protective mechanism. If the daemon crashes or reacts too slowly, the kernel can't let an unresolved queue grow without bound and exhaust system memory. So it gives each flow only a handful of buffered packets. If the route still isn't resolved by then, the kernel stops queuing and drops packets to stay alive.

This is the true face of the MFC: it is both a fast lookup table and a state machine equipped with a "buffer queue" and an "expiration mechanism."

At this point, we have dissected the two most core components of the multicast routing table—the dispatch center (mr_table) and the decision cache (mfc_cache). But having the tables without anyone driving them isn't enough. In the next section, we'll look at the user-space role that pushes all of this forward: the multicast router.