
6.0 Introduction: How Do You Send One Letter to a Whole Crowd?

Here's a counterintuitive problem in the networking world: if you want to send the exact same email to a hundred people simultaneously, what do you do?

The most straightforward approach, and also the dumbest, is to send it a hundred times. Each copy is independently encapsulated, independently routed, and travels across the network until it reaches those hundred different mailboxes. Logically this works, but in bandwidth terms it's a disaster. Imagine an NFL live stream where the server has to run a separate network cable to every single online viewer: even if we melted down every fiber optic cable on the planet, it wouldn't be enough.

We need a mechanism that allows the network to understand "this letter belongs to a certain group," and then replicate it only where necessary.

This is multicast.

It sounds like a perfect solution, but it introduces a problem far more complex than unicast: how do we manage this dynamic "group"? In the unicast world, we only need to care about "where does this letter go?"; in the multicast world, routers must constantly track "who wants to listen," "who doesn't want to listen anymore," and "who is sending." If managed poorly, multicast packets will flood the entire network like a tidal wave, overwhelming switches and routers.

The old network stack was helpless against this. We need a whole new language that allows hosts and routers to negotiate membership, and allows routers to build a distribution topology.

Our mission in this chapter is to break down how this system operates within the Linux kernel. We will see how the IGMP protocol manages the "membership roster," how the kernel maintains that special multicast routing table, and what kind of magic a packet goes through when it enters the multicast world to ultimately be delivered to thousands of receivers.

Don't rush to the code just yet. Let's start with the most basic handshake protocol—when you say "I want to join," what exactly happens behind the scenes in the network?


6.1 The IGMP Protocol

IGMP is the cornerstone of IPv4 multicast. As long as you want to play with multicast in the IPv4 world, whether you are a host or a router, you can't escape this protocol. As for IPv6, it uses a different protocol called MLD (Multicast Listener Discovery), which is based on ICMPv6—we'll save that for Chapter 8.

IGMP's job is narrow: establish and manage multicast group membership. Simple as that sounds, IGMP has gone through three versions as requirements evolved, each patching the logical loopholes of its predecessor.

IGMPv1: The Most Primitive Intuition

The earliest IGMPv1 (RFC 1112) was extremely simple—so simple that it only had two message types:

  1. Host Membership Report: Used by a host to shout "I want to join this group!".
  2. Host Membership Query: Used by a router to ask "Is anyone on this LAN still listening to this group?".

The logic is very intuitive:

  • When a host wants to join a group, it sends a Report.
  • To maintain the membership list, routers periodically send a Query. This query is usually sent to 224.0.0.1 (IGMP_ALL_HOSTS, the "all hosts" address).
  • To prevent query messages from spreading across the entire internet, the TTL of Query messages is hard-locked to 1. This means it can never leave the local LAN—this is for security, and also to reduce noise.

IGMPv2: Solving the Awkwardness of Leaving

v1 had a very practical problem: silence means leaving.

If you are a host and you shut down or unplug your network cable, you can't send an "I'm out" message. The router can only rely on a timeout to determine that you have left. This is passive and inefficient.

IGMPv2 (RFC 2236) patched this gap. It reworked the message set; the key types are listed below, the most important addition being Leave Group (0x17).

  • Membership Query (0x11): This is no longer a single catch-all question; it now comes in two subtypes:
    • General Query: Same as before, asking "which groups have listeners?".
    • Group-Specific Query: Asking "is anyone still in this specific group?"—usually sent because someone sent a Leave, prompting the router to specifically confirm.
  • Version 2 Membership Report (0x16): The report in v2 format.
  • Leave Group (0x17): This is the proactive leave message. The host politely tells the router "I'm not listening anymore," and the router can immediately stop forwarding data for that group to this network segment, instead of foolishly waiting for a timeout.

⚠️ Warning IGMPv2 did not abandon v1. For backward compatibility with legacy devices, a v2 router must be able to understand v1 Report messages (RFC 2236, section 2.1). This kind of historical baggage is everywhere in network protocols.

IGMPv3: I Don't Want to Listen to That Guy

By v3 (RFC 3376, later updated by RFC 4604), the protocol became much more fine-grained. v3 introduced a powerful feature: source filtering.

In the v2 era, you could only choose "I want to listen to channel 224.1.1.1." But if you only trusted data from a specific source, or particularly hated the noise from a certain source, you had no way to reject it. v3 changed this: when joining a group, you can specify an include list of source addresses, or explicitly say "anyone except so-and-so" (exclude).

This sounds great, but it means the kernel's Socket API had to be extended accordingly (RFC 3678), and the application layer had to cooperate with the changes as well.

The Kernel Perspective: igmp_heard_query()

When we look at these things at the kernel level, they are no longer just concepts in RFCs, but actual code execution.

Routers send a query to 224.0.0.1 approximately every two minutes. When a host receives this IGMP_HOST_MEMBERSHIP_QUERY message, how does the kernel react?

The code logic is here (net/ipv4/igmp.c):

/* Core handling when an IGMP message arrives (simplified) */
int igmp_rcv(struct sk_buff *skb)
{
	/* ... header, length, and checksum checks elided ... */

	switch (ih->type) {
	case IGMP_HOST_MEMBERSHIP_QUERY:
		/* The router is asking: "who is still here?" */
		igmp_heard_query(in_dev, skb);
		break;
	/* ... handling of other message types ... */
	}
	/* ... */
}

This igmp_heard_query() method is the trigger for the kernel to say "I'm listening." It resets the host's timer and prepares to send a Report. This ensures that as long as there is one live host on the network segment, the router will know the group exists.

💡 Note The kernel's IPv4 IGMP implementation is mainly concentrated in three files:

  • net/ipv4/igmp.c (core logic)
  • include/linux/igmp.h (kernel-internal header file)
  • include/uapi/linux/igmp.h (userspace interface definitions)

That covers the protocol layer. Now let's take a step deeper.

The host says "I want to join," the router hears it, and then what? The router needs a place to record these things—not only who is listening, but also where to forward data packets when they arrive.

This place is the multicast routing table.


6.2 The Multicast Routing Table

If IGMP is "raising your hand to sign up," then the multicast routing table is the "roll caller" holding the roster.

In the kernel, this table isn't a piece of paper, but a structure called mr_table. It is the core brain of IPv4 multicast routing. Let's see what it looks like:

struct mr_table {
	struct list_head	list;
#ifdef CONFIG_NET_NS
	struct net		*net;
#endif
	u32			id;
	struct sock __rcu	*mroute_sk;
	struct timer_list	ipmr_expire_timer;
	struct list_head	mfc_unres_queue;
	struct list_head	mfc_cache_array[MFC_LINES];
	struct vif_device	vif_table[MAXVIFS];
	...
};

The code isn't long, but every field hides a mechanism. Let's break them down one by one:

1. Context and Identity

  • net: This is a pointer to the network namespace. By default, it is init_net (the initial namespace). If you are doing containerized networking, this field is critical—it ensures that the multicast tables of different containers are isolated. (We'll dive into namespaces in Chapter 14; for now, just know it's used for isolation).
  • id: The ID number of this table. In single-table mode, it is usually RT_TABLE_DEFAULT (253).

2. The Kernel-Userspace Handshake: mroute_sk

  • mroute_sk: This pointer is quite interesting. It holds the kernel's reference to a userspace socket, the one the multicast routing daemon registers with the kernel.

    There is a very crucial interaction logic here:

    • When a userspace multicast routing daemon (like mrouted or pimd) starts, it calls setsockopt(), passing in the MRT_INIT command.
    • After receiving this command, the kernel initializes this mroute_sk pointer.
    • When the daemon exits, it calls setsockopt() passing in MRT_DONE, and the kernel sets this pointer to NULL.

    Why do this? Because the kernel itself doesn't run routing protocols. The kernel is only responsible for forwarding; policy decisions (how to forward) are calculated by userspace daemons. Once calculated, they are passed to the kernel via setsockopt or ioctl. Conversely, when the kernel encounters a packet it doesn't know how to forward, it also uses this socket to push messages (via sock_queue_rcv_skb()) back to the daemon for processing.

3. Handling the "Don't Know How to Forward" Queue

  • ipmr_expire_timer: This is a timer. Think about it—what if the daemon tells the kernel "there is a route," but never provides the complete information, or it expires after being provided? This timer is used for garbage collection—it periodically scans and clears out those useless "unresolved" entries.
  • mfc_unres_queue: This is the unresolved queue. When the kernel receives a multicast packet but can't find a match in the routing table (cache miss), it doesn't drop the packet directly. Instead, it hangs the packet (or request) on this queue, waiting for the userspace daemon to come to the rescue.

4. Core Data Area

  • mfc_cache_array: The multicast forwarding cache. This is a hash array with 64 slots (MFC_LINES). It is the true "snapshot" of multicast routing. We will focus on this structure in the next section—it determines where the next hop for a data packet is.
  • vif_table[MAXVIFS]: The virtual interface table. This is an array that can hold up to 32 (MAXVIFS) vif_device objects. The so-called "virtual interface" could be a real physical NIC, or it could be an IPIP tunnel. Regardless of what it is, in the eyes of multicast routing, they are all just ports that can emit signals. Entries in this array are added by vif_add() and removed by vif_delete().

Summary

Now you can think of mr_table as a dispatch center:

  • It holds an interface list (vif_table), knowing what exits it has.
  • It holds a routing cache (mfc_cache_array), remembering which path each data packet should take.
  • It also keeps a first-aid kit (mfc_unres_queue) and a phone (mroute_sk), specifically for handling situations it can't figure out.

But this still isn't enough. Having a table alone doesn't cut it—how do we match packets when they arrive? This is when the real star of the show takes the stage—the MFC (Multicast Forwarding Cache).