
12.8 Mesh Networking (802.11s)

In the previous section we marveled at high-speed point-to-point transmission mechanisms such as A-MPDU and Block Ack, as if they were the final destination of wireless networking.

But reality sets in quickly. Step outside the few dozen meters an AP covers, or into a harsh environment where cabling is impossible, and that "high-speed" AP is useless. You face a more primitive, thornier problem: how do you get wireless frames relayed hop by hop to their destination?

This is exactly the problem 802.11s sets out to solve. Instead of the centralized "star" topology built around an AP, it builds a decentralized "mesh" in which stations forward traffic for one another.


From Ad Hoc to Mesh: Topology Evolution

Before 802.11s, if you didn't want to use an AP, your only option was IBSS (also known as Ad Hoc mode). But IBSS has a fatal weakness: it does no forwarding, so every station must be able to hear every other station directly (a "full mesh"). It's like a group of people shouting in a dark room: everyone must hear everyone else directly for the conversation to hold together. That is almost impossible in the real world. As soon as one node moves into a corner or ends up behind a wall, the network fragments.

Mesh emerged to solve this exact "fragility."

You can think of a Mesh as a mature community gossip network:

  • In Full Mesh, everyone is friends with everyone else, and gossip spreads across the network instantly (this is the ideal state).
  • In Partial Mesh (more common), you only know a few neighbors, but you trust them to pass the word along.
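The following toy reachability model makes the difference concrete. It assumes a hypothetical chain of radio links A-B-C-D and ignores all radio effects. IBSS can only deliver to direct neighbors. A mesh can forward across intermediate nodes, which is just a graph search over the links.

```python
from collections import deque

# Hypothetical radio links (symmetric): who can hear whom directly.
LINKS = {("A", "B"), ("B", "C"), ("C", "D")}

def neighbors(node, links):
    """Nodes within direct radio range of `node`."""
    return {b for a, b in links if a == node} | {a for a, b in links if b == node}

def ibss_reachable(src, dst, links):
    """IBSS does no forwarding: dst is reachable only if it is a direct neighbor."""
    return dst in neighbors(src, links)

def mesh_reachable(src, dst, links):
    """A mesh forwards hop by hop: breadth-first search over the link graph."""
    seen, frontier = {src}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return True
        for nxt in neighbors(node, links):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

With these links, A and D cannot talk in IBSS mode (`ibss_reachable("A", "D", LINKS)` is false), but they can in a mesh, because B and C relay.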

Back in Linux kernel 2.6.26 (2008), the open80211s project pushed draft 802.11s support into the wireless stack. This was thanks to sponsorship from OLPC (One Laptop Per Child) and various vendors, as well as the early code written by Luis Carlos Cobo and Javier Cardona at Cozybit. They made Linux the first general-purpose operating system to natively support Mesh at the kernel level.


HWMP Protocol: The Layer 2 Routing Heart

Because it is a mesh, a frame traveling from A to C may have to pass through B. How does B know where to forward it? Normally we would rely on IP routing (Layer 3). But 802.11s works at Layer 2 and forwards on MAC addresses without looking at IP at all. So it defines its own path selection protocol: HWMP (Hybrid Wireless Mesh Protocol).

Why "Hybrid"? Because it combines two very different path selection strategies: a "lazy" on-demand mode and a "proactive" tree-building mode:

  1. On-Demand Routing — "Find it when you need it"

    • It does not build paths in advance. A node initiates a PREQ (Path Request) only when it needs to send data to a destination and finds no path (mesh_path_lookup() comes back empty).
    • It's like needing to borrow money from someone, suddenly realizing you don't have their number, and standing at the village entrance shouting: "I'm looking for Zhang San! Does anyone know where Zhang San is?"
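    • The PREQ is carried in an action frame (IEEE80211_STYPE_ACTION) and flooded through the mesh. Every node it passes records who is looking for Zhang San and which neighbor the shout came from, leaving a reverse path behind. The flood continues until the shout finally reaches Zhang San's ears.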
  2. Proactive Routing — "Tree-based broadcasting"

    • Some nodes are "busy hubs" (traffic aggregation points) that everyone wants to reach. Having everyone do on-demand lookups for them is too slow.
    • This node crowns itself the Root Mesh Point and periodically sends out a RANN (Root Announcement): "I am the root, connect to me."
    • Upon receiving this, other nodes proactively establish a "tree-shaped" path pointing to the root.
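Both modes rest on the same mechanism: a broadcast flood in which each node remembers the neighbor it first heard the frame from. The sketch below is a simplification built on several assumptions:

- Hop count stands in for the airtime metric.
- Sequence numbers and rebroadcast rules are ignored.
- The topology `ADJ` is hypothetical.

Flooding from an originator models a PREQ, after which every node knows its way back to the originator. Flooding from a root models a RANN, after which every node knows its way to the root.

```python
from collections import deque

def flood(origin, adjacency):
    """Flood one broadcast frame (a PREQ or a RANN) outward from `origin`.

    Each node records the neighbor it first heard the frame from; that
    neighbor becomes its next hop back toward `origin`. Breadth-first order
    makes "first heard" the fewest-hops neighbor (hop count stands in for
    the airtime metric; sequence numbers are ignored).
    """
    next_hop_to_origin = {origin: origin}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for nbr in adjacency[node]:               # node rebroadcasts once
            if nbr not in next_hop_to_origin:     # first copy wins, later copies ignored
                next_hop_to_origin[nbr] = node    # reverse-path entry
                frontier.append(nbr)
    return next_hop_to_origin

# Hypothetical topology: chain A-B-C-D with a side branch B-E.
ADJ = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}

# On-demand: A floods a PREQ; every node learns its way back to A.
preq_paths = flood("A", ADJ)
# Proactive: root D floods a RANN; every node learns its way to the root.
rann_paths = flood("D", ADJ)
```

In `preq_paths`, D's next hop back toward A is C. In `rann_paths`, A reaches the root via B, and B via C. The proactive tree is the same flood, simply started from the root instead of from whoever needs a path.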

Path Discovery and Caching: The PREQ/PREP Duet

Let's zoom in and see how a packet finds its way through the Mesh.

Step 1: Lost and Shouting (PREQ)

Suppose node A wants to send data to node D. A first looks D up in its mesh_path table (struct mesh_path is defined in net/mac80211/mesh.h).

  • No path? A queues a path request (mesh_queue_preq() in net/mac80211/mesh_hwmp.c), and a PREQ is broadcast. hwmp_preq_frame_process() is the other side of the exchange: it runs on the nodes that receive the PREQ.
  • Meanwhile, A cannot simply throw away the data that is waiting. It parks these unsent frames in a buffer called frame_queue (every mesh_path object has one).
  • This buffer is small: it holds at most 10 frames (MESH_FRAME_QUEUE_LEN). Once it is full, frames start being dropped, and nothing more can be done.
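A bounded pending queue like frame_queue can be sketched as follows. The limit comes from the text (MESH_FRAME_QUEUE_LEN = 10). The class name is hypothetical. The overflow policy is an assumption: this sketch evicts the oldest queued frame to make room for the newest, which is how we read mesh_nexthop_resolve() in recent kernels. Check your kernel version before relying on it.

```python
from collections import deque

MESH_FRAME_QUEUE_LEN = 10  # limit named in the text

class PendingFrames:
    """Hypothetical stand-in for mesh_path->frame_queue while a path resolves."""

    def __init__(self, limit=MESH_FRAME_QUEUE_LEN):
        self.frames = deque()
        self.limit = limit
        self.dropped = 0

    def enqueue(self, frame):
        # Assumed policy: when full, discard the oldest frame to make room.
        if len(self.frames) >= self.limit:
            self.frames.popleft()
            self.dropped += 1
        self.frames.append(frame)
```

Pushing 15 frames while the PREQ is still outstanding leaves the 10 most recent (frames 5 through 14) in the queue and counts 5 drops.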

Step 2: Response and Return (PREP)

The PREQ hops along until it finally reaches D. D receives it and realizes: "Hey, they're looking for me." It then unicasts a PREP (Path Reply) back along the reverse path the PREQ left behind. Every node the PREP passes learns a path to D. When the PREP reaches A, A finally has a complete path to D (stored in its mesh_path entry).
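The PREP's return trip can be modeled as a walk along the reverse path. At each node, the forward next hop toward D is simply the node the PREP arrived from. In this sketch, the function name and the dictionary representation of the reverse path are illustrative assumptions, and metrics and sequence numbers are left out.

```python
def send_prep(target, originator, toward_originator):
    """Unicast a PREP from `target` back to `originator`.

    `toward_originator` maps node -> next hop back to the PREQ originator
    (the reverse path left by the PREQ flood). Each node the PREP visits
    learns that its next hop toward `target` is the node the PREP came
    from. Returns node -> next hop toward `target`.
    """
    toward_target = {}
    prev, node = target, toward_originator[target]
    while True:
        toward_target[node] = prev          # forward-path entry installed here
        if node == originator:
            return toward_target
        prev, node = node, toward_originator[node]

# Hypothetical reverse path left by A's PREQ along the chain A-B-C-D.
reverse = {"A": "A", "B": "A", "C": "B", "D": "C"}
forward = send_prep("D", "A", reverse)
```

After the walk, `forward` says A sends to B, B to C, and C to D. A single unicast reply has installed the whole forward path.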

Step 3: Releasing the Backlog

Remember the frames parked in frame_queue? As soon as the path is established, mesh_path_tx_pending() is called and flushes the backlog (up to 10 frames) out along the newly created path, like a pipe that has just been unclogged.
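The queue-then-flush behavior is a small state machine: frames are held while the next hop is unknown and released, in arrival order, once it is known. The class below is loosely modeled on struct mesh_path, but its names are hypothetical. `resolve()` only plays roughly the role mesh_path_tx_pending() plays, and the queue limit is left out to keep it short.

```python
from collections import deque

class MeshPath:
    """Toy per-destination path entry: hold frames while resolving, flush once resolved."""

    def __init__(self, dst):
        self.dst = dst
        self.next_hop = None            # unknown until a PREP arrives
        self.frame_queue = deque()
        self.sent = []                  # (next_hop, frame) pairs handed to the radio

    def transmit(self, frame):
        if self.next_hop is None:
            self.frame_queue.append(frame)      # no path yet: park the frame
        else:
            self.sent.append((self.next_hop, frame))

    def resolve(self, next_hop):
        """Path established: drain the backlog in arrival order."""
        self.next_hop = next_hop
        while self.frame_queue:
            self.sent.append((next_hop, self.frame_queue.popleft()))
```

Frames sent before `resolve("B")` wait in `frame_queue`. They leave first, in order, and later frames go straight out.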

Step 4: If the Route Breaks (PERR)

Wireless environments are turbulent. Suppose intermediate node B loses its link to the next hop toward D, because of interference or because that neighbor lost power. B then emits a PERR (Path Error), telling everyone upstream: "Stop sending stuff my way, the road ahead is broken." Upon receiving the PERR, A invalidates its path to D. If it still has traffic for D, it starts over with a new PREQ.
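When a node detects a broken link, it must work out which destinations it can no longer reach. Those are every path whose next hop was the lost neighbor, and that set is what it reports in the PERR. Here is a minimal sketch, assuming a path table that maps destination to next hop (a hypothetical representation; real PERRs also carry sequence numbers and reason codes):

```python
def handle_link_loss(paths, broken_next_hop):
    """Invalidate every path that forwards through `broken_next_hop`.

    `paths` maps destination -> next hop and is modified in place.
    Returns the sorted destinations that became unreachable: the
    contents a node would advertise in its PERR.
    """
    lost = sorted(dst for dst, hop in paths.items() if hop == broken_next_hop)
    for dst in lost:
        del paths[dst]
    return lost

# Hypothetical table: D and C are reached via B, E via F.
table = {"D": "B", "C": "B", "E": "F"}
unreachable = handle_link_loss(table, "B")
```

Losing B takes out both C and D in one step, while the path to E via F survives.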


The Cost of Routing: Airtime Metric

You might ask: if there are two paths to D, which one does HWMP choose? It doesn't just look at the lowest hop count; it looks at the Airtime Metric.

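This is a highly pragmatic metric. It estimates how many microseconds of channel time it costs to deliver a frame over a given link. The 802.11s standard defines the airtime cost of a link as follows, and mac80211 computes a fixed-point approximation of it in airtime_link_metric_get():

$$c_a = \left( O + \frac{B_t}{r} \right) \cdot \frac{1}{1 - e_f}$$

The symbols are:

- O is the fixed channel-access and protocol overhead, in microseconds.
- B_t is the size of a reference test frame (8192 bits).
- r is the link's data rate, in Mb/s.
- e_f is the frame error rate.

A path's metric is the sum of the airtime costs of its links.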
  • If your signal is good (high rate), the overhead is small, and the Metric is low (better).
  • If your signal is poor (low rate, high packet error rate), the overhead is large, and the Metric is high (worse).
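Below is a minimal floating-point sketch of the 802.11s airtime cost, c_a = (O + B_t / r) / (1 - e_f), with B_t = 8192 bits. Dividing bits by Mb/s yields microseconds. O depends on the PHY and is passed in as a parameter; the values in the example are illustrative, not taken from the standard's tables. mac80211 uses integer fixed-point arithmetic instead, so its numbers will not match this sketch exactly.

```python
TEST_FRAME_BITS = 8192  # B_t, the reference test frame size in 802.11s

def airtime_cost_us(rate_mbps, frame_error_rate, overhead_us):
    """Airtime cost of one link in microseconds: (O + B_t / r) / (1 - e_f)."""
    if rate_mbps <= 0:
        raise ValueError("rate must be positive")
    if not 0 <= frame_error_rate < 1:
        raise ValueError("frame error rate must be in [0, 1)")
    return (overhead_us + TEST_FRAME_BITS / rate_mbps) / (1 - frame_error_rate)

# Illustrative links, both with a made-up 100 us overhead.
good_link = airtime_cost_us(54, 0.0, 100)   # fast, clean
poor_link = airtime_cost_us(6, 0.1, 100)    # slow, lossy
```

The error term grows without bound as e_f approaches 1, so a lossy link is penalized far more than its raw rate alone would suggest.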

Therefore, HWMP tends to choose the path that "takes an extra hop but is fast on every hop," rather than the path that "only takes two hops but crawls like a snail on each one."
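To see why the extra hop can win, the sketch below runs Dijkstra's algorithm twice over a hypothetical topology: once with per-link airtime costs, and once with every link costing 1 (pure hop count). The link costs are made-up numbers. HWMP itself discovers paths by flooding PREQs, not by running Dijkstra over a known graph. The sketch only shows which path a sum-of-airtime metric prefers.

```python
import heapq

def best_path(graph, src, dst):
    """Dijkstra over link costs; returns (total_cost, path).

    `graph` maps node -> {neighbor: link_cost}. With airtime costs this
    picks the cheapest path in channel time; with every cost set to 1 it
    reduces to minimum hop count.
    """
    queue = [(0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nbr, link_cost in graph[node].items():
            if nbr not in settled:
                heapq.heappush(queue, (cost + link_cost, nbr, path + [nbr]))
    return float("inf"), []

# Hypothetical airtime costs (us): three fast hops A-B-C-D vs two slow hops A-X-D.
AIRTIME = {
    "A": {"B": 250, "X": 1600},
    "B": {"C": 250},
    "C": {"D": 250},
    "X": {"D": 1600},
    "D": {},
}
HOPS = {node: {nbr: 1 for nbr in nbrs} for node, nbrs in AIRTIME.items()}
```

By hop count, A-X-D wins with 2 hops. By airtime, A-B-C-D wins with 750 µs against 3200 µs.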


Hands-on: Building a Mesh with iw

With all that said, how do we turn a wireless NIC into a Mesh node on Linux?

First, you need hardware that supports Mesh mode (many older NICs, or drivers with incomplete firmware support, will trip you up here). Then, forget the antiquated iwconfig and bring out the modern standard tool: iw.

1. Create the Mesh Interface

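# Put wlan0 into Mesh Point mode (most drivers require the interface to be down)
ip link set wlan0 down
iw wlan0 set type mesh
# or, using the short alias mp
iw wlan0 set type mp
ip link set wlan0 up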

2. Join the Mesh Network

Just as a Wi-Fi network has an SSID, a mesh network has an ID.

iw wlan0 mesh join "my-mesh-ID"

Two or more nodes that join the same mesh ID, on the same channel and within radio range of each other, discover one another and establish peer links. Once traffic starts to flow, they exchange PREQ/PREP frames to build paths.

3. Verify the Status

Now you can look down on the network like a god:

# List neighbors (1-hop peers)
iw wlan0 station dump

# Show the path table (paths across the whole mesh)
iw wlan0 mpath dump

In the mpath dump output, you can see the NEXT_HOP (next hop) and METRIC (cost). This is the direct result of the HWMP algorithm's hard work.


Every Rose Has Its Thorn: The Cost of Mesh

Mesh sounds beautiful (wide coverage, strong self-healing, no wiring required), but don't rush to throw away your Ethernet cables. It has obvious side effects:

  1. Broadcast Storms: HWMP's PREQs are broadcast, and route maintenance involves broadcasting too. In a large mesh, management traffic can eat a large share of the airtime, leaving your user data to scrape by in the cracks.
  2. Performance Bottlenecks: On a single-channel mesh, every relay must receive each frame and then retransmit it on the same channel. Every hop therefore adds latency and cuts throughput. After 5 hops from A to B, a gigabit-class link might deliver only around 100 Mbps end to end, with painfully high ping times.
  3. Driver Hell: Although the kernel supports mesh, many vendors' wireless drivers simply haven't implemented the mesh callbacks, or their implementations are buggy. You might find that your NIC connects to an AP just fine, yet switching it into mesh mode crashes the driver or even panics the kernel.
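A back-of-the-envelope model puts rough numbers on the first two costs. It rests on three assumptions:

- Every node rebroadcasts a flooded PREQ exactly once.
- All hops of a chain share one channel and interfere with each other, so only one hop transmits at a time.
- MAC/PHY overhead and retries are ignored.

These are idealized bounds, not measurements.

```python
def flood_transmissions(num_nodes):
    """Broadcast transmissions for one mesh-wide flood, e.g. one PREQ.

    Assumes the originator sends once and every other node rebroadcasts
    exactly once: one transmission per node.
    """
    return num_nodes

def all_pairs_discovery_transmissions(num_nodes):
    """Transmissions if every node runs one discovery toward every other node."""
    discoveries = num_nodes * (num_nodes - 1)
    return discoveries * flood_transmissions(num_nodes)

def chain_throughput_mbps(link_rate_mbps, hops):
    """Idealized end-to-end throughput along a single-channel chain.

    Only one hop can transmit at a time, so each frame costs `hops` times
    the airtime of a single hop. Overhead, retries and spatial reuse are
    ignored, so real throughput is lower.
    """
    return link_rate_mbps / hops
```

In this model, a 50-node mesh in which every node discovers every other node costs 122,500 broadcast transmissions. A 1000 Mb/s link stretched over 5 hops tops out at 200 Mb/s. Real-world overheads push that further down, toward the ~100 Mbps figure mentioned in the list.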

Summary and Preview

In this section, we stepped out of the comfort zone of a single AP and entered a more complex, dynamic distributed world. We evolved from the "full-mesh kindergarten" of IBSS to the "multi-hop mature society" of Mesh. We saw how HWMP uses PREQ/PREP to perform routing tricks at Layer 2, and how Linux encapsulates these complex protocols into a few simple commands via mac80211 and iw.

But whether wired or wireless, Linux's network stack ultimately runs on hardware. In the next chapter, we will shift our gaze from the radio waves in the air to a higher-performance, more expensive hardware world—InfiniBand and RDMA. There, latency will be measured in microseconds, and the kernel's role will undergo a massive transformation.