5.6 FIB Alias: When a Destination Has Multiple Identities
In the previous section, we discussed the general structure of FIB Tables and fib_info. You might think everything looks perfect—one route, one fib_info, recording the gateway, device, and metric, clear and straightforward.
But real-world networks always like to throw edge-case puzzles at us.
Imagine this scenario: you need to reach a destination, say 192.168.1.10. Most of the time, you only care about whether you can get there. But with more granular QoS (Quality of Service) control, you might want to route packets differently based on their TOS (Type of Service) field—for example, sending voice traffic (low-latency TOS) over an expensive dedicated line, while routing file download traffic (low-cost TOS) over the regular public internet.
If the kernel cloned a complete fib_info (including all that bulky data like nexthops and metrics) for every slightly different route, memory would be wasted very quickly.
The Linux kernel's solution is the FIB Alias (fib_alias). This is a classic "factor out the common parts" design: keep the unchanging, heavy structure (fib_info) around, and attach the varying, lightweight attributes (TOS, priority, type) as small hooks.
6.1 Not Just TOS
The opening paragraph mentioned TOS, but the scope of aliases is actually broader. When multiple route entries point to the exact same destination (or the same subnet) and they go through the same gateway using the same output interface, with the only differences being:
- Different TOS values
- Different priorities
- Different route types (e.g., RTN_UNICAST vs RTN_PROHIBIT)
...then creating a brand new fib_info would be too extravagant. We only need to create a lightweight fib_alias that points to the already existing fib_info.
Let's manually create a set of aliases to see this in action. The following commands create 3 routes to 192.168.1.10, all with the gateway 192.168.2.1, differing only in the TOS field:
ip route add 192.168.1.10 via 192.168.2.1 tos 0x02
ip route add 192.168.1.10 via 192.168.2.1 tos 0x04
ip route add 192.168.1.10 via 192.168.2.1 tos 0x06
In the kernel's eyes, the "physical attributes" (gateway, device) of these three routes are completely identical—there's no need to store three copies. The single fib_info they actually share is like an apartment shared by three siblings; while the fib_alias is like each person's keychain, engraved with their own personalized identifier.
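The sharing relationship can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: toy_info, toy_alias and toy_make_aliases() are hypothetical names invented for this sketch, and the fields are reduced to the bare minimum rather than copied from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct toy_info {
    uint32_t gateway;        /* next-hop address, host byte order */
    int      oif;            /* output interface index */
};

struct toy_alias {
    uint8_t          tos;    /* the per-route attribute that differs */
    struct toy_info *info;   /* the shared "apartment" */
};

/* Fill n aliases that all reference the same shared info object. */
void toy_make_aliases(struct toy_alias *out, const uint8_t *tos,
                      size_t n, struct toy_info *shared)
{
    for (size_t i = 0; i < n; i++) {
        out[i].tos  = tos[i];
        out[i].info = shared;    /* no copy: only the pointer is stored */
    }
}
```

Calling toy_make_aliases() with the TOS values 0x02, 0x04 and 0x06 yields three small records whose info pointers are all equal, which is the property the three ip route add commands rely on.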
6.2 The Structure: fib_alias
Let's look at the structure definition of fib_alias. It's very compact, which is exactly the point of its existence—saving memory:
struct fib_alias {
    struct list_head fa_list;   // list node, linked into the fib_node's alias list
    struct fib_info *fa_info;   // points to the shared fib_info object
    u8 fa_tos;                  // TOS value, the key that distinguishes the traffic
    u8 fa_type;                 // route type (RTN_UNICAST, RTN_PROHIBIT, etc.)
    u8 fa_state;                // state flags
    struct rcu_head rcu;        // callback head used by the RCU mechanism
};
There's a historical detail worth noting here (for the kernel archaeology enthusiasts): in earlier kernel versions (before 2.6.39), fib_alias also contained a fa_scope field. Later, developers realized that scope is an inherent attribute of a route and shouldn't be differentiated in an alias, so they moved it into the fib_info structure.
6.3 Diagramming the Sharing Mechanism
To cement this relationship in your mind, let's look at a diagram (Figure 5-3).
In this scenario, we have three fib_alias objects. Each holds a different fa_tos (e.g., 0x02, 0x04, 0x06), representing three different routing policies. But notice their fa_info pointers—they all point to the exact same fib_info object.
This is like three OS threads (aliases) referencing the same physical memory page (fib_info). To prevent this "shared property" from being prematurely freed, fib_info internally maintains a reference counter, fib_treeref.
Because three routes are using it, the fib_treeref value in the diagram shows 3. If you delete one of the routes (say, the rule for tos 0x04), the kernel will simply decrement the counter and free the corresponding fib_alias, while that bulky fib_info will continue to hang around in memory until the counter reaches zero.
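The counting behavior described above can be modeled with a small user-space sketch. It is a simplification under stated assumptions: the toy_* names are hypothetical, the reference is taken when the alias is created (the kernel takes it while looking up the fib_info), and the locking and RCU handling of the real release path are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct toy_info {
    int treeref;             /* how many aliases reference this object */
};

struct toy_alias {
    unsigned char    tos;
    struct toy_info *info;
};

struct toy_info *toy_info_new(void)
{
    return calloc(1, sizeof(struct toy_info));   /* treeref starts at 0 */
}

/* Create an alias and take one reference on the shared info. */
struct toy_alias *toy_alias_new(unsigned char tos, struct toy_info *fi)
{
    struct toy_alias *fa = malloc(sizeof(*fa));

    if (!fa)
        return NULL;
    fa->tos  = tos;
    fa->info = fi;
    fi->treeref++;
    return fa;
}

/* Delete one alias. Returns 1 if this also released the shared info. */
int toy_alias_del(struct toy_alias *fa)
{
    struct toy_info *fi = fa->info;
    int released = 0;

    assert(fi->treeref > 0);
    if (--fi->treeref == 0) {
        free(fi);            /* last user gone: the heavy object goes too */
        released = 1;
    }
    free(fa);                /* the alias itself is always freed */
    return released;
}
```

With three aliases on one info, the first two calls to toy_alias_del() return 0 and only the third returns 1, mirroring the counter dropping from 3 to zero.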
6.4 Inside the Kernel: fib_table_insert()
Just looking at structures isn't thrilling enough. Let's dive into the kernel code and see how it handles this sharing when you type that ip route add command.
All of this happens in the fib_table_insert() method. This is the main entry point for adding route entries. Let's assume we've already added the first route with TOS 0x02, and we're now adding a second route with TOS 0x04.
int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
{
    struct trie *t = (struct trie *) tb->tb_data;
    struct fib_alias *fa, *new_fa;
    struct list_head *fa_head = NULL;
    struct fib_info *fi;
    // ... the rest of the logic is expanded below ...
}
Step 1: Try to find an existing fib_info
The code first tries to create (or find) a fib_info. This step is crucial—it's the logical core of the deduplication.
fi = fib_create_info(cfg);
Inside fib_create_info(), the kernel first allocates a fib_info, then immediately calls fib_find_info() to search the hash table: "Hey, is there already a fib_info with the exact same configuration (same gateway, device, etc.)?"
struct fib_info *fib_create_info(struct fib_config *cfg)
{
    struct fib_info *fi = NULL;
    struct fib_info *ofi;
    // ...
    fi = kzalloc(sizeof(*fi) + nhs*sizeof(struct fib_nh), GFP_KERNEL);
    if (fi == NULL)
        goto failure;
    // ... initialize fi ...
link_it:
    ofi = fib_find_info(fi);    // search the hash table for an identical entry
That fib_find_info() is a detective. If it finds a perfect match (the old fib_info, ofi), the kernel decides: "Since one already exists, the fi we just allocated with kzalloc() is redundant garbage."
So, the code marks the new fi as dead (fib_dead = 1), frees its memory directly, increments the reference count (fib_treeref) of the found ofi by 1, and finally returns ofi to the caller.
    if (ofi) {
        fi->fib_dead = 1;       // mark the newly allocated one as discarded
        free_fib_info(fi);      // free the redundant copy
        ofi->fib_treeref++;     // one more reference on the surviving veteran
        return ofi;             // return the shared object
    }
    // ...
}
In our example, because the gateway and device of the second route (TOS 0x04) are exactly the same as the first one (TOS 0x02), fib_find_info() hits successfully. We reuse the same fib_info.
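The find-or-create pattern of Step 1 can be reproduced in user space as follows. This is a minimal sketch, assuming a plain linked list as the registry where the kernel uses a hash table, and comparing only a gateway and an interface index; all toy_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in; not the kernel's definition. */
struct toy_info {
    struct toy_info *next;   /* registry chain (a hash table in the kernel) */
    uint32_t gateway;
    int      oif;
    int      treeref;
};

static struct toy_info *toy_registry;   /* all live toy_info objects */

/* Look for an already registered object with the same configuration. */
struct toy_info *toy_find_info(const struct toy_info *key)
{
    for (struct toy_info *p = toy_registry; p; p = p->next)
        if (p->gateway == key->gateway && p->oif == key->oif)
            return p;
    return NULL;
}

/* Allocate first, then deduplicate, mirroring the order used above. */
struct toy_info *toy_create_info(uint32_t gateway, int oif)
{
    struct toy_info *fi = calloc(1, sizeof(*fi));
    struct toy_info *ofi;

    if (!fi)
        return NULL;
    fi->gateway = gateway;
    fi->oif     = oif;

    ofi = toy_find_info(fi);
    if (ofi) {
        free(fi);            /* the fresh copy is redundant */
        ofi->treeref++;      /* one more user of the existing object */
        return ofi;
    }

    fi->treeref = 1;         /* first user: register the new object */
    fi->next = toy_registry;
    toy_registry = fi;
    return fi;
}
```

Two calls with the same gateway and interface return the same pointer with treeref equal to 2, while a call with a different gateway produces a separate object.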
Step 2: Check if a new Alias is needed
Now that we have the fi (which is actually the reused old ofi), the kernel proceeds with the TRIE insertion logic. It needs to hang an alias on the corresponding leaf node.
The key here is: even though the fib_info is the same, because the TOS is different, we definitely need a new fib_alias object. The kernel confirms this via fib_find_alias():
l = fib_find_node(t, key);
fa = NULL;
if (l) {
    fa_head = get_fa_head(l, plen);
    // search the alias list for an entry with the same TOS and priority
    fa = fib_find_alias(fa_head, tos, fi->fib_priority);
}
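If fa is not NULL and both its TOS and priority match the new route exactly, you're adding a duplicate route, and the kernel either replaces the existing entry or returns an error, depending on the netlink flags of the request (for example NLM_F_REPLACE versus NLM_F_EXCL). In our scenario, tos is 0x04, which differs from the existing 0x02, so no exact match is found. Note that in the kernel versions this code comes from, fib_find_alias() does not necessarily return NULL here: it can return a neighboring entry that marks where the new alias belongs in the TOS-ordered list, and the caller then compares fa_tos and fib_priority itself. Either way, the verdict is the same: "There's no duplicate here, you can add a new one."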
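The duplicate check of Step 2 boils down to comparing two keys per alias. The sketch below reduces it to a plain exact-match search over a singly linked list; toy_alias and toy_find_exact() are hypothetical names, and the kernel's own lookup does more work than this (it also helps locate the insertion position).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in; not the kernel's definition. */
struct toy_alias {
    struct toy_alias *next;
    uint8_t  tos;
    uint32_t priority;
};

/* Return the alias with exactly this (tos, priority) pair, or NULL. */
struct toy_alias *toy_find_exact(struct toy_alias *head,
                                 uint8_t tos, uint32_t priority)
{
    for (struct toy_alias *fa = head; fa; fa = fa->next)
        if (fa->tos == tos && fa->priority == priority)
            return fa;
    return NULL;             /* no duplicate: a new alias is needed */
}
```

With a list holding only the TOS 0x02 alias, searching for TOS 0x04 at the same priority returns NULL, which is the "no duplicate" outcome of our scenario.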
Step 3: Create and mount the Alias
Finally, the kernel grabs a new fib_alias from the Slab Allocator, points its fa_info pointer to that shared fib_info, and hooks it onto the linked list.
new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
if (new_fa == NULL)
    goto out;
new_fa->fa_info = fi;       // point to the shared fib_info
new_fa->fa_tos = tos;       // set this alias's own distinguishing attribute
// ... insert into the fa_list linked list ...
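Step 3 can be approximated in user space like this. It is a sketch under stated assumptions: malloc() stands in for the slab allocator, a singly linked list stands in for fa_list, and the list is assumed to be kept in descending TOS order; toy_info, toy_alias and toy_alias_insert() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct toy_info {
    uint32_t gateway;
};

struct toy_alias {
    struct toy_alias *next;
    struct toy_info  *info;
    uint8_t           tos;
};

/* Allocate an alias for fi and link it into *head, keeping the list
 * ordered by descending TOS (an assumption of this sketch). */
struct toy_alias *toy_alias_insert(struct toy_alias **head,
                                   struct toy_info *fi, uint8_t tos)
{
    struct toy_alias *fa = malloc(sizeof(*fa));
    struct toy_alias **pp = head;

    if (!fa)
        return NULL;
    fa->info = fi;           /* point to the shared object */
    fa->tos  = tos;          /* set the alias's own attribute */

    while (*pp && (*pp)->tos > tos)   /* skip entries with a larger TOS */
        pp = &(*pp)->next;
    fa->next = *pp;
    *pp = fa;
    return fa;
}
```

Inserting TOS 0x02, 0x06 and 0x04 in that order leaves the list as 0x06, 0x04, 0x02, with every entry pointing at the same toy_info.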
6.5 Echoes from This Section
At this point, we've fully explored the static storage structure of the FIB—from Tables to TRIEs, from fib_info to fib_alias.
- fib_table is the skeleton.
- fib_info is the muscle (storing the entity data).
- fib_alias is the skin (storing the differentiated attributes).
This design is extremely efficient. When you configure a complex BGP routing table with tens of thousands of routes, many of which differ only in priority, this sharing mechanism can save a considerable amount of memory.
But for now, the routes are still static.
A router doesn't just need to store routes; it also needs to "talk." When it discovers a suboptimal path, it proactively sends a message to its neighbor: "Hey, don't go through me, that other gateway is faster." This is the ICMPv4 Redirect.
In the next section, we'll look at how the kernel dynamically corrects paths using ICMP Redirect messages when a "suboptimal" routing situation occurs.