5.3 FIB Info: The "ID Card" of a Routing Entry

In the previous section, we mentioned that once fib_lookup() fills in fib_result, it has essentially completed its historical mission. The most important pointer in the result structure—the one pointing to fib_info—is the real boss.

Now let's zoom in. If fib_table is the entire routing book, then fib_info is a specific "signpost" within it. It defines how a packet should travel: which door (device) to exit from, how urgent it is (priority), who mandated it (protocol), and how far it can go (scope).

This is a highly detailed "ID card."


struct fib_info: Uncovering the Core Parameters

In the kernel, this structure is the embodiment of a routing entry. We can think of it as a large parameter set packed with all the information a router needs to make decisions.

Let's look at the code first to get a feel for how much it carries:

struct fib_info {
    struct hlist_node   fib_hash;
    struct hlist_node   fib_lhash;
    struct net          *fib_net;
    int                 fib_treeref;
    atomic_t            fib_clntref;
    unsigned int        fib_flags;
    unsigned char       fib_dead;
    unsigned char       fib_protocol;
    unsigned char       fib_scope;
    unsigned char       fib_type;
    __be32              fib_prefsrc;
    u32                 fib_priority;
    u32                 *fib_metrics;
#define fib_mtu fib_metrics[RTAX_MTU-1]
#define fib_window fib_metrics[RTAX_WINDOW-1]
#define fib_rtt fib_metrics[RTAX_RTT-1]
#define fib_advmss fib_metrics[RTAX_ADVMSS-1]
    int                 fib_nhs;
#ifdef CONFIG_IP_ROUTE_MULTIPATH
    int                 fib_power;
#endif
    struct rcu_head     rcu;
    struct fib_nh       fib_nh[0];
#define fib_dev fib_nh[0].nh_dev
};

At first glance it is nothing but fields, but don't panic. We can break them down into four groups: lifecycle, attributes, performance metrics, and next hops.


Lifecycle: Pointers of Life and Death

The first two fields, fib_hash and fib_lhash, are hash table nodes that allow the kernel to quickly locate this "ID card." The following fields determine when this card should be reclaimed.

  • fib_net: Points to the Network Namespace. This is what makes containerization possible; each container has its own network stack, and naturally, its own fib_info pool.
  • fib_treeref: This is a reference count, but it counts how many fib_alias objects point to this fib_info. Remember the fib_alias we discussed in the previous section? If multiple routes differ only in TOS or priority, they can share the same fib_info. For each such alias, this count increments by 1. It increments in fib_create_info() and decrements in fib_release_info().
  • fib_clntref: This is another reference count, more like a "client reference." It also increments on creation and decrements in fib_info_put(). If it drops to 0, it means no one is using this routing entry, and free_fib_info() will be called to clean up the scene.
  • fib_dead: This is a death flag. If you want to free a fib_info, you must set this flag to 1 first. If you call free_fib_info() without setting it, the kernel refuses to free the object because it considers it still alive. It's like requiring a "canceled" label on an object before it can be discarded, which prevents accidental deletion.
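
The interplay of the two counters and the death flag can be sketched in user space. This is a minimal model, not kernel code: every toy_ name is hypothetical, the counters are plain ints rather than atomic_t, and the ordering inside toy_fib_release_info() (mark dead, then drop one client reference) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the lifecycle fields of fib_info. */
struct toy_fib_info {
    int treeref;         /* number of aliases sharing this object */
    int clntref;         /* number of other holders of a reference */
    unsigned char dead;  /* must be 1 before the object may be freed */
};

/* A live object is never freed. Returns true only if memory was released. */
static bool toy_free_fib_info(struct toy_fib_info *fi)
{
    assert(fi != NULL);
    if (!fi->dead)
        return false;
    free(fi);
    return true;
}

/* Drop one client reference; try to free when the count reaches zero. */
static bool toy_fib_info_put(struct toy_fib_info *fi)
{
    assert(fi != NULL && fi->clntref > 0);
    if (--fi->clntref == 0)
        return toy_free_fib_info(fi);
    return false;
}

/* Create a new object: both counters start at 1, dead starts at 0. */
static struct toy_fib_info *toy_fib_create_info(void)
{
    struct toy_fib_info *fi = calloc(1, sizeof(*fi));
    if (fi != NULL) {
        fi->treeref = 1;
        fi->clntref = 1;
    }
    return fi;
}

/* A second alias reuses an existing object instead of creating a new one. */
static void toy_fib_share_info(struct toy_fib_info *fi)
{
    assert(fi != NULL && !fi->dead);
    fi->treeref++;
}

/* Drop one alias. The last alias marks the object dead and drops the
 * reference taken at creation. Returns true if the object was freed. */
static bool toy_fib_release_info(struct toy_fib_info *fi)
{
    assert(fi != NULL && fi->treeref > 0);
    if (--fi->treeref > 0)
        return false;
    fi->dead = 1;
    return toy_fib_info_put(fi);
}
```

In this model, calling toy_free_fib_info() on a freshly created object returns false, and the object is released only after the last alias has been dropped.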

Attributes: Who, Where, and How Far

The following fields define the essence of the route.

fib_protocol: Who Mandated It?

fib_protocol tells the kernel "who" added this route. This is crucial because the source determines the level of trust and how it is handled.

If you type a command in user space without any modifiers:

ip route add 192.168.1.0/24 dev eth0

When no proto is given, the ip command tags the route as RTPROT_BOOT (boot configuration) by default.

If you explicitly add a modifier:

ip route add 192.168.1.0/24 dev eth0 proto static

It becomes RTPROT_STATIC (static configuration by an administrator).

Besides these, there are a few other regulars:

  • RTPROT_KERNEL: Added by the kernel itself. For example, the local loopback route (127.0.0.0/8) is configured by the kernel at startup—no need for you to worry about it.
  • RTPROT_REDIRECT: Rarely seen in IPv4, mostly used in IPv6, indicating a route triggered by an ICMP Redirect message.
  • RTPROT_RA: Used for IPv6 Router Advertisements—don't confuse it with the Router Alert Option.

Of course, advanced routing daemons like ZEBRA and XORP also have their own identifiers (such as RTPROT_XORP) when adding routes. All of these definitions are in include/uapi/linux/rtnetlink.h.
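
To make the identifiers concrete, here is a small sketch that maps a protocol value to a readable name. The TOY_ constants are copied by hand from the RTPROT_ values in the uapi header for illustration; a real program would include that header instead, and the helper name toy_rtprot_name() is hypothetical.

```c
#include <assert.h>

/* Hand-copied subset of the RTPROT_* values (assumption: a real program
 * would take these from include/uapi/linux/rtnetlink.h). */
enum toy_rtprot {
    TOY_RTPROT_UNSPEC   = 0,
    TOY_RTPROT_REDIRECT = 1,   /* installed because of an ICMP redirect */
    TOY_RTPROT_KERNEL   = 2,   /* installed by the kernel itself */
    TOY_RTPROT_BOOT     = 3,   /* default when no proto is given */
    TOY_RTPROT_STATIC   = 4,   /* installed by the administrator */
    TOY_RTPROT_RA       = 9,   /* IPv6 Router Advertisement */
    TOY_RTPROT_ZEBRA    = 11,
    TOY_RTPROT_XORP     = 14,
};

static_assert(TOY_RTPROT_BOOT == 3, "boot must keep its uapi value");

/* Translate a fib_protocol-style value into a short label. */
static const char *toy_rtprot_name(unsigned char proto)
{
    switch (proto) {
    case TOY_RTPROT_REDIRECT: return "redirect";
    case TOY_RTPROT_KERNEL:   return "kernel";
    case TOY_RTPROT_BOOT:     return "boot";
    case TOY_RTPROT_STATIC:   return "static";
    case TOY_RTPROT_RA:       return "ra";
    case TOY_RTPROT_ZEBRA:    return "zebra";
    case TOY_RTPROT_XORP:     return "xorp";
    default:                  return "unknown";
    }
}
```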


fib_scope: How Wide Is the Reach?

Scope is a concept related to "distance." It tells us exactly how far away this destination is, or how far a packet can travel.

You can view them using ip address show or ip route show. The main types are:

  • host (RT_SCOPE_HOST): Right on this machine. The most typical example is 127.0.0.1, whose traffic never leaves the host.
  • link (RT_SCOPE_LINK): Right on this wire. Only hosts directly connected to the same switch or cable can receive it.
  • global (RT_SCOPE_UNIVERSE): Globally reachable. This is the default for the vast majority of routes—goes anywhere.
  • site (RT_SCOPE_SITE): This is IPv6's territory (detailed in Chapter 8).
  • nowhere (RT_SCOPE_NOWHERE): Does not exist. Don't go there.

If you add a route without specifying a scope, the ip command assumes one according to these rules:

  1. If it's a unicast route via a gateway → global.
  2. If it's a directly connected unicast or broadcast → link.
  3. If it's a local route → host.
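
The three defaulting rules above can be written as one small function. This is only a sketch of the decision, assuming the caller already knows the route type and whether a gateway was given; the function name and the TOY_ enums are hypothetical, although the numeric scope values match the RT_SCOPE_ constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Scope values as used by RT_SCOPE_*: larger numbers mean "closer". */
enum toy_scope {
    TOY_SCOPE_UNIVERSE = 0,
    TOY_SCOPE_SITE     = 200,
    TOY_SCOPE_LINK     = 253,
    TOY_SCOPE_HOST     = 254,
    TOY_SCOPE_NOWHERE  = 255,
};

/* Only the route types mentioned in the rules are modeled here. */
enum toy_route_type {
    TOY_RTN_UNICAST   = 1,
    TOY_RTN_LOCAL     = 2,
    TOY_RTN_BROADCAST = 3,
};

/* Pick the scope used when the user did not specify one. */
static enum toy_scope toy_default_scope(enum toy_route_type type, bool has_gateway)
{
    assert(type == TOY_RTN_UNICAST || type == TOY_RTN_LOCAL ||
           type == TOY_RTN_BROADCAST);

    if (type == TOY_RTN_LOCAL)
        return TOY_SCOPE_HOST;          /* rule 3: local route */
    if (type == TOY_RTN_BROADCAST)
        return TOY_SCOPE_LINK;          /* rule 2: broadcast */
    return has_gateway ? TOY_SCOPE_UNIVERSE  /* rule 1: via a gateway */
                       : TOY_SCOPE_LINK;     /* rule 2: directly connected */
}
```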

fib_type: Is the Road Open or Blocked?

fib_type was added to fib_info as a key in kernel 3.7, so that fib_info objects that differ only in route type can be told apart. Before that, the type was stored only in fib_alias (as fa_type).

The most common type is RTN_UNICAST (a normal unicast route). But there's an interesting type called RTN_PROHIBIT.

You can add a roadblock like this:

ip route add prohibit 192.168.1.17 from 192.168.2.103

This command means: traffic from 192.168.2.103 to 192.168.1.17 is forbidden. If someone insists on taking that path, the kernel won't silently drop the packet; it replies with an ICMPv4 Destination Unreachable message carrying the code "Packet Filtered" (ICMP_PKT_FILTERED), telling the sender "this road is closed."

How is this mechanism implemented? We'll dive into the details when we discuss fib_props later.


fib_prefsrc: The Preferred Source Address

Sometimes you want packets that use this route to carry a specific source address instead of the one the kernel would pick by default. fib_prefsrc stores this preferred choice; on the command line it corresponds to the src option of ip route.


fib_priority: Who Has the Final Say?

Priority—you can call it Metric, or you can call it Preference. The lower the number, the higher the priority (0 is the highest).

You have three ways to express the exact same thing on the command line (setting priority to 5):

ip route add 192.168.1.10 via 192.168.2.1 metric 5
ip route add 192.168.1.10 via 192.168.2.1 priority 5
ip route add 192.168.1.10 via 192.168.2.1 preference 5

To the kernel, these three commands are identical. Also, make sure not to confuse this with the fib_metrics we'll discuss later—even though it's called metric on the command line, that field has absolutely nothing to do with the complex metrics in the fib_metrics array.
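
The "lower number wins" rule can be illustrated with a toy selection loop. This is not how the kernel walks its alias list; it is a minimal sketch assuming a flat array of otherwise equivalent candidate routes, with hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate route: only the fields needed for the example. */
struct toy_route {
    const char *gateway;
    uint32_t    priority;   /* plays the role of fib_priority */
};

/* Return the candidate with the lowest priority value (0 is the best).
 * On a tie, the earliest entry wins. Returns NULL for an empty list. */
static const struct toy_route *toy_pick_route(const struct toy_route *routes,
                                              size_t n)
{
    const struct toy_route *best = NULL;

    assert(routes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || routes[i].priority < best->priority)
            best = &routes[i];
    }
    return best;
}
```

Given candidates with priorities 5, 0, and 10, the function returns the second entry.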


Performance Metrics: MTU, RTT, and That Bunch of Arrays

fib_metrics is an array of 15 (RTAX_MAX) path performance parameters in the kernel version discussed here.

To make the code look cleaner (and to be a bit lazy), the kernel uses macros to give aliases to commonly used positions in the array:

  • fib_mtu: Path MTU.
  • fib_window: The maximum TCP window to advertise.
  • fib_rtt: Round-Trip Time.
  • fib_advmss: The MSS to advertise when establishing TCP connections.

During initialization, these metrics point to dst_default_metrics (in net/core/dst.c). Many are TCP-specific, such as initcwnd (initial congestion window).

You can set these manually. For example, if you want to increase the initial congestion window:

ip route add 192.168.1.0/24 initcwnd 35

Or if you want to lock down the MTU:

# The lock keyword is the important part here
ip route add 192.168.1.0/24 mtu lock 800

Why add lock? Without lock, the kernel's Path MTU Discovery mechanism might silently change your MTU if it detects a smaller MTU along the path. Once locked, the kernel calls dst_metric_locked() for a check, sees that it's locked, and returns immediately without making any changes.

Let's look at the logic in the kernel code (net/ipv4/route.c):

static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
{
    struct dst_entry *dst = &rt->dst;
    // ...
    /* Only an unlocked MTU may be updated */
    if (dst_metric_locked(dst, RTAX_MTU))
        return;
    // ...
}

Multipath Routing: All Roads Lead to Rome

fib_nhs: How Many Next Hops?

fib_nhs records the number of next hops. When CONFIG_IP_ROUTE_MULTIPATH is not enabled, it can only be 1.

With multipath routing enabled, a single route can correspond to multiple next hops. This brings redundancy, load balancing, and even improved security.

fib_dev & fib_nh[0]: Who Is the First Stop?

fib_dev is simply a macro pointing to fib_nh[0].nh_dev—the network device corresponding to the first next hop.

If you configure multipath, you can write it like this on the command line:

ip route add default scope global nexthop dev eth0 nexthop dev eth1

In this case, fib_nh becomes an array with two egress interfaces attached. The kernel will throw packets to one of them based on an algorithm (such as hashing or weighting).
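
One way to picture "hashing or weighting" is a weighted pick driven by a flow hash. The sketch below is an illustration under assumptions, not the kernel's selection algorithm: the struct, the function name, and the idea of reducing the hash modulo the total weight are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical next hop: an egress device name and a relative weight. */
struct toy_nh {
    const char  *dev;
    unsigned int weight;
};

/* Map a flow hash onto one next hop. A next hop with weight w owns w
 * consecutive slots out of the total, so heavier next hops win more often. */
static const struct toy_nh *toy_select_nh(const struct toy_nh *nh, size_t nhs,
                                          uint32_t flow_hash)
{
    unsigned int total = 0;
    unsigned int slot;

    assert(nh != NULL && nhs > 0);
    for (size_t i = 0; i < nhs; i++)
        total += nh[i].weight;
    assert(total > 0);

    slot = flow_hash % total;
    for (size_t i = 0; i < nhs; i++) {
        if (slot < nh[i].weight)
            return &nh[i];
        slot -= nh[i].weight;
    }
    return &nh[nhs - 1];    /* not reached when total > 0 */
}
```

Because the same hash always lands in the same slot, packets of one flow keep using the same next hop, which avoids reordering within that flow.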


fib_props: Making "Prohibited" Resonate

Earlier we mentioned the RTN_PROHIBIT type, which causes the kernel to send back an ICMP "Packet Filtered" message.

This mechanism is implemented using an array called fib_props.

This array is defined in net/ipv4/fib_semantics.c and has 12 elements (RTN_MAX + 1), with each element corresponding to a route type.

The structure looks like this:

struct fib_prop {
    int error; /* error code */
    u8  scope; /* scope */
};

For a normal unicast route (RTN_UNICAST), error is 0 (no problem), and scope is RT_SCOPE_UNIVERSE (globally reachable).

But for a prohibited route (RTN_PROHIBIT), the configuration is:

const struct fib_prop fib_props[RTN_MAX + 1] = {
    // ...
    [RTN_PROHIBIT] = {
        .error = -EACCES, /* access denied */
        .scope = RT_SCOPE_UNIVERSE,
    },
    // ...
};

How does it work?

When a packet destined for 192.168.1.17 arrives on the Rx path (receive/forward path) and matches a prohibit route you have configured, the flow looks like this:

  1. fib_lookup() starts the table lookup.
  2. It finds a matching leaf node in the FIB TRIE and calls the check_leaf() method.
  3. check_leaf() obtains the type of fib_alias (fa_type) and uses it to look up the fib_props array.
  4. It discovers that .error is -EACCES (non-zero).
  5. The error code propagates back up, and ultimately fib_lookup() returns an error.
  6. On the Rx path the failed lookup ends up in ip_error(), which, based on this -EACCES, constructs an ICMP Destination Unreachable message with code ICMP_PKT_FILTERED, sends it back, and drops the packet.

This is why a "soft firewall" like ip route add prohibit can generate ICMP messages—it's hooked directly into the route lookup path.
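
The table-driven flow in the steps above can be condensed into a few lines. This is a user-space sketch with hypothetical toy_ names: the table fills in only the unicast, unreachable, and prohibit entries, and the error-to-ICMP mapping covers just three error codes.

```c
#include <assert.h>
#include <errno.h>

/* Route types in the same order as the RTN_* enum. */
enum toy_rtn {
    TOY_RTN_UNSPEC, TOY_RTN_UNICAST, TOY_RTN_LOCAL, TOY_RTN_BROADCAST,
    TOY_RTN_ANYCAST, TOY_RTN_MULTICAST, TOY_RTN_BLACKHOLE,
    TOY_RTN_UNREACHABLE, TOY_RTN_PROHIBIT, TOY_RTN_THROW, TOY_RTN_NAT,
    TOY_RTN_XRESOLVE,
    TOY_RTN_MAX = TOY_RTN_XRESOLVE,
};

static_assert(TOY_RTN_MAX + 1 == 12, "one table entry per route type");

struct toy_prop {
    int error;   /* 0 means the route is usable */
};

/* Per-type behavior lives in data, not in if/else chains.
 * Entries not listed here default to error 0 in this sketch. */
static const struct toy_prop toy_props[TOY_RTN_MAX + 1] = {
    [TOY_RTN_UNICAST]     = { .error = 0 },
    [TOY_RTN_UNREACHABLE] = { .error = -EHOSTUNREACH },
    [TOY_RTN_PROHIBIT]    = { .error = -EACCES },
};

/* ICMP Destination Unreachable codes. */
enum {
    TOY_ICMP_NET_UNREACH  = 0,
    TOY_ICMP_HOST_UNREACH = 1,
    TOY_ICMP_PKT_FILTERED = 13,
};

/* Steps 3-5: look up the type in the table and hand back its error. */
static int toy_check_type(enum toy_rtn type)
{
    assert(type >= TOY_RTN_UNSPEC && type <= TOY_RTN_MAX);
    return toy_props[type].error;
}

/* Step 6: map a negative errno to an ICMP code, or -1 for "send nothing". */
static int toy_icmp_code_for(int err)
{
    switch (-err) {
    case EACCES:       return TOY_ICMP_PKT_FILTERED;
    case EHOSTUNREACH: return TOY_ICMP_HOST_UNREACH;
    case ENETUNREACH:  return TOY_ICMP_NET_UNREACH;
    default:           return -1;
    }
}
```

Supporting another rejecting route type in this model means adding one table entry and, if needed, one case in the mapping; the lookup code itself does not change.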


Caching: Results Can Be Stored, Too

Route lookup is hard work, especially when the routing table is large. To save some effort, the kernel caches the lookup results.

Note that the cache mentioned here is not the deprecated "IPv4 Routing Cache" (that's old history from before 3.6), but rather a Next Hop-based cache.

Where Is It Cached?

The cache lives right inside the fib_nh (next hop) structure.

  • Rx path (receive and forward): Results are cached in the nh_rth_input field of fib_nh.
  • Tx path (local transmit): Results are cached in the nh_pcpu_rth_output field of fib_nh.

Both fields ultimately refer to rtable structures, which hold the destination information we need.

When Is It Not Cached?

If you are using multipath routing, Realms (traffic classification), or the packet is not unicast, the kernel gets cautious and avoids caching directly in fib_nh to prevent different rules from interfering with each other.

Who Handles the Caching?

There is a dedicated function called rt_cache_route() (defined in net/ipv4/route.c). Whether you are receiving or sending packets, it is responsible for storing the rtable built from the lookup result in the next hop's cache slot.

The Per-CPU Performance Magic

Notice the pcpu in the name nh_pcpu_rth_output? This means each CPU has its own copy of the cache.

This is for performance: when multiple CPUs transmit packets concurrently, they do not contend for the same cache slot, and each CPU works on its own copy. This is one of the reasons Linux's transmit path holds up under high throughput.
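
The layout of the two cache slots can be modeled with C11 atomics. This is a sketch under assumptions: the CPU count is fixed at 4, the toy_ names are hypothetical, and the compare-and-swap only illustrates the idea of replacing a slot if nobody else changed it in the meantime. Freeing the previous entry is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4   /* assumption: fixed CPU count for the example */

struct toy_rtable {
    int id;
};

/* Hypothetical cache part of a next hop. */
struct toy_nh_cache {
    _Atomic(struct toy_rtable *) rth_input;                    /* Rx: one slot */
    _Atomic(struct toy_rtable *) pcpu_rth_output[TOY_NR_CPUS]; /* Tx: per CPU */
};

static void toy_cache_init(struct toy_nh_cache *c)
{
    assert(c != NULL);
    atomic_init(&c->rth_input, NULL);
    for (size_t i = 0; i < TOY_NR_CPUS; i++)
        atomic_init(&c->pcpu_rth_output[i], NULL);
}

/* Install rt in the slot unless the slot changed since we looked at it. */
static bool toy_cache_route(_Atomic(struct toy_rtable *) *slot,
                            struct toy_rtable *rt)
{
    struct toy_rtable *orig;

    assert(slot != NULL);
    orig = atomic_load(slot);
    return atomic_compare_exchange_strong(slot, &orig, rt);
}

/* Tx path: each CPU only ever touches its own slot. */
static bool toy_cache_output(struct toy_nh_cache *c, unsigned int cpu,
                             struct toy_rtable *rt)
{
    assert(c != NULL && cpu < TOY_NR_CPUS);
    return toy_cache_route(&c->pcpu_rth_output[cpu], rt);
}
```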


Summary

In this section, we thoroughly dissected the fib_info structure, the "mother of all routing entries."

We covered the lifecycle management driven by reference counts, the routing attributes described by fib_protocol and fib_scope, the TCP performance parameters wrapped in fib_metrics, and the route interception mechanism implemented by the fib_props array.

There is one detail worth savoring here: how the kernel maps route types to specific behaviors through a table lookup (fib_props). This "data-driven" design pattern is ubiquitous in the kernel—instead of hardcoding if (type == PROHIBIT), it looks up a configuration array.

Now we have a complete fib_info, as well as a next hop-based cache. But there is still a key role we've only briefly touched upon—that is fib_nh (the next hop) itself.

In the next section, we'll dive inside the fib_nh structure to see exactly what kind of address and device the kernel holds in its hands when it decides to send a packet to the "next stop." We'll also look at how a mechanism called FIB Nexthop Exception secretly corrects the MTU or next hop of a specific path without disturbing the global routing table.