5.2 Performing Lookups in the Routing Subsystem

In the previous section, we figured out what the FIB is—it's not a simple table, but rather the "treasure map" the kernel uses to determine a packet's fate. We have the map, but nobody has read it yet.

When a packet arrives at the network card and enters the protocol stack, the kernel must make a decision: Is this for me? Should it be forwarded to someone else? Or should it just be thrown in the trash?

This decision process is route lookup.

This is a high-frequency operation—every packet, whether incoming or outgoing, must go through this. Before kernel version 3.6, to save time, this process was split into two steps: first check the "route cache," and only if that missed, check the "routing table" (FIB). Now that the caching mechanism is gone, we face the core directly: fib_lookup().

Core Call: fib_lookup()

This function is the brain of route lookup. Its task is simple: take the clues you provide (destination address, etc.), search the FIB, and, if a match is found, fill the result into a struct and return 0. If nothing matches, it returns a negative error code instead.

The function prototype looks like this:

int fib_lookup(struct net *net, const struct flowi4 *flp, struct fib_result *res);

Here, two key players make their debut: the clue provider flowi4 and the result container fib_result.

The Clue: flowi4

flowi4 is essentially a "lookup request form." You can't go to the kernel empty-handed and ask for directions; you have to fill out a form.

The most important fields on this form include:

  • Destination address
  • Source address
  • Type of Service (TOS)
  • ...and a few others

The kernel uses these fields as the "Key" to match against the FIB. For IPv6, there is a corresponding flowi6; both structs are defined in include/net/flow.h.
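
To make the "request form" idea concrete, here is a minimal user-space sketch of filling one out. Everything in it is an assumption for illustration: sketch_flowi4, sketch_ip(), and sketch_fill_flow() are hypothetical names, the field set is cut down to the ones listed above plus an output-interface index, and addresses are kept in host byte order for readability. The kernel's real struct flowi4 is larger than this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct flowi4. */
struct sketch_flowi4 {
    uint32_t daddr;  /* destination address (host byte order in this sketch) */
    uint32_t saddr;  /* source address */
    uint8_t  tos;    /* Type of Service */
    int      oif;    /* output interface index; 0 means "any" here */
};

/* Build an IPv4 address from its dotted-quad parts (host byte order). */
static uint32_t sketch_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Fill out the form: zero everything, then set the fields we know. */
static void sketch_fill_flow(struct sketch_flowi4 *fl, uint32_t daddr,
                             uint32_t saddr, uint8_t tos)
{
    assert(fl != NULL);
    memset(fl, 0, sizeof(*fl));
    fl->daddr = daddr;
    fl->saddr = saddr;
    fl->tos = tos;
}
```

Zeroing the form first is the point of the sketch: any field the caller does not care about stays 0, which this sketch treats as "no preference."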

The Result: fib_result

If fib_lookup() returns successfully, the fib_result struct it brings back is the spoils of the entire lookup operation. This struct holds all the clues about what to do next.

What does the lookup process look like?

The kernel takes your flowi4 request form and first checks the Local table (to see if it's destined for the local machine). If nothing is found, it checks the Main table (to see if it needs to be forwarded). As long as there's a hit in either table, the lookup is considered successful.
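
The "Local first, then Main" order can be sketched in user space as follows. This is a toy model under stated assumptions, not the kernel's code: every sketch_ name is hypothetical, each table is a plain array scanned linearly with longest-prefix matching (a stand-in for the kernel's far more efficient search structure), and a miss is reported as -1 rather than a real errno value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_type { SK_UNICAST = 1, SK_LOCAL = 2 };

struct sketch_route {
    uint32_t prefix;      /* network address, host byte order */
    uint8_t  prefixlen;   /* 0..32 */
    enum sketch_type type;
};

struct sketch_table {
    const struct sketch_route *routes;
    size_t count;
};

/* Turn a prefix length into a netmask; /0 must be special-cased. */
static uint32_t sketch_mask(uint8_t prefixlen)
{
    assert(prefixlen <= 32);
    return prefixlen ? ~(uint32_t)0 << (32 - prefixlen) : 0;
}

/* Longest-prefix match inside one table; returns NULL on a miss. */
static const struct sketch_route *
sketch_table_lookup(const struct sketch_table *t, uint32_t daddr)
{
    const struct sketch_route *best = NULL;

    for (size_t i = 0; i < t->count; i++) {
        const struct sketch_route *r = &t->routes[i];

        if ((daddr & sketch_mask(r->prefixlen)) != r->prefix)
            continue;
        if (best == NULL || r->prefixlen > best->prefixlen)
            best = r;
    }
    return best;
}

/* Local table first, Main table second. Returns 0 on a hit, -1 on a miss. */
static int sketch_fib_lookup(const struct sketch_table *local,
                             const struct sketch_table *main_tbl,
                             uint32_t daddr,
                             const struct sketch_route **res)
{
    assert(local != NULL && main_tbl != NULL && res != NULL);

    *res = sketch_table_lookup(local, daddr);
    if (*res != NULL)
        return 0;
    *res = sketch_table_lookup(main_tbl, daddr);
    return *res != NULL ? 0 : -1;
}
```

With a /32 entry for the machine's own address in the Local table and a /24 plus a default route in the Main table, the machine's own address hits Local, a neighbor on the subnet hits the /24, and everything else falls through to the default route.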

But that's only half the story. Finding the path is one thing; we still need the "legs" to walk it.

From FIB to Route Cache Entry: The Birth of rtable

After a successful lookup, whether receiving or sending a packet, the kernel constructs a dst_entry object (Destination Entry, or dst cache). You can think of it as a "routing slip"—it says "once you have this slip, here's who you look for next."

This dst_entry is usually embedded within a larger struct called rtable. The rtable is the actual routing entry that gets attached to the packet (SKB).
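
Before looking at the real definitions, the embedding itself is worth a small user-space sketch. It assumes the inner struct sits inside the outer one as a member named dst; the sketch_ types, the sketch_container_of macro, and sketch_dst_to_rtable() are hypothetical stand-ins written for this example. The idea it illustrates is that code holding only a pointer to the embedded entry can still recover the enclosing routing entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the embedded destination entry. */
struct sketch_dst {
    int placeholder;
};

/* Hypothetical stand-in for the outer routing entry. */
struct sketch_rtable {
    struct sketch_dst dst;   /* embedded by value, not a pointer */
    uint32_t rt_gateway;
    int rt_iif;
};

/* Step back from a member pointer to the struct that contains it. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct sketch_rtable *sketch_dst_to_rtable(struct sketch_dst *dst)
{
    assert(dst != NULL);
    return sketch_container_of(dst, struct sketch_rtable, dst);
}
```

Because the packet only needs to carry one pointer, generic code can work with the inner entry while IPv4-specific code converts back to the outer struct when it needs the extra fields.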

First, let's look at the core of dst_entry:

struct dst_entry {
    ...
    int (*input)(struct sk_buff *);
    int (*output)(struct sk_buff *);
    ...
};

Notice those two function pointers: input and output. These are what we're really after once we finish fib_lookup(). Depending on the lookup result, different handler functions are hooked to these two pointers. The packet acts like a fool: holding the routing slip, it simply calls input on the receive path or output on the transmit path, and the rest of the journey unfolds automatically.
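
The "packet acts like a fool" behavior can be sketched in a few lines of user-space C. All names here (sketch_skb, sketch_dst, sketch_dst_input(), sketch_dst_output(), and the two handlers) are hypothetical; the only thing borrowed from the listing above is the shape of the two function pointers.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_skb;

/* Hypothetical routing slip: just the two hooks. */
struct sketch_dst {
    int (*input)(struct sketch_skb *);
    int (*output)(struct sketch_skb *);
};

/* Hypothetical packet: carries a pointer to its slip plus two counters. */
struct sketch_skb {
    struct sketch_dst *dst;
    int delivered;
    int transmitted;
};

static int sketch_local_deliver(struct sketch_skb *skb)
{
    skb->delivered++;
    return 0;
}

static int sketch_xmit(struct sketch_skb *skb)
{
    skb->transmitted++;
    return 0;
}

/* Receive path: call whatever the slip says, without asking why. */
static int sketch_dst_input(struct sketch_skb *skb)
{
    assert(skb != NULL && skb->dst != NULL && skb->dst->input != NULL);
    return skb->dst->input(skb);
}

/* Transmit path: same idea, other hook. */
static int sketch_dst_output(struct sketch_skb *skb)
{
    assert(skb != NULL && skb->dst != NULL && skb->dst->output != NULL);
    return skb->dst->output(skb);
}
```

Neither dispatch helper contains a routing decision. The decision was made earlier, when someone chose which functions to store in the slip.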

Now let's look at the outer shell—rtable:

struct rtable {
    struct dst_entry dst;

    int rt_genid;
    unsigned int rt_flags;
    __u16 rt_type;
    __u8 rt_is_input;
    __u8 rt_uses_gateway;

    int rt_iif;

    /* Info on neighbour */
    __be32 rt_gateway;

    /* Miscellaneous cached information */
    u32 rt_pmtu;

    struct list_head rt_uncached;
};

Every field here has a purpose. Let's break them down.

rt_flags: Special Annotations on the Routing Slip

This is a set of flags telling the kernel that this route has "special circumstances." Common ones include:

  • RTCF_BROADCAST: The destination is a broadcast address. ip_route_input_slow() and __mkroute_output() set this.
  • RTCF_MULTICAST: The destination is a multicast address. ip_route_input_mc() sets this.
  • RTCF_DOREDIRECT: This is critical. Setting this flag means the kernel should send an ICMP Redirect message back to the sender, telling it "you took the wrong path, there's a shortcut."
    • Trigger conditions are strict: the input device and the output device must be the same, and the send_redirects entry in procfs must be enabled. The flag is set in __mkroute_input().
  • RTCF_LOCAL: The destination is local. When set, the packet must be sent up to the upper protocol stack instead of being forwarded.
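
Since rt_flags is a bitmask, working with it comes down to OR-ing bits in and testing them with AND. The sketch below shows that pattern in user space. The SK_RTCF_ constants are illustrative values chosen for this example (the real definitions live in the kernel headers), both helper functions are hypothetical, and the redirect check models only the two conditions named above, so treat it as a simplification rather than the kernel's full test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch only. */
#define SK_RTCF_DOREDIRECT 0x01000000u
#define SK_RTCF_BROADCAST  0x10000000u
#define SK_RTCF_MULTICAST  0x20000000u
#define SK_RTCF_LOCAL      0x80000000u

/*
 * Return the redirect flag when the packet would leave through the
 * interface it arrived on and redirects are enabled; otherwise 0.
 */
static uint32_t sketch_redirect_flag(int in_ifindex, int out_ifindex,
                                     bool send_redirects)
{
    assert(in_ifindex > 0 && out_ifindex > 0);
    if (in_ifindex == out_ifindex && send_redirects)
        return SK_RTCF_DOREDIRECT;
    return 0;
}

/* A route marked local is handed up the stack, never forwarded. */
static bool sketch_deliver_locally(uint32_t rt_flags)
{
    return (rt_flags & SK_RTCF_LOCAL) != 0;
}
```

A caller would typically build the flags incrementally, for example flags |= sketch_redirect_flag(2, 2, true), and later branch on individual bits.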

Other Key Fields

  • rt_is_input: If 1, this is an "incoming route"; if 0, it's outgoing.
  • rt_uses_gateway:
    • If 1: The next hop is a gateway.
    • If 0: Directly connected route.
  • rt_iif: Input interface index. This is which port the packet came in from.
  • rt_gateway: The IP address of the next-hop gateway.
  • rt_pmtu: Path MTU. This caches the smallest MTU value along the path, so packets can be sized to avoid being "dismembered" (fragmented) halfway.
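
Two of these fields answer very practical questions: "who do I hand the packet to?" and "how big may it be?" The user-space sketch below illustrates both. The struct is a cut-down, hypothetical stand-in that keeps only three of the field names from the listing, addresses are in host byte order, and the MTU rule (use the cached path MTU only when it is set and smaller than the device MTU) is a simplifying assumption, not the kernel's exact logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down routing entry for this sketch. */
struct sketch_rtable {
    uint8_t  rt_uses_gateway;  /* 1: next hop is a gateway, 0: direct */
    uint32_t rt_gateway;       /* gateway address, host byte order */
    uint32_t rt_pmtu;          /* cached path MTU, 0 if none cached */
};

/* Next hop: the gateway if there is one, else the destination itself. */
static uint32_t sketch_nexthop(const struct sketch_rtable *rt, uint32_t daddr)
{
    assert(rt != NULL);
    return rt->rt_uses_gateway ? rt->rt_gateway : daddr;
}

/* Usable MTU: the cached path MTU when it is set and tighter. */
static uint32_t sketch_effective_mtu(const struct sketch_rtable *rt,
                                     uint32_t dev_mtu)
{
    assert(rt != NULL);
    if (rt->rt_pmtu != 0 && rt->rt_pmtu < dev_mtu)
        return rt->rt_pmtu;
    return dev_mtu;
}
```

For a directly connected route the "next hop" is simply the destination, which is why the gateway field can stay empty in that case.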

A brief interlude: Before kernel 3.6, there was also an rt_spec_dst field here. It was removed once the fib_compute_spec_dst() method made it redundant. That method is needed mainly in special cases like ICMP replies, where the sender's source address must be used as the destination address of the reply.

The Fork in the Road: input/output Callbacks

Once the rtable is built, the dst.input and dst.output hooks are given their true mission. This is the ultimate goal of route lookup: deciding the function call chain.

  • Incoming packets:
    • If destined for the local machine: input → ip_local_deliver().
    • If it needs to be forwarded: input → ip_forward().
  • Outgoing packets:
    • If originating from the local machine: output → ip_output().
  • Multicast packets:
    • input → ip_mr_input() (under specific conditions).
  • Error handling:
    • If it's RTN_PROHIBIT (route prohibited): input → ip_error().

See? This is much more intuitive than staring at the output of ip route list. In code, route selection is simply "picking a function."
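
"Picking a function" can be written down almost literally. The sketch below covers only the input side and only three route types. Every name in it is hypothetical (the sketch_ handlers merely record a verdict, standing in for the real handlers listed above), so read it as an illustration of the selection step, not as the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_rtn { SK_RTN_UNICAST, SK_RTN_LOCAL, SK_RTN_PROHIBIT };
enum sketch_verdict { SK_NONE, SK_DELIVERED, SK_FORWARDED, SK_REJECTED };

struct sketch_skb {
    enum sketch_verdict verdict;
};

typedef int (*sketch_handler)(struct sketch_skb *);

/* Stand-ins for the local-delivery, forwarding, and error handlers. */
static int sketch_local_deliver(struct sketch_skb *skb)
{
    skb->verdict = SK_DELIVERED;
    return 0;
}

static int sketch_forward(struct sketch_skb *skb)
{
    skb->verdict = SK_FORWARDED;
    return 0;
}

static int sketch_error(struct sketch_skb *skb)
{
    skb->verdict = SK_REJECTED;
    return -1;
}

/* The routing decision: map a route type to an input handler. */
static sketch_handler sketch_pick_input(enum sketch_rtn type)
{
    switch (type) {
    case SK_RTN_LOCAL:    return sketch_local_deliver;
    case SK_RTN_UNICAST:  return sketch_forward;
    case SK_RTN_PROHIBIT: return sketch_error;
    }
    return sketch_error;  /* unknown types are rejected in this sketch */
}

/* Pick once, then let the packet follow the chosen handler. */
static int sketch_route_input(struct sketch_skb *skb, enum sketch_rtn type)
{
    sketch_handler input = sketch_pick_input(type);

    assert(skb != NULL && input != NULL);
    return input(skb);
}
```

The switch statement is the whole "decision"; everything after it is just a function call.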

Digging into fib_result

As mentioned above, fib_lookup() fills the results into fib_result. Now let's dissect this struct. It holds the raw data from the lookup process, not yet processed into an rtable.

struct fib_result {
    unsigned char prefixlen;
    unsigned char nh_sel;
    unsigned char type;
    unsigned char scope;
    u32 tclassid;
    struct fib_info *fi;
    struct fib_table *table;
    struct list_head *fa_head;
};

  • prefixlen: Subnet prefix length (the number of leading 1 bits in the netmask).
    • Range 0 to 32. If it's a default route (0.0.0.0/0), this is 0.
    • This is determined in the check_leaf() method.
  • nh_sel: Next-hop selector.
    • If it's a single-path route, this is 0.
    • If Multipath Routing is enabled, there might be several next hops, and this value is the index of the selected one.
  • type: This is the most important field. It directly determines the packet's fate.
    • RTN_UNICAST: Normal unicast, to be forwarded or directly connected.
    • RTN_LOCAL: Destined for the local machine.
    • RTN_BROADCAST: Broadcast.
    • RTN_MULTICAST: Multicast.
    • RTN_UNREACHABLE: Unreachable, will trigger an ICMP Destination Unreachable message.
    • ... (there are 12 types in total).
  • scope: The route's scope, that is, how far away the destination is considered to be (for example host, link, or universe).
  • fi (fib_info): Pointer to the core information of the routing entry. This contains the actual next-hop information (fib_nh). We'll dive into this in the next section.
  • table: Pointer to the FIB table where the lookup occurred (Local or Main).
  • fa_head: Pointer to the fib_alias list.
    • This is an optimization. If multiple routes are identical except for TOS or priority, they can share the same fib_info and be linked together via the fib_alias list.
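
Two of these fields are easier to grasp with numbers in hand: prefixlen maps directly to a netmask, and nh_sel is just an array index into the next hops. The user-space sketch below shows both. All sketch_ names are hypothetical, the structs are cut down to what the example needs, and the idea that the route info holds a counted array of next hops is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical next hop: gateway address plus output interface. */
struct sketch_nh {
    uint32_t gw;   /* host byte order */
    int oif;
};

/* Hypothetical route info: a counted array of next hops. */
struct sketch_fib_info {
    int nhs;
    const struct sketch_nh *nh;
};

/* Hypothetical, cut-down lookup result. */
struct sketch_fib_result {
    unsigned char prefixlen;
    unsigned char nh_sel;
    const struct sketch_fib_info *fi;
};

/* /24 -> 255.255.255.0, /0 -> 0.0.0.0 (special-cased to avoid a 32-bit shift). */
static uint32_t sketch_prefix_to_mask(unsigned char prefixlen)
{
    assert(prefixlen <= 32);
    return prefixlen ? ~(uint32_t)0 << (32 - prefixlen) : 0;
}

/* nh_sel picks one entry out of the route's next-hop array. */
static const struct sketch_nh *
sketch_result_nh(const struct sketch_fib_result *res)
{
    assert(res != NULL && res->fi != NULL);
    assert(res->nh_sel < res->fi->nhs);
    return &res->fi->nh[res->nh_sel];
}
```

For a single-path route the array has one entry and nh_sel stays 0; with multipath, the same accessor works unchanged and only the index differs.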

Summary

In this section, we peeled back the layers of route lookup in the kernel like an onion.

From the user-space ip route command to the lines of kernel code, the gap spans more than just data structures—it's the dividing line between "decision" and "execution." fib_lookup() is the decision-maker, filling out the fib_result; while rtable and its callback functions are the executors, taking the instructions and pushing the packet forward.

But we left a major cliffhanger: what exactly is hidden behind that pointer to fib_info in fib_result? Why is the next-hop information stored in a separate struct? In the next section, we'll dive deeper into the lower levels of the FIB to see how that frequently referenced "routing table" is actually organized.