5.5 Policy Routing: Choices Beyond the Map

In the previous section, we discussed how fib_nh acts like a dutiful guide, holding a sticky note with the outgoing interface and gateway address, directing packets on how to leave the kernel. We built a very intuitive mental model: the destination determines the route. As long as you know where you want to go (the destination IP), the routing table can tell you how to get there.

This mental model works 99% of the time. But as an engineer digging into low-level mechanisms, you might encounter that remaining 1%.

Imagine this scenario: you have a machine with two network cables plugged in.

  • One is eth0, connected to the internal network, used for management, and free of charge.
  • The other is eth1, connected to the external network, billed by traffic, and very expensive.

Now you're running a program that needs to download data from 10.0.0.1. Following the usual routine, the kernel checks the routing table, finds that 10.0.0.1 is reachable via eth0, and sends all the traffic out that way. But you don't want that. You have your own logic: even though the destination address is the same, if this traffic is generated by a system backup, I want it to take the expensive eth1 (because it's fast); if it's just regular browsing, then I'll use eth0.

If we only look at the destination, traditional routing tables are helpless. They only see the target and don't ask about the origin.
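To make the limitation concrete, here is a minimal sketch of a destination-only lookup. It is a toy model, not kernel code: the route list, the interface assignments, and the helper name lookup_by_destination are all assumptions made up for this illustration.

```python
import ipaddress

# Hypothetical routing table for the dual-NIC machine: (prefix, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),
]


def lookup_by_destination(dst):
    """Return the interface of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, dev) for net, dev in ROUTES if addr in net]
    if not matches:
        return None
    # The longest prefix wins; nothing about the sender is consulted.
    return max(matches)[1]


# Two flows with different purposes but the same destination.
flows = [
    {"purpose": "backup", "dst": "10.0.0.1"},
    {"purpose": "browsing", "dst": "10.0.0.1"},
]
decisions = {f["purpose"]: lookup_by_destination(f["dst"]) for f in flows}
print(decisions)
```

Both flows get the same answer, because the only input to the lookup is the destination address. Whatever distinguishes a backup from browsing never reaches the function.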

This is exactly the problem that Policy Routing solves. It makes routing decisions no longer solely based on "where am I going," but also on "who am I," "what am I doing," or even "what protocol am I using." Before introducing this mechanism, let's first look at how things work in its absence—a world without Policy Routing.

Life Without Policy Routing: Two Tables

When the kernel configuration option CONFIG_IP_MULTIPLE_TABLES is not enabled, the kernel's routing world is very simple: there are only two tables.

  1. Local Table (RT_TABLE_LOCAL, ID 255): This is the kernel's "private territory." It contains only routes for local IP addresses (such as 127.0.0.1 or the IP you assigned to eth0). This table is maintained by the kernel: its entries are installed automatically as addresses are configured, and administrators normally leave it alone. This table determines "which addresses belong to me."

  2. Main Table (RT_TABLE_MAIN, ID 254): This is our "world map." The vast majority of routes you configure via the ip route add command reside in this table. It determines "if an address isn't mine, where should I send it."

Both tables are created at initialization time, in the fib4_rules_init() method of net/ipv4/fib_frontend.c.

A Historical Footnote: In kernels prior to 2.6.25, these two tables were still global variables: ip_fib_local_table and ip_fib_main_table. Back then, the code was full of logic directly accessing these two variables. Later, kernel developers realized this was too inflexible—if you wanted to add a table, you had to modify the code and recompile. So they refactored it, consolidating all table operations into the fib_get_table() method. Regardless of whether you have Policy Routing enabled, or how many tables you have, everyone uses fib_get_table(net, table_id) to get the table pointer.

This "unified access" approach is like turning "dedicated drawers" into a "numbered locker system"—no matter how many lockers there are, the action of using a key to unlock them is exactly the same.
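The locker idea can be sketched in a few lines. This is a toy model under stated assumptions, not the kernel's implementation: FibTable, the tables dictionary, and the Local-then-Main lookup order are simplifications chosen for illustration, and get_table only plays the role that fib_get_table(net, table_id) plays in the kernel.

```python
import ipaddress

RT_TABLE_MAIN = 254
RT_TABLE_LOCAL = 255


class FibTable:
    """Toy routing table: a list of (network, description) pairs."""

    def __init__(self, table_id):
        self.table_id = table_id
        self.routes = []

    def insert(self, prefix, description):
        self.routes.append((ipaddress.ip_network(prefix), description))

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        best = None
        for net, description in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, description)
        return best[1] if best else None


# Toy registry keyed by table ID (an assumption of this sketch).
tables = {tid: FibTable(tid) for tid in (RT_TABLE_LOCAL, RT_TABLE_MAIN)}


def get_table(table_id):
    """Single access point for every table, whatever its number."""
    return tables.get(table_id)


def lookup(dst):
    """Consult Local first, then Main (assumed order for the two-table case)."""
    for table_id in (RT_TABLE_LOCAL, RT_TABLE_MAIN):
        result = get_table(table_id).lookup(dst)
        if result is not None:
            return result
    return None


get_table(RT_TABLE_LOCAL).insert("127.0.0.0/8", "local delivery")
get_table(RT_TABLE_MAIN).insert("0.0.0.0/0", "forward via default gateway")
print(lookup("127.0.0.1"), "|", lookup("8.8.8.8"))
```

Adding a third table to this model means adding one more key to the registry. Callers of get_table do not change, which is the point of the refactoring described above.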

When Policy Routing is Enabled: 255 Maps

When you enable CONFIG_IP_MULTIPLE_TABLES, the world changes.

The kernel is no longer limited to the Local and Main tables; it supports up to 255 routing tables. At boot time, three tables are initialized by default:

  • Local (255)
  • Main (254)
  • Default (253)

(Note: Regarding the specific use of the Default table and its detailed interaction with the Policy Routing rule set fib_rules, we will dive deep in Chapter 6. For now, let's focus on the management mechanism of the "tables" themselves.)

The question now is: with so many tables, how do we put things into them?

As a kernel engineer, you're certainly familiar with the ip route command. But do you know what it looks like in the kernel's eyes? It's a Netlink message.

1. Adding and Deleting Routes: ip route add/del

When you type:

ip route add 192.168.1.0/24 dev eth0

Your userspace tool (iproute2) actually sends an RTM_NEWROUTE message to the kernel via a Netlink socket.

The kernel side catches this with the inet_rtm_newroute() method (located in net/ipv4/fib_frontend.c).

  • It parses the attributes carried in the message (destination subnet, outgoing interface, priority, etc.).
  • It creates the corresponding fib_info and fib_alias.
  • It hangs them in the hash or TRIE structure of the corresponding FIB table (the Main table by default).
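To see what such a message might look like on the wire, the sketch below packs one by hand. It only builds the bytes and never opens a socket. The numeric constants follow the Linux uapi headers, but the choice of flags, protocol, and scope is an assumption about what iproute2 would send for this command, and the interface index 2 is a made-up stand-in for eth0.

```python
import socket
import struct

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
RTM_NEWROUTE = 24
NLM_F_REQUEST = 0x001
NLM_F_ACK = 0x004
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
RT_TABLE_MAIN = 254
RTPROT_BOOT = 3
RT_SCOPE_LINK = 253
RTN_UNICAST = 1
RTA_DST = 1
RTA_OIF = 4


def rtattr(rta_type, payload):
    """Encode one struct rtattr plus its payload, padded to 4 bytes."""
    length = 4 + len(payload)
    padding = b"\0" * (-length % 4)
    return struct.pack("=HH", length, rta_type) + payload + padding


def build_newroute(dst, prefix_len, oif_index, seq=1):
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol,
    # scope, type, flags
    rtmsg = struct.pack("=BBBBBBBBI", socket.AF_INET, prefix_len, 0, 0,
                        RT_TABLE_MAIN, RTPROT_BOOT, RT_SCOPE_LINK,
                        RTN_UNICAST, 0)
    attrs = rtattr(RTA_DST, socket.inet_aton(dst))
    attrs += rtattr(RTA_OIF, struct.pack("=I", oif_index))
    body = rtmsg + attrs
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
    # struct nlmsghdr: len, type, flags, seq, pid (left as 0 in this sketch)
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWROUTE, flags, seq, 0)
    return header + body


# oif_index=2 is a hypothetical ifindex standing in for eth0.
message = build_newroute("192.168.1.0", 24, oif_index=2)
print(len(message), message.hex())
```

The destination subnet and the outgoing interface travel as separate attributes after the fixed rtmsg header. This is why a handler such as inet_rtm_newroute() has to parse the message before it can build anything.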

When you type ip route del ..., the flow is similar, except the message type becomes RTM_DELROUTE, and the kernel hands it over to inet_rtm_delroute(), which is responsible for removing the corresponding entry from the FIB table.

Here is a counter-intuitive detail worth pausing to think about: A route doesn't always mean "allow passage."

You can configure it like this:

ip route add prohibit 192.168.1.17 from 192.168.2.103

This command adds a "prohibition order" to the routing table. When the kernel looks up a route and matches this entry, it does not forward the packet. It drops the packet and replies with an ICMP Destination Unreachable message whose code is "Packet Filtered" (ICMP_PKT_FILTERED). This is extremely useful in firewall or policy control scenarios, because the routing table itself acts as a rule set.
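The idea that a route entry can carry a verdict instead of a next hop can be sketched as two small lookup tables. This is a simplified model and not the kernel's data structures: the dictionaries and the handle_lookup helper are hypothetical, and the error-to-ICMP mapping reflects my reading of the IPv4 error path, so treat it as an assumption.

```python
import errno

ICMP_DEST_UNREACH = 3    # ICMP type
ICMP_HOST_UNREACH = 1    # ICMP code
ICMP_PKT_FILTERED = 13   # ICMP code

# Route type -> error the lookup reports (0 means "use this route").
ROUTE_TYPE_ERROR = {
    "unicast": 0,
    "blackhole": errno.EINVAL,
    "unreachable": errno.EHOSTUNREACH,
    "prohibit": errno.EACCES,
}

# Lookup error -> ICMP Destination Unreachable code sent to the sender.
ERROR_TO_ICMP_CODE = {
    errno.EHOSTUNREACH: ICMP_HOST_UNREACH,
    errno.EACCES: ICMP_PKT_FILTERED,
}


def handle_lookup(route_type):
    """Return (action, icmp) where icmp is (type, code) or None."""
    err = ROUTE_TYPE_ERROR[route_type]
    if err == 0:
        return ("forward", None)
    code = ERROR_TO_ICMP_CODE.get(err)
    if code is None:
        # No ICMP mapping: the packet vanishes without a reply.
        return ("drop silently", None)
    return ("drop", (ICMP_DEST_UNREACH, code))


for route_type in ROUTE_TYPE_ERROR:
    print(route_type, handle_lookup(route_type))
```

In this model, a prohibit entry differs from a blackhole entry only in whether the sender is told about the drop.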

Viewing Routes: ip route show

This corresponds to the RTM_GETROUTE message, handled by inet_dump_fib().

  • By default, ip route show only looks at the Main table.
  • If you want to see the Local table, you must explicitly specify: ip route show table local.

2. The Old-School Approach: route add/del

Although the ip command is the current standard, the route command still exists. Its kernel interface is a completely different path—IOCTL.

  • route add sends the SIOCADDRT IOCTL.
  • route del sends the SIOCDELRT IOCTL.

Both IOCTLs are handled by the ip_rt_ioctl() method (also in net/ipv4/fib_frontend.c). This is an interface left over for compatibility with ancient network tools. Although functionally similar to Netlink, the IOCTL path is more rigid: it passes a fixed-layout struct rtentry, so it cannot carry the optional attributes (such as a table ID) that a Netlink message can.
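The relationship between the two entry points can be sketched as two front ends that normalize their input and then call the same table operation. This is a toy model: the dictionaries stand in for the real structures, and every helper name here (from_netlink, from_ioctl, table_insert, and so on) is hypothetical. Only the field names rt_dst, rt_genmask, and rt_dev and the numeric constants are taken from the Linux headers.

```python
SIOCADDRT = 0x890B
SIOCDELRT = 0x890C
RTM_NEWROUTE = 24
RTM_DELROUTE = 25

main_table = {}  # toy FIB: (dst, prefix_len) -> outgoing interface


def table_insert(cfg):
    main_table[(cfg["dst"], cfg["dst_len"])] = cfg["oif"]


def table_delete(cfg):
    main_table.pop((cfg["dst"], cfg["dst_len"]), None)


def netmask_to_prefix_len(mask):
    """Count the set bits of a dotted-quad netmask."""
    return sum(bin(int(octet)).count("1") for octet in mask.split("."))


def from_netlink(msg):
    # Netlink carries the prefix length directly.
    return {"dst": msg["dst"], "dst_len": msg["dst_len"], "oif": msg["oif"]}


def from_ioctl(rtentry):
    # The ioctl structure carries a netmask and a device name instead.
    return {"dst": rtentry["rt_dst"],
            "dst_len": netmask_to_prefix_len(rtentry["rt_genmask"]),
            "oif": rtentry["rt_dev"]}


def handle_netlink(msg_type, msg):
    cfg = from_netlink(msg)
    table_insert(cfg) if msg_type == RTM_NEWROUTE else table_delete(cfg)


def handle_ioctl(cmd, rtentry):
    cfg = from_ioctl(rtentry)
    table_insert(cfg) if cmd == SIOCADDRT else table_delete(cfg)


handle_netlink(RTM_NEWROUTE, {"dst": "192.168.1.0", "dst_len": 24, "oif": "eth0"})
handle_ioctl(SIOCADDRT, {"rt_dst": "192.168.2.0",
                         "rt_genmask": "255.255.255.0", "rt_dev": "eth1"})
print(main_table)
```

Once the input is normalized, the table cannot tell which front end a route came from.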

3. Dynamic Routing Protocols: BGP, OSPF, etc.

Besides administrators typing commands by hand, the other main source of routing table data is routing daemons. These are heavy-duty software programs (like Quagga, Bird, Zebra) running on backbone routers. They implement complex protocols like BGP and OSPF.

These processes run in the background, chatting with neighbor routers via protocols. As soon as they detect a change in the network topology (like a fiber optic cable getting cut), they immediately call the Netlink API, flooding the kernel with RTM_NEWROUTE or RTM_DELROUTE messages to instantly update the FIB tables. The kernel doesn't care whether these routes were typed in manually by an administrator or calculated by the OSPF protocol; they all ultimately end up as fib_info structures, hanging in the exact same tables.

Exceptions and Fine-Tuning: Returning to FIB Exceptions

Although this section is about "table"-level management, we must not forget the FIB nexthop exceptions from the previous section. They come into play in two cases:

  • The next hop changed due to an ICMP Redirect.
  • The MTU changed due to Path MTU Discovery.

These changes will not touch the large, shared FIB routing table. They only modify the small hash table (the exception table) attached to the specific fib_nh. This is an excellent isolation design: don't let special cases pollute global rules. If an MTU learned through Path MTU Discovery were written directly into the global Main table, all traffic heading to that subnet might incorrectly pick up an MTU that was only observed for one destination, which would be a disaster. Through the exception mechanism, the kernel only makes fine-tuned adjustments for the specific destination that actually needs them.
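The isolation described here can be modeled with a next hop that keeps its shared defaults separate from per-destination overrides. This is only a sketch: the NextHop class, its method names, and the addresses are invented for illustration, and the real exception table also handles details such as expiry that are left out.

```python
class NextHop:
    """Toy stand-in for fib_nh: shared defaults plus an exception table."""

    def __init__(self, dev, gateway, mtu):
        self.dev = dev
        self.gateway = gateway
        self.mtu = mtu
        self.exceptions = {}  # destination address -> overrides

    def update_pmtu(self, daddr, pmtu):
        """Record an MTU learned for one destination only."""
        self.exceptions.setdefault(daddr, {})["mtu"] = pmtu

    def redirect(self, daddr, new_gateway):
        """Record a redirected gateway for one destination only."""
        self.exceptions.setdefault(daddr, {})["gateway"] = new_gateway

    def resolve(self, daddr):
        """Merge the shared defaults with any exception for daddr."""
        override = self.exceptions.get(daddr, {})
        return {
            "dev": self.dev,
            "gateway": override.get("gateway", self.gateway),
            "mtu": override.get("mtu", self.mtu),
        }


nh = NextHop("eth0", "192.168.1.1", 1500)
nh.update_pmtu("10.1.2.3", 1400)
nh.redirect("10.1.2.9", "192.168.1.254")

print(nh.resolve("10.1.2.3"))  # sees the smaller MTU
print(nh.resolve("10.1.2.4"))  # unaffected, still the shared defaults
```

The shared values on the next hop are never rewritten. Only lookups for a destination with a recorded exception see the adjusted MTU or gateway.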

Summary

In this section, we pulled our perspective back from the microscopic fib_nh to the macroscopic FIB Tables architecture.

We learned:

  1. Dual-Table Mode: Without Policy Routing, the kernel only recognizes the Local and Main tables.
  2. Unified Access: Through fib_get_table(), the kernel abstracted table operations, laying the foundation for multi-table support.
  3. User Interface: The Netlink messages (like RTM_NEWROUTE) behind the ip route command are how administrators and routing daemons manipulate the FIB.
  4. Routes as Policy: Route entries are not just for navigation; they can also express prohibitions, as prohibit routes do.

The FIB architecture is now fairly clear: we have tables, entries, next hops, and a fine-tuning mechanism for next hops. But as we mentioned in the opening "dual-NIC" scenario, having tables alone isn't enough—we also need a set of rules to decide "which table to check and when."

That is the main character of the next chapter—FIB Rules. That is where the true soul of Policy Routing lies.