
7.5 Quick Reference

At this point, we have dissected the skeleton (core structures), blood (protocol interactions), and muscles (state machine) of the neighbor subsystem.

Now, it's time to set aside the scalpel and open up a heavy anatomy atlas.

This section isn't meant to be "read"—it's meant to be "consulted." We lay out the important methods, macros, and structures discussed in this chapter so you don't have to scour the kernel source code later for that one line of invocation. You'll find that many names that flashed by in the code flow now have a clear place here.

Note

The core neighbor code hides in:

  • net/core/neighbour.c
  • include/net/neighbour.h
  • include/uapi/linux/neighbour.h

The ARP (IPv4) nest is in:

  • net/ipv4/arp.c
  • include/net/arp.h
  • include/uapi/linux/if_arp.h

The NDISC (IPv6) turf is in:

  • net/ipv6/ndisc.c
  • include/net/ndisc.h

Methods

We start with the core methods—the heartbeat of the neighbor subsystem.

Table Management and Initialization

void neigh_table_init(struct neigh_table *tbl)

This is the first key to starting the engine. It calls neigh_table_init_no_netlink() to complete all initialization work for the neighbor table, then conveniently hangs this table (tbl) on the global neighbor table list (neigh_tables). Without this step, the kernel doesn't even know this table exists.

void neigh_table_init_no_netlink(struct neigh_table *tbl)

The grunt worker of neigh_table_init. It handles the dirty work—allocating memory, initializing the hash table, setting parameters—but it doesn't hang itself on the global list. That's the older brother's (neigh_table_init) job.

int neigh_table_clear(struct neigh_table *tbl)

This is the cleanup crew. When a neighbor table is no longer needed (e.g., during module unloading), this method is responsible for releasing all associated resources.

Birth and Death of Neighbor Entries

struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev)

Allocates a new neighbour object. Don't assume this is just a simple kzalloc—internally, it keeps an eye on gc_thresh's mood. If the table is too crowded, it might trigger a round of garbage collection before allocating memory for you. We discussed this mechanism in detail in the "Creation and Release" section.

struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey, struct net_device *dev, bool want_ref)

The true creator. neigh_alloc only gives you an empty shell, but __neigh_create uses the pkey (L3 address) and dev (device) to fully construct the neighbor object and insert it into the hash table. If you pass in want_ref=true, it thoughtfully increments the reference count to prevent it from being reclaimed by someone else before you can use it.

struct neighbour *__neigh_lookup(struct neigh_table *tbl, const void *pkey, struct net_device *dev, int creat)

The detective. It checks the hash table to see if the neighbor corresponding to pkey exists. The creat parameter determines what to do if it's not found: if it's 1, it directly calls neigh_create to fabricate one on the spot; if it's 0, it throws its hands up and returns NULL.
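The lookup-or-create behaviour can be sketched in user space. This is a minimal illustration, not the kernel code: the `toy_` types and functions are hypothetical, the key is a bare IPv4 address, the hash is a plain modulo, and the reference count is an ordinary `int` with no locking.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 8

/* Hypothetical, simplified neighbor entry keyed by an IPv4 address. */
struct toy_neigh {
    uint32_t key;            /* L3 address used as the lookup key */
    int refcnt;              /* plain int here; the kernel uses atomic counters */
    struct toy_neigh *next;  /* hash chain */
};

struct toy_table {
    struct toy_neigh *buckets[TOY_BUCKETS];
};

static void toy_table_init(struct toy_table *tbl)
{
    memset(tbl, 0, sizeof(*tbl));
}

static unsigned toy_hash(uint32_t key)
{
    return key % TOY_BUCKETS;
}

/* Allocate an entry and link it into the table, which keeps one reference. */
static struct toy_neigh *toy_create(struct toy_table *tbl, uint32_t key)
{
    struct toy_neigh *n = calloc(1, sizeof(*n));
    unsigned h = toy_hash(key);

    if (!n)
        return NULL;
    n->key = key;
    n->refcnt = 1;               /* the table's own reference */
    n->next = tbl->buckets[h];
    tbl->buckets[h] = n;
    return n;
}

/* Look the key up; when creat is nonzero, build a missing entry on the spot.
 * A returned entry carries one extra reference for the caller. */
static struct toy_neigh *toy_lookup(struct toy_table *tbl, uint32_t key, int creat)
{
    struct toy_neigh *n;

    for (n = tbl->buckets[toy_hash(key)]; n; n = n->next) {
        if (n->key == key) {
            n->refcnt++;
            return n;
        }
    }
    if (!creat)
        return NULL;
    n = toy_create(tbl, key);
    if (n)
        n->refcnt++;
    return n;
}
```

With `creat` set to 0, a miss returns NULL; with 1, the same call returns a freshly linked entry that already holds a reference for the caller.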

Conversations with User Space

int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)

The Netlink service receptionist. When you type ip neigh add ..., the kernel eventually ends up here. It processes RTM_NEWNEIGH messages, turning user space's intentions (like adding a static ARP entry) into kernel data structures.

int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)

Same as above, but it handles RTM_DELNEIGH—corresponding to when you type ip neigh del ....

Garbage Collection and Timers

int neigh_forced_gc(struct neigh_table *tbl)

The forced eviction team. This is a synchronous garbage collection method. It ruthlessly kicks out all neighbor entries that are not in a permanent state (NUD_PERMANENT) and have a reference count of 1.

Its process is decisive: it first sets the neighbor's dead flag to 1, then calls neigh_cleanup_and_release() to send it on its way. Remember what we mentioned earlier? When memory pressure reaches a certain point, neigh_alloc wakes up this eviction team to make room. If it successfully tears down at least one entry, it returns 1; otherwise, it returns 0.
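The eviction rule can be sketched as a walk over a linked list. This is a simplified user-space illustration with hypothetical `toy_` names: it assumes a single chain instead of a hash table, takes no locks, and leaves freeing the unlinked entries to the caller.

```c
#include <stddef.h>

#define TOY_PERMANENT 0x80   /* stand-in for the NUD_PERMANENT state bit */

/* Hypothetical, simplified neighbor entry. */
struct toy_entry {
    int refcnt;              /* 1 means only the table references it */
    int state;               /* bitmask; only TOY_PERMANENT matters here */
    int dead;                /* set once the entry is evicted */
    struct toy_entry *next;
};

/* Unlink every entry that is not permanent and is referenced only by the
 * table. Returns 1 if at least one entry was evicted, 0 otherwise. */
static int toy_forced_gc(struct toy_entry **head)
{
    struct toy_entry **pp = head;
    int shrunk = 0;

    while (*pp) {
        struct toy_entry *n = *pp;

        if (n->refcnt == 1 && !(n->state & TOY_PERMANENT)) {
            *pp = n->next;   /* unlink from the chain */
            n->dead = 1;     /* mark first, then release */
            n->next = NULL;
            shrunk = 1;
            continue;
        }
        pp = &n->next;
    }
    return shrunk;
}
```

An entry that someone else still holds (reference count above 1) survives the sweep, which is why a table full of in-use entries can leave forced GC with nothing to free.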

void neigh_periodic_work(struct work_struct *work)

This is an asynchronous cleaner that periodically does the chores. Unlike forced_gc, it's not as violent, gently cleaning up expired entries.

static void neigh_timer_handler(unsigned long arg)

Every neighbor entry has its own countdown timer. This timer handler is the reaction to hearing the alarm—usually used to detect if the neighbor has gone down (e.g., transitioning from REACHABLE to STALE, or starting to send probe packets).
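The decisions the timer makes can be sketched as a small transition function. This is a simplified user-space model under stated assumptions: the `toy_` names are hypothetical, time is a plain counter with no wraparound handling, and sending the actual probe is reduced to incrementing a counter.

```c
/* Hypothetical, simplified NUD states. */
enum toy_nud {
    TOY_INCOMPLETE, TOY_REACHABLE, TOY_STALE,
    TOY_DELAY, TOY_PROBE, TOY_FAILED
};

struct toy_timer_ctx {
    enum toy_nud state;
    unsigned long confirmed;   /* last time reachability was confirmed */
    unsigned long used;        /* last time the entry was used for output */
    int probes;                /* probes sent so far */
};

struct toy_params {
    unsigned long reachable_time;
    unsigned long delay_probe_time;
    int max_probes;
};

/* One timer expiry: decide the next state from the current one. */
static enum toy_nud toy_timer_fire(struct toy_timer_ctx *n,
                                   const struct toy_params *p,
                                   unsigned long now)
{
    switch (n->state) {
    case TOY_REACHABLE:
        if (now <= n->confirmed + p->reachable_time)
            break;                          /* still fresh */
        if (now <= n->used + p->delay_probe_time)
            n->state = TOY_DELAY;           /* recently used: verify soon */
        else
            n->state = TOY_STALE;           /* idle: verify lazily */
        break;
    case TOY_DELAY:
        if (now <= n->confirmed + p->delay_probe_time) {
            n->state = TOY_REACHABLE;       /* confirmed in the meantime */
        } else {
            n->state = TOY_PROBE;           /* start active probing */
            n->probes = 0;
        }
        break;
    case TOY_PROBE:
    case TOY_INCOMPLETE:
        if (n->probes >= p->max_probes)
            n->state = TOY_FAILED;          /* give up on this neighbor */
        else
            n->probes++;                    /* a probe would be sent here */
        break;
    default:
        break;
    }
    return n->state;
}
```

The sketch makes the "paranoia" visible: an entry stays REACHABLE only while confirmations keep arriving, and every other path leads toward probing or failure.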

Sending and Probing Logic

void neigh_probe(struct neighbour *neigh)

"Hello, is anyone there?" This method pulls a data packet from the neighbor's arp_queue queue (if there is one), then calls the corresponding solicit() method (like ARP's arp_solicit) to send a probe. It also takes the opportunity to increment the probe counter and free the packet used for probing.

static void neigh_hh_init(struct neighbour *n, struct dst_entry *dst)

For performance, the kernel needs to cache L2 headers. This method initializes the neighbor's (n) hardware header cache (hh_cache) based on the routing cache entry (dst). With this, subsequent packet transmissions don't need to calculate the header every time—a simple memcpy does the job.

static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb)

The black hole. This isn't an astrophysical black hole, but a network dead end. It drops the packet and returns -ENETDOWN (network is down). This is usually used as a fallback for the neighbor's output callback: when everything else fails, at least it prevents a kernel panic.


ARP-Specific Methods

Below are the "dialects" unique to the IPv4 realm.

void __init arp_init(void)

The ARP module's main() function. It is called during system startup and does a series of odd jobs:

  1. Initializes the ARP table (arp_tbl).
  2. Registers arp_rcv, telling the kernel "hand me any received ARP packets."
  3. Creates various entries under /proc.
  4. Registers sysctl parameters (the switches you see under /proc/sys/net/ipv4/).
  5. Registers the network device notifier arp_netdev_event so that ARP can react when NIC state changes.

int arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)

The packet reception entry point. Although logically we talk about "handling ARP requests," in the kernel, everything starts with arp_rcv catching an Ethernet frame of type 0x0806. It handles some validity checks (like whether the packet length is sufficient) and then hands it off to arp_process for processing.

int arp_process(struct sk_buff *skb)

The real brain. This is what we spent a lot of time dissecting earlier. It parses the ARP header and decides whether to send a reply or update the local cache. The stories happening here include: passive learning, handling ARP requests, handling ARP replies, and triggering neigh_update.

int arp_constructor(struct neighbour *neigh)

The constructor. When a new neighbor entry is created (specifically for the ARP table), the kernel calls this function to initialize the ARP-specific parts, such as setting the appropriate neigh_ops callbacks.

void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)

"I'll go ask." This method is responsible for sending an ARP request (ARPOP_REQUEST). Before sending, it does a bunch of mental preparation (checking various flags, deciding whether to use unicast or broadcast probes), and finally calls arp_send to throw the packet out.

void arp_send(...)

A convenience wrapper function. It calls arp_create to craft an ARP packet based on parameters like IP address and MAC address, then calls arp_xmit to send it.

void arp_xmit(struct sk_buff *skb)

The final kick. It calls NF_HOOK (Netfilter hooks), and if the firewall lets the packet through, hands it to dev_queue_xmit() for transmission by the device.

struct arphdr *arp_hdr(const struct sk_buff *skb)

The lazy helper. It is a small inline function rather than a macro: it casts the skb's network header pointer (skb_network_header(skb)) to struct arphdr * and hands it to you. Stop computing offsets into skb->data yourself and use this.
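To make the layout concrete, here is a user-space sketch that locates and decodes an ARP header inside a raw Ethernet frame. It is not how the kernel does it (there the header offset is already recorded in the skb); the `toy_` names are hypothetical, and the parser assumes Ethernet with IPv4 (hardware length 6, protocol length 4) and no VLAN tag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN  14      /* dst MAC + src MAC + EtherType */
#define TOY_ETH_P_ARP 0x0806  /* EtherType for ARP */
#define TOY_ARP_LEN   28      /* ARP header for Ethernet/IPv4 */

/* Decoded ARP fields in host byte order (hypothetical struct). */
struct toy_arp {
    uint16_t htype, ptype;
    uint8_t  hlen, plen;
    uint16_t oper;
    uint8_t  sha[6], spa[4], tha[6], tpa[4];
};

/* Read a big-endian 16-bit value without alignment assumptions. */
static uint16_t toy_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short or not ARP. */
static int toy_parse_arp(const uint8_t *frame, size_t len, struct toy_arp *out)
{
    const uint8_t *a;

    if (len < TOY_ETH_HLEN + TOY_ARP_LEN)
        return -1;                       /* length check, as arp_rcv does */
    if (toy_be16(frame + 12) != TOY_ETH_P_ARP)
        return -1;

    a = frame + TOY_ETH_HLEN;            /* start of the ARP header */
    out->htype = toy_be16(a);
    out->ptype = toy_be16(a + 2);
    out->hlen  = a[4];
    out->plen  = a[5];
    out->oper  = toy_be16(a + 6);
    if (out->hlen != 6 || out->plen != 4)
        return -1;                       /* only Ethernet/IPv4 in this sketch */
    memcpy(out->sha, a + 8, 6);
    memcpy(out->spa, a + 14, 4);
    memcpy(out->tha, a + 18, 6);
    memcpy(out->tpa, a + 24, 4);
    return 0;
}
```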

int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir)

Multicast address translation. Mapping an IP address to a multicast MAC address isn't a simple arithmetic problem. For Ethernet, it calls ip_eth_mc_map; for InfiniBand, it calls something else. This function is that interpreter.
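For the Ethernet case the mapping is the standard one: the fixed prefix 01:00:5e followed by the low 23 bits of the IPv4 group address. A minimal sketch, with a hypothetical function name and the address passed in host byte order for readability:

```c
#include <stdint.h>

/* Map an IPv4 multicast address (host byte order) to an Ethernet
 * multicast MAC: 01:00:5e plus the low 23 bits of the address. */
static void toy_ip_eth_mc_map(uint32_t addr, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((addr >> 16) & 0x7f);  /* top bit of this byte is dropped */
    mac[4] = (uint8_t)((addr >> 8) & 0xff);
    mac[5] = (uint8_t)(addr & 0xff);
}
```

Because only 23 of the 28 significant group bits survive, 32 different IPv4 groups share each MAC address, which is why this is a mapping and not a lookup.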

int arp_fwd_proxy(...) / int arp_fwd_pvlan(...)

The judge. These are used to determine whether the current device should enable Proxy ARP or Proxy ARP PVLAN. We mentioned proxy NDP in the NDISC section, and ARP has similar logic here—the router answers on behalf of the target host, saying "I am them."


NDISC-Specific Methods

Moving to IPv6, things get slightly more modern.

int ndisc_rcv(struct sk_buff *skb)

The NDISC main entry point. Although it's called ndisc, it actually sits on top of the ICMPv6 protocol. When an ICMPv6 packet carrying a Neighbor Discovery message arrives (Neighbor Solicitation, Neighbor Advertisement, Router Solicitation, Router Advertisement, or Redirect), icmpv6_rcv hands the packet over to ndisc_rcv.

static void ndisc_recv_ns(struct sk_buff *skb)

Specifically handles NS (Neighbour Solicitation) packets. The logic here corresponds to the "handle request" branch in ARP's arp_process, but it's much more complex—it also has to consider DAD (Duplicate Address Detection).
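One way to see why DAD complicates this path: a DAD probe is recognizable because its source is the unspecified address (::), so the receiver cannot simply learn the sender and reply to it. The sketch below shows only that test; the function name is hypothetical, and as far as I understand the kernel performs the equivalent check with ipv6_addr_any().

```c
#include <stdint.h>

/* Returns 1 if the 16-byte IPv6 source address is all zeros (::),
 * which marks a Neighbor Solicitation as a DAD probe; 0 otherwise. */
static int toy_ns_is_dad(const uint8_t saddr[16])
{
    int i;

    for (i = 0; i < 16; i++) {
        if (saddr[i] != 0)
            return 0;
    }
    return 1;
}
```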

static void ndisc_recv_na(struct sk_buff *skb)

Specifically handles NA (Neighbour Advertisement) packets. It receives the neighbor's reply and updates the state machine.

static void ndisc_recv_rs(struct sk_buff *skb)

Handles RS (Router Solicitation). This is the host shouting "Are there any routers? I need to get online."

static void ndisc_router_discovery(struct sk_buff *skb)

Handles RA (Router Advertisement). The router says "I'm here, and here's how I'm configured..." This is where those parameters are processed.

int ndisc_constructor(struct neighbour *neigh)

Corresponds to ARP's arp_constructor. The callback when initializing an IPv6 neighbor entry, setting the operation set specific to ndisc.

void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)

"I'll go ask (IPv6 version)." It calls ndisc_send_ns to send a Neighbor Solicitation.

int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev, int dir)

Same as arp_mc_map, but for IPv6 multicast addresses. On Ethernet, it calls ipv6_eth_mc_map.
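The Ethernet mapping for IPv6 is simpler than its IPv4 counterpart: the prefix 33:33 followed by the last four bytes of the group address. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>
#include <string.h>

/* Map a 16-byte IPv6 multicast address to an Ethernet multicast MAC:
 * 33:33 followed by the low 32 bits of the address. */
static void toy_ipv6_eth_mc_map(const uint8_t addr[16], uint8_t mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(mac + 2, addr + 12, 4);
}
```

For example, the solicited-node group ff02::1:ff12:3456 maps to 33:33:ff:12:34:56.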


Macros

The kernel loves using macros to encapsulate those tedious but logically fixed checks. Below are the ones you'll frequently run into when reading the neighbor subsystem code.

ARP Behavior Control

These macros are usually used to check the switch states under /proc/sys/net/ipv4/conf/.

IN_DEV_PROXY_ARP(in_dev)

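Checks if proxy_arp is enabled. It doesn't just look at the specific device's configuration; it also checks the all setting, and either one being on is enough. The default directory is not consulted at this point: it only seeds the initial values of newly created devices.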

IN_DEV_ARPFILTER(in_dev)

Checks arp_filter. This switch matters when several interfaces sit on the same subnet. When enabled, the kernel answers an ARP request on an interface only if it would route packets to the requester's IP out of that same interface; otherwise it pretends it didn't hear the request.

IN_DEV_ARP_ACCEPT(in_dev)

Checks arp_accept. This determines whether the kernel accepts "gratuitous ARP" (unsolicited advertisements sent without being asked). It's typically used in load balancing or high-availability failover scenarios.

IN_DEV_ARP_IGNORE(in_dev)

A very decisive macro. It returns the value of arp_ignore. This value controls the kernel's response level to ARP requests:

  • 0: Respond to any IP owned by the local machine, regardless of the interface (default).
  • 1: Respond only if the target IP is configured on the receiving interface (preventing multi-interface confusion).
  • Higher values are stricter, even to the point of not responding at all.

IN_DEV_ARP_ANNOUNCE(in_dev)

Corresponds to arp_announce. It controls how we choose the source IP address when sending ARP requests:

  • 0: Use any address on the local machine's interfaces.
  • 1: Try to use an address in the subnet where this network interface resides.
  • 2: Always use the primary address of this interface. This is to avoid being blacklisted by neighbors in complex routing environments because of a weird source IP.
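How the per-device value and the `all` value are combined differs between on/off switches and graded settings. The sketch below assumes, as I understand the kernel's IN_DEV_ORCONF and IN_DEV_MAXCONF helpers, that boolean switches such as proxy_arp are OR-ed while graded values such as arp_ignore and arp_announce take the maximum. The `toy_` names are hypothetical.

```c
/* Hypothetical per-scope settings: one copy for "all", one per device. */
struct toy_conf {
    int proxy_arp;     /* boolean switch */
    int arp_ignore;    /* graded level */
    int arp_announce;  /* graded level */
};

/* Boolean switches: enabled if either scope enables them. */
static int toy_orconf(int all, int dev)
{
    return all || dev;
}

/* Graded settings: the stricter (larger) value wins. */
static int toy_maxconf(int all, int dev)
{
    return all > dev ? all : dev;
}

static int toy_proxy_arp(const struct toy_conf *all, const struct toy_conf *dev)
{
    return toy_orconf(all->proxy_arp, dev->proxy_arp);
}

static int toy_arp_ignore(const struct toy_conf *all, const struct toy_conf *dev)
{
    return toy_maxconf(all->arp_ignore, dev->arp_ignore);
}

static int toy_arp_announce(const struct toy_conf *all, const struct toy_conf *dev)
{
    return toy_maxconf(all->arp_announce, dev->arp_announce);
}
```

Under this assumption, setting arp_ignore to 1 under `all` makes every interface behave at level 1 or stricter, regardless of its own value.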

IN_DEV_SHARED_MEDIA(in_dev)

Checks shared_media. If enabled, the kernel assumes that different IP subnets on this interface share the same physical medium and can reach each other directly, which affects how ICMP redirects are sent and accepted (the RFC 1620 shared-media model).

Generic Operations

neigh_hold()

The "hold on tight" macro. It increments the neighbor object's reference count (refcnt). This prevents the neighbor from being suddenly freed by someone else while you're operating on it. This is crucial in concurrent environments.
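The hold/release discipline can be sketched with C11 atomics. This is a user-space illustration with hypothetical `toy_` names, not the kernel's implementation; the `destroyed` pointer exists only so the example can observe when the object is freed. The kernel counterpart of the release side is neigh_release().

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct toy_obj {
    atomic_int refcnt;
    int *destroyed;     /* set to 1 when the object is freed (for observation) */
};

/* Create an object holding one reference for the creator. */
static struct toy_obj *toy_new(int *destroyed)
{
    struct toy_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refcnt, 1);
    o->destroyed = destroyed;
    return o;
}

/* Take an extra reference so the object outlives other users' releases. */
static void toy_hold(struct toy_obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* Drop one reference; whoever drops the last one frees the object. */
static void toy_release(struct toy_obj *o)
{
    if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
        if (o->destroyed)
            *o->destroyed = 1;
        free(o);
    }
}
```

Every hold must be paired with exactly one release; an unpaired hold is also what keeps an entry out of reach of forced garbage collection.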


neigh_statistics Structure

Finally, let's look at the neighbor subsystem's "scoreboard."

As we mentioned at the beginning of this chapter, both ARP and NDISC export statistics via procfs (in /proc/net/stat/arp_cache and /proc/net/stat/ndisc_cache, respectively). This data is read from the neigh_statistics structure inside the kernel.

Let's see what these fields represent and where in the kernel they get incremented.

struct neigh_statistics {
    unsigned long allocs;           /* number of neighbours allocated */
    unsigned long destroys;         /* number of neighbours destroyed */
    unsigned long hash_grows;       /* number of hash table resizes */
    unsigned long res_failed;       /* number of failed resolutions */
    unsigned long lookups;          /* number of lookups */
    unsigned long hits;             /* number of hits (among lookups) */
    unsigned long rcv_probes_mcast; /* received multicast probes (IPv6) */
    unsigned long rcv_probes_ucast; /* received unicast probes (IPv6) */
    unsigned long periodic_gc_runs; /* number of periodic GC runs */
    unsigned long forced_gc_runs;   /* number of forced GC runs */
    unsigned long unres_discards;   /* packets dropped because resolution failed */
};

Field Details

  • allocs: Incremented by one every time neigh_alloc() successfully allocates. If this value skyrockets, it means there are a massive number of new hosts communicating wildly on your network, or someone is launching an attack.

  • destroys: Incremented by one each time neigh_destroy() is called. Under normal circumstances, it should maintain some kind of dynamic balance with allocs.

  • hash_grows: When the hash table gets too crowded (linked lists are too long), the kernel calls neigh_hash_grow() to expand. This number records the expansion count. If it's high, it indicates your network environment is very large or active.

  • res_failed: Resolution failures. neigh_invalidate() increments this count when it completely gives up on a neighbor.

  • lookups / hits: This is a pair of performance metrics. lookups is the total number of calls to neigh_lookup, and hits is the number of times it was found directly in the hash table. The closer hits / lookups is to 1, the higher the cache hit rate, and the smoother the network.

  • rcv_probes_mcast / ucast: These two are IPv6-specific. They record how many NS (Neighbor Solicitation) messages were received. Separating unicast and multicast statistics helps you troubleshoot network conditions—for example, if they're all unicast probes with no replies, the network might be unidirectionally blocked.

  • periodic_gc_runs / forced_gc_runs: Garbage collection activity. If forced_gc_runs is high, it means memory pressure is severe, and the kernel is continuously and violently cleaning up to make room.

  • unres_discards: The most tragic statistic. When __neigh_event_send finds that a neighbor hasn't been resolved, and the queue is full or the packet wasn't meant to wait anyway, it can only drop the packet and record it here.
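The hit rate mentioned for lookups and hits is a simple ratio, with one wrinkle: the counters are kept per CPU, so the rows have to be summed first. A minimal sketch with hypothetical names, assuming the per-CPU values have already been read into an array:

```c
#include <stddef.h>

/* Hypothetical per-CPU slice of the counters. */
struct toy_cpu_stats {
    unsigned long lookups;
    unsigned long hits;
};

/* Sum the per-CPU rows and return hits / lookups, or 0.0 if there
 * were no lookups at all. */
static double toy_hit_ratio(const struct toy_cpu_stats *rows, size_t ncpus)
{
    unsigned long lookups = 0, hits = 0;
    size_t i;

    for (i = 0; i < ncpus; i++) {
        lookups += rows[i].lookups;
        hits += rows[i].hits;
    }
    return lookups ? (double)hits / (double)lookups : 0.0;
}
```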


7.6 Chapter Echoes

Finishing this chapter, the "mysterious veil" beneath the L3 layer in the Linux network stack has finally been lifted.

Many network tutorials will tell you that "networks use IP addresses for routing," which is true. But they usually forget one crucial thing: on that final hop, the IP address is actually useless. NICs don't recognize IPs at all; they only recognize MACs.

The neighbor subsystem is the bridge that translates "address languages."

The first understanding we built in this chapter is the cost of "discovery." When you ping an IP, you're actually having two conversations: one is a broadcast asking "who has this IP?", and the other is a unicast saying "I want to send data to you." If this step goes wrong, the symptom is "network unreachable," but underneath, it might just be that an entry in the ARP table has aged out.

The second understanding is the fragility of "trust." As we saw when discussing NUD (Neighbor Unreachability Detection), the kernel must constantly suspect whether a neighbor is still alive. From REACHABLE to STALE, and then to PROBE, this state machine is like a paranoid gatekeeper. This paranoia is necessary—because in a LAN, unplugging a cable doesn't require submitting a request, and plugging one in doesn't require saying hello. The kernel can only rely on continuous probing to maintain consistency with reality.

Remember the question from the beginning: why is IPv6's NDISC so much more complex than IPv4's ARP, even though they do the same thing? The answer is now clear: ARP was designed for a simple interconnected network, while NDISC was designed for a complex network full of routers, auto-configuration, and security concerns. Every flag in NDISC (Router, Override, Solicited) is essentially there to patch the various security and logic vulnerabilities exposed by the ARP protocol in the early Internet.

And ultimately, all these complex mechanisms (queries, caching, timeouts, probes, garbage collection) are hidden behind two of the simplest system calls: connect() and sendmsg(). As a user, you just write down the destination IP, and all the dirty work is silently done behind the scenes by the massive and precise mechanism described in this chapter.

Only by understanding the neighbor subsystem do you truly understand how a "LAN" operates in the kernel's eyes.

In the next chapter, we'll turn our gaze further away—routing. If the neighbor subsystem solves the question of "who is the next hop?", then the routing system solves the question of "which direction should we go?" That is the true decision center of the network stack.


Exercises

Exercise 1: Understanding

Question: In the Linux kernel's neighbor subsystem, creating a new neighbor entry can trigger synchronous garbage collection. Under what condition on the current number of entries in the neighbor table does the kernel force synchronous garbage collection (neigh_forced_gc)?

Answer and Analysis

Answer: When the number of neighbor entries reaches or exceeds gc_thresh3 (default 1024), or when it reaches or exceeds gc_thresh2 (default 512) and more than 5 seconds have passed since the last flush.

Analysis: Based on the logic in the neigh_alloc() method, the kernel checks the entry count before allocating a new neighbor. The code logic is: if (entries >= tbl->gc_thresh3 || (entries >= tbl->gc_thresh2 && time_after(now, tbl->last_flush + 5 * HZ))). If forced garbage collection frees nothing and the entry count is still at or above gc_thresh3, the allocation fails. This ensures that the neighbor table can automatically clean up old entries under memory pressure, while also limiting the performance overhead caused by frequent scans.
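The condition quoted in the analysis can be isolated into a small predicate. This is a user-space sketch with hypothetical names; `hz` stands in for the kernel's HZ constant, and the wraparound-safe comparison imitates what time_after() does.

```c
/* Wraparound-safe "a is later than b" for tick counters. */
static int toy_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Returns 1 if allocating one more entry should first force a GC pass:
 * either the hard limit is reached, or the soft limit is reached and the
 * table has not been flushed within the last 5 seconds. */
static int toy_should_force_gc(int entries, int gc_thresh2, int gc_thresh3,
                               unsigned long now, unsigned long last_flush,
                               unsigned long hz)
{
    return entries >= gc_thresh3 ||
           (entries >= gc_thresh2 &&
            toy_time_after(now, last_flush + 5 * hz));
}
```

The 5-second guard on the soft limit is what keeps a table hovering between gc_thresh2 and gc_thresh3 from being scanned on every single allocation.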

Exercise 2: Understanding

Question: The kernel uses the struct neighbour structure to manage neighbor nodes. When a packet needs to be sent but the neighbor's link-layer address (such as a MAC address) has not yet been resolved, in which member queue of the structure are the pending packets temporarily stored? Once resolution is complete, in which member does the kernel cache the resolved L2 header to accelerate the encapsulation of subsequent packets?

Answer and Analysis

Answer: Packets are temporarily stored in arp_queue; the L2 header is cached in hh (hh_cache).

Analysis: During address resolution (usually in the NUD_INCOMPLETE state), the neighbor entry uses arp_queue (an SKB queue) to cache packets waiting to be sent, preventing packet loss. Once resolution succeeds, the kernel stores the constructed L2 header in hh (struct hh_cache). On subsequent transmissions, the kernel can directly copy this header without needing to re-resolve or look it up, significantly improving forwarding performance.

Exercise 3: Application

Question: Suppose you are troubleshooting a complex network fault: Server A has two interfaces, eth0 (192.168.1.10/24) and eth1 (192.168.2.10/24). You enable Proxy ARP on eth0 to allow Server B (192.168.1.20) to access Server C (192.168.2.20). At this point, a large number of ARP requests might cause Server A's ARP table to rapidly expand. To reduce the impact of ARP processing on normal traffic, what mechanism (involving proxy_timer and proxy_queue) does the kernel use to optimize Proxy ARP handling?

Answer and Analysis

Answer: The kernel uses a delayed processing mechanism, caching Proxy ARP request packets in proxy_queue and utilizing proxy_timer to batch-process them after a random delay (up to proxy_delay).

Analysis: In a Proxy ARP scenario, a host might receive a massive number of ARP requests. Processing every request immediately could consume significant CPU resources and lead to queue overflow. The kernel implements a delay strategy through neigh_proxy_process() and proxy_timer: request packets are placed into proxy_queue, and replies are sent after waiting for a random amount of time. This mechanism gives the real IP owner (the host that actually owns the IP) a chance to respond first, while also smoothing out the Proxy ARP processing load. This is critical in high-traffic gateway or load-balancing scenarios.

Exercise 4: Thinking

Question: In the IPv6 Neighbor Discovery Protocol (NDISC), when a host configures a new IPv6 address, it must perform "Duplicate Address Detection" (DAD). In the kernel, once the DAD process starts, the IPv6 address is assigned a specific state flag (such as IFA_F_TENTATIVE). If a data packet must be sent before DAD completes (i.e., while the address is still in the Tentative state) and the kernel has "Optimistic DAD" enabled, what is the fundamental difference in the kernel's behavior compared to when this feature is not enabled?

Answer and Analysis

Answer: When Optimistic DAD is not enabled, a Tentative state address cannot be used for communication, and packet transmission is blocked or restricted; when Optimistic DAD (RFC 4429) is enabled, the kernel allows using the Tentative address as a source address to send packets before DAD completes. Although this violates strict duplicate-free guarantees, it significantly reduces connection establishment delay during network startup.

Analysis: This question tests a deep understanding of the IPv6 address state machine and performance optimizations. Standard DAD requires waiting for verification to pass before using the address, which delays the first connection by the DAD probe interval (on the order of a second with default settings), no matter how fast the link is. addrconf_dad_start() sets the IFA_F_TENTATIVE flag. Optimistic DAD assumes the probability of conflict is extremely low and takes the risk of using the address early. This requires special handling in the protocol stack (such as restricting which Neighbor Discovery messages may be sent from the optimistic address) and is a deliberate relaxation of the "reliability first" principle in exchange for lower startup latency.


Key Takeaways

The Linux kernel uniformly manages the IPv4 ARP and IPv6 NDISC protocols through the neighbor subsystem, with the core responsibility of dynamically resolving Layer 3 IP addresses into Layer 2 MAC addresses. To balance resolution efficiency with system resources, the kernel uses a hash table to store neighbor entries and sets gc_thresh thresholds (soft limits trigger garbage collection, the hard limit rejects new entries) to prevent table overflow. When creating a neighbor entry, the kernel calls a specific constructor based on the protocol type (IPv4/IPv6), automatically identifies and handles special addresses like broadcast and multicast that don't need resolution, and works with the neigh_ops interface to adapt the sending behavior of different network device types (such as Ethernet or point-to-point devices).

On the sending path (like ip_finish_output2), packets query the neighbor table. If an entry doesn't exist or the state is unreachable, the kernel doesn't just drop the packet; it temporarily stores it in the arp_queue queue and triggers the resolution mechanism. For IPv4, this is achieved through broadcast or unicast ARP Requests; for IPv6, it's achieved through ICMPv6 Neighbor Solicitation (NS) messages. To optimize the network environment, the kernel carefully selects the source IP based on the arp_announce parameter when sending requests, and supports sending unicast probes first to reduce broadcast traffic. On the receiving path, the kernel utilizes a "passive learning" mechanism—whether it receives a request or a response, it casually records the sender's MAC address to update the neighbor table.

IPv6's NDISC protocol is more rigorous than ARP. It implements multiple functions including router discovery, prefix discovery, and address resolution through ICMPv6. To guarantee address uniqueness, IPv6 requires performing Duplicate Address Detection (DAD) when configuring an address: the address is marked Tentative, and a special NS message whose source is the unspecified address (::) is sent to confirm there are no conflicts before the address is put into service. NDISC's Neighbor Advertisement (NA) messages carry flags like Router, Solicited, and Override, which not only communicate resolution results but also precisely control neighbor cache state update strategies (such as whether to forcibly overwrite old caches), thereby building a neighbor discovery system that is more secure and controllable than plain ARP.

To cope with unreliable network environments, the neighbor subsystem introduces a comprehensive state machine (NUD states) to manage the lifecycle of entries. Entries transition from the initial INCOMPLETE state to the REACHABLE state after resolution. As time passes and they go unused, they age to the STALE, DELAY, or even PROBE states. The kernel decides whether to send data directly or perform address verification first based on these states. This "trust but verify" strategy, combined with timers and random timeout mechanisms, ensures that in highly concurrent and dynamically changing network topologies, link-layer mappings can remain efficient while also failing fast and recovering quickly, preventing the creation of network black holes.