3.2 The "Swiss Army Knife" of IPv6: ICMPv6

In the previous section, we navigated the world of IPv4. While ICMPv4 is certainly important, within the IPv4 architecture, it plays somewhat of a "patch" role—ARP handles address resolution, IGMP manages multicast, and ICMPv4 is mainly left with error reporting and diagnostics.

But in the IPv6 world, things have fundamentally changed. ICMPv6 is no longer just a supporting character; it has become the core of the entire stage. Without it, an IPv6 network can barely take a single step.

It's like firing your dedicated nanny, driver, and butler, and replacing them all with a single all-purpose butler—who handles error reporting (like ICMPv4), takes over ARP's job (Neighbor Discovery), and manages IGMP's responsibilities (Multicast Listener Discovery). This is both the elegance of its design and the complexity of its implementation.

3.2.1 More Than Just a v6 Version of v4

ICMPv6 is defined in RFC 4443 (if you look at the code comments, you might see RFC 1885 or RFC 2463—these are older versions superseded by 4443). If you search the kernel source directly, you'll find two main implementation files:

  • net/ipv6/icmp.c
  • net/ipv6/ip6_icmp.c

Just like ICMPv4, ICMPv6 is compiled directly into the kernel. Want to build it as a module? No dice.

Despite the similar name, ICMPv6 carries far more responsibilities than v4. In addition to retaining the original error reporting and diagnostic functions (Ping and Traceroute), it has taken over two extremely important domains:

  1. ND (Neighbour Discovery): Completely replaces ARP. Instead of broadcasting to ask "Who is 192.168.1.1?", you use ICMPv6 messages to get the job done.
  2. MLD (Multicast Listener Discovery): Takes over IGMP's job, managing multicast group membership.

This also explains why the ping_rcv() method mentioned in the previous section is reachable from both the ICMPv4 and the ICMPv6 receive paths: to support Ping, the kernel needs to handle echo replies for both IPv4 and IPv6.

3.2.2 Beneath the Surface: ICMPv6 Initialization

The ICMPv6 initialization process is very similar to IPv4—essentially "same bones, different skin"—but there are subtle differences in the details.

The core actions of initialization remain the same two steps:

  1. Register the protocol handler (telling the kernel "hand me packets with protocol number 58").
  2. Create a control socket (leaving a backdoor for the kernel itself to send messages).

Let's look at the first step, defining the protocol structure:

static const struct inet6_protocol icmpv6_protocol = {
    .handler     = icmpv6_rcv,
    .err_handler = icmpv6_err,
    .flags       = INET6_PROTO_NOPOLICY | INET6_PROTO_FINAL,
};

There are two flags here worth noting:

  • INET6_PROTO_NOPOLICY: This means ICMPv6 does not require an IPsec policy check. This is a very sensible security design—if you are handling network-layer error reports, you absolutely do not want to drop the error report itself just because IPsec verification failed, otherwise you would never know what actually went wrong with the network.
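  • INET6_PROTO_FINAL: This flag marks ICMPv6 as a final (upper-layer) protocol, that is, the payload itself rather than an extension header that chains to a further Next Header.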

Then comes the registration, which is done via inet6_add_protocol():

int __init icmpv6_init(void)
{
    int err;
    ...
    if (inet6_add_protocol(&icmpv6_protocol, IPPROTO_ICMPV6) < 0)
        goto fail;
    return 0;
}

The protocol number IPPROTO_ICMPV6 has a value of 58. This means that when the IPv6 layer sees a Next Header field of 58, it will directly toss the packet to icmpv6_rcv().
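The registration idea can be sketched in user space as a table indexed by the Next Header value. This is a toy model, not kernel code: proto_table, proto_register(), proto_deliver(), toy_icmpv6_rcv() and TOY_PROTO_ICMPV6 are hypothetical names invented for illustration, and the "slot already taken" failure is an assumption about how a registration like this would typically behave.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PROTO_ICMPV6 58   /* same numeric value as IPPROTO_ICMPV6 */

typedef int (*proto_handler_t)(const uint8_t *payload, size_t len);

/* Hypothetical stand-in for a per-protocol handler table. */
static proto_handler_t proto_table[256];

/* Register a handler for one Next Header value; -1 if the slot is taken. */
int proto_register(uint8_t nexthdr, proto_handler_t handler)
{
    if (proto_table[nexthdr] != NULL)
        return -1;
    proto_table[nexthdr] = handler;
    return 0;
}

/* Hand a payload to whoever registered for this Next Header; -1 if nobody did. */
int proto_deliver(uint8_t nexthdr, const uint8_t *payload, size_t len)
{
    proto_handler_t handler = proto_table[nexthdr];

    return handler ? handler(payload, len) : -1;
}

/* Toy receive routine: just reports the ICMPv6 type byte it was given. */
int toy_icmpv6_rcv(const uint8_t *payload, size_t len)
{
    return len > 0 ? payload[0] : -1;
}
```

After proto_register(TOY_PROTO_ICMPV6, toy_icmpv6_rcv), any payload delivered with Next Header 58 lands in toy_icmpv6_rcv(), which mirrors the role icmpv6_rcv() plays after inet6_add_protocol().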

The second step is allocating a "dedicated transmitter" for each CPU:

static int __net_init icmpv6_sk_init(struct net *net)
{
    struct sock *sk;
    ...
    for_each_possible_cpu(i) {
        err = inet_ctl_sock_create(&sk, PF_INET6,
                                   SOCK_RAW, IPPROTO_ICMPV6, net);
        ...
        net->ipv6.icmp_sk[i] = sk;
        ...
    }
    ...
}

The logic here is exactly the same as v4: it iterates over all possible CPUs, creating a Raw Socket for each one. Why do we do this? Once again, to reduce lock contention. If all CPUs shared a single Socket to send ICMP error messages, the send queue would become a massive bottleneck under high concurrency. Now, each CPU has its own private stash (icmp_sk)—whoever sends the message uses their own queue, with zero interference.

3.2.3 Header Structure: Separating Errors from Information

The ICMPv6 header structure looks very similar to v4, but there is a key "bit manipulation" logic you need to understand.

The structure definition (include/uapi/linux/icmpv6.h):

struct icmp6hdr {
    __u8    icmp6_type;
    __u8    icmp6_code;
    __sum16 icmp6_cksum;
    ...
};

These three fields are standard: type, code, and checksum.

But the key lies in the interpretation rules for icmp6_type. RFC 4443 defines a very clever dividing line:

  • High bit is 0 (0 ~ 127): These are Error Messages.
  • High bit is 1 (128 ~ 255): These are Informational Messages.

To make this easy to check, the kernel defines a mask ICMPV6_INFOMSG_MASK (which is essentially 0x80). With a single bitwise AND operation, you can know whether the packet in your hands is good news or bad news.
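As a minimal user-space sketch of that check (the helper names and TOY_ICMPV6_INFOMSG_MASK are made up for illustration; only the 0x80 value and the 0 to 127 / 128 to 255 split come from the text above):

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ICMPV6_INFOMSG_MASK 0x80   /* same value as the kernel's mask */

/* High bit set: informational message (types 128..255). */
static inline bool toy_icmpv6_is_info(uint8_t type)
{
    return (type & TOY_ICMPV6_INFOMSG_MASK) != 0;
}

/* High bit clear: error message (types 0..127). */
static inline bool toy_icmpv6_is_error(uint8_t type)
{
    return !toy_icmpv6_is_info(type);
}
```

So type 2 (Packet Too Big) is classified as an error, while type 128 (Echo Request) and type 135 (Neighbor Solicitation) are informational.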

Common Message Types Quick Reference:

Type Value  Kernel Macro                   Category  Meaning
1           ICMPV6_DEST_UNREACH            Error     Destination Unreachable
2           ICMPV6_PKT_TOOBIG              Error     Packet Too Big (MTU issue)
3           ICMPV6_TIME_EXCEED             Error     Time Exceeded (similar to v4's TTL Exceeded)
4           ICMPV6_PARAMPROB               Error     Parameter Problem
128         ICMPV6_ECHO_REQUEST            Info      Ping Request
129         ICMPV6_ECHO_REPLY              Info      Ping Reply
133         NDISC_ROUTER_SOLICITATION      Info      Router Solicitation (part of the ND protocol)
134         NDISC_ROUTER_ADVERTISEMENT     Info      Router Advertisement
135         NDISC_NEIGHBOUR_SOLICITATION   Info      Neighbor Solicitation (similar to ARP Request)
136         NDISC_NEIGHBOUR_ADVERTISEMENT  Info      Neighbor Advertisement (similar to ARP Reply)

You'll notice that the last four rows of the table are all ND (Neighbor Discovery) protocol messages. This once again confirms the point made at the beginning: in IPv6, ICMP is the carrier for Neighbor Discovery.

3.2.4 The Receive Path: Dispatch Logic in icmpv6_rcv()

When an ICMPv6 packet arrives at the network card and is unpacked by the IPv6 layer, it ultimately lands in the hands of icmpv6_rcv(). This function resides in net/ipv6/icmp.c, and its flowchart (Figure 3-4) shows a clear "funnel-style" processing approach.

Step 1: Security Check and Statistics

As soon as a packet comes in, we do two things:

  1. Update the counter: Increment ICMP6_MIB_INMSGS (total received ICMPv6 messages).
  2. Check credentials: Verify the checksum.

If the checksum is wrong, the packet is definitely corrupted. The kernel updates the ICMP6_MIB_INERRORS counter and then simply drops the packet (kfree_skb). Note that icmpv6_rcv() does not return an error code here; it always returns 0, just like its IPv4 predecessor icmp_rcv().
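To make "verify the checksum" concrete, here is a user-space sketch of the ICMPv6 checksum as RFC 4443 describes it: a one's-complement sum over an IPv6 pseudo-header (source, destination, upper-layer length, Next Header 58) followed by the ICMPv6 message. The kernel relies on its own optimized checksum helpers instead; toy_sum16() and toy_icmpv6_checksum() are illustrative names, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a running sum as big-endian 16-bit words (odd tail padded with 0). */
static uint32_t toy_sum16(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/*
 * Checksum over pseudo-header + ICMPv6 message.
 * Sender: call with the checksum field zeroed and store the result.
 * Receiver: call on the message as received; 0 means the checksum is valid.
 */
uint16_t toy_icmpv6_checksum(const uint8_t src[16], const uint8_t dst[16],
                             const uint8_t *msg, size_t len)
{
    /* 32-bit upper-layer length, three zero bytes, Next Header = 58 */
    const uint8_t tail[8] = {
        (uint8_t)(len >> 24), (uint8_t)(len >> 16),
        (uint8_t)(len >> 8),  (uint8_t)len,
        0, 0, 0, 58
    };
    uint32_t sum = 0;

    sum = toy_sum16(sum, src, 16);
    sum = toy_sum16(sum, dst, 16);
    sum = toy_sum16(sum, tail, sizeof tail);
    sum = toy_sum16(sum, msg, len);

    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver modeled on this sketch would count an error and drop the packet whenever toy_icmpv6_checksum() returns a non-zero value for the received message.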

Step 2: Type Dispatch

With the checksum verified, we move on to the type. Unlike the massive icmp_pointers dispatch table in ICMPv4, ICMPv6 uses a giant switch(type) statement for dispatch.

Here's a detail: when each specific type is processed, the kernel uses the ICMP6MSGIN_INC_STATS_BH macro to update the corresponding statistic under /proc/net/snmp6. For example, if an Echo Request is received, Icmp6InEchos increments; if a Neighbor Solicitation is received, Icmp6InNeighborSolicits increments.

Step 3: Who Handles What?

Let's look at the key branches in this switch statement:

1. Echo Request (ICMPV6_ECHO_REQUEST) Someone is pinging you.

  • Handler: icmpv6_echo_reply().
  • Behavior: Constructs an Echo Reply packet and sends it back.

2. Echo Reply (ICMPV6_ECHO_REPLY) This is someone else's reply to a ping you sent.

  • Handler: ping_rcv().
  • Note: This is the "dual-stack" code mentioned earlier. It lives in net/ipv4/ping.c and handles Ping replies for both IPv4 and IPv6.

3. Packet Too Big (ICMPV6_PKT_TOOBIG) This is an IPv6-specific and extremely important message.

  • Trigger condition: A packet you sent was too large, and the MTU of some intermediate link couldn't accommodate it.
  • Handling:
    1. First calls pskb_may_pull() to ensure the packet contains enough data to read.
    2. Then calls icmpv6_notify(), which ultimately triggers raw6_icmp_error() to notify the relevant Raw Sockets of the error.

4. Neighbor Discovery (ND) Messages This is a group of messages, including types 133 through 137.

  • Handler: All handed off to ndisc_rcv() (located in net/ipv6/ndisc.c).
  • Importance: We will discuss this function in detail in Chapter 7, as it is the core of IPv6 address resolution.

5. Multicast Listener Discovery (MLD) Messages

  • ICMPV6_MGM_QUERY: Query, handed to igmp6_event_query().
  • ICMPV6_MGM_REPORT: Report, handed to igmp6_event_report().
  • Note: We will dive into the details of MLD in Chapter 8.

6. Default Branch (Catch-all Logic) What happens if an incoming packet is an "unknown type," or belongs to a less common category (like Mobile IPv6 related messages)?

Look at this switch default code—it's quite interesting:

default:
    LIMIT_NETDEBUG(KERN_DEBUG "icmpv6: msg of unknown type\n");

    /* informational */
    if (type & ICMPV6_INFOMSG_MASK)
        break;

    /*
     * error of unknown type.
     * must pass to upper level
     */
    icmpv6_notify(skb, type, hdr->icmp6_code, hdr->icmp6_mtu);
}

The logic is as follows:

  • If it is an informational message (high bit is 1), it simply calls break. This means the kernel silently ignores it. Why? Because receiving an unknown informational message usually doesn't affect network operation; it's just extra noise.
  • If it is an error message (high bit is 0), it cannot be ignored. It must call icmpv6_notify() to pass the error up to the upper layers. This complies with RFC 4443: even unknown error messages must be reported to upper-layer protocols (like Raw Sockets) for handling, in case the upper-layer protocol needs to take special action.
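The whole dispatch, including the catch-all branch, can be summarized in a small user-space sketch. It only returns a label for the handler each branch above names; the enum and toy_icmpv6_dispatch() are hypothetical, and the numeric values 130 and 131 for the MLD query and report are taken from the protocol assignments rather than from the text above.

```c
#include <stdint.h>

enum toy_action {
    TOY_ECHO_REPLY,   /* icmpv6_echo_reply() */
    TOY_PING_RCV,     /* ping_rcv() */
    TOY_NOTIFY,       /* icmpv6_notify() */
    TOY_NDISC,        /* ndisc_rcv() */
    TOY_MLD_QUERY,    /* igmp6_event_query() */
    TOY_MLD_REPORT,   /* igmp6_event_report() */
    TOY_IGNORE        /* unknown informational message: silently dropped */
};

enum toy_action toy_icmpv6_dispatch(uint8_t type)
{
    switch (type) {
    case 128:                       /* Echo Request */
        return TOY_ECHO_REPLY;
    case 129:                       /* Echo Reply */
        return TOY_PING_RCV;
    case 2:                         /* Packet Too Big */
        return TOY_NOTIFY;
    case 133: case 134: case 135: case 136: case 137:
        return TOY_NDISC;           /* Neighbor Discovery family */
    case 130:                       /* MLD query */
        return TOY_MLD_QUERY;
    case 131:                       /* MLD report */
        return TOY_MLD_REPORT;
    default:
        /* Unknown informational types are ignored, unknown errors go up. */
        return (type & 0x80) ? TOY_IGNORE : TOY_NOTIFY;
    }
}
```

For instance, an unassigned informational type such as 200 yields TOY_IGNORE, while an unassigned error type such as 100 still yields TOY_NOTIFY.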

3.2.5 The Send Path: Timing and Limitations of icmpv6_send()

Sending ICMPv6 messages relies mainly on two functions:

  1. icmpv6_send(): The generic send function, used for sending various error messages (unreachable, time exceeded, etc.).
  2. icmpv6_echo_reply(): Dedicated to replying to Ping requests.

The prototype of icmpv6_send() looks very similar to the IPv4 version:

void icmpv6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)

(It cannot be static here: the callers shown below live in other source files.)

  • skb: The original "culprit" packet that triggered this error.
  • type: The ICMP type you want to send (e.g., ICMPV6_TIME_EXCEED).
  • code: The specific error code.
  • info: Extra information, most commonly the MTU value.

Scenario 1: Hop Limit Exceeded (IPv6's version of TTL Exceeded)

When a router forwards an IPv6 packet, the Hop Limit is decremented by 1 at each hop. Once it reaches 0, the packet must be dropped and the sender notified.

The code is in ip6_forward() (net/ipv6/ip6_output.c):

if (hdr->hop_limit <= 1) {
    /* Force OUTPUT device used as source address */
    skb->dev = dst->dev;
    icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0);
    IP6_INC_STATS_BH(net, ip6_dst_idev(dst), IPSTATS_MIB_INHDRERRORS);

    kfree_skb(skb);
    return -ETIMEDOUT;
}

This step is critical; without it, data packets would spin infinitely in routing loops until they exhausted the network bandwidth.
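The loop-breaking effect is easy to see in a toy model. The function below is a hypothetical user-space stand-in for the check above: it refuses to forward when the Hop Limit is 1 or less and decrements it otherwise.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Toy forwarding step: returns 0 and decrements the hop limit when the
 * packet may be forwarded, or -ETIMEDOUT when the router would instead
 * send Time Exceeded and drop the packet.
 */
int toy_forward_hop_limit(uint8_t *hop_limit)
{
    if (*hop_limit <= 1)
        return -ETIMEDOUT;
    (*hop_limit)--;
    return 0;
}
```

A packet that starts with a Hop Limit of 3 and is caught in a loop survives two forwarding steps and is rejected on the third, so the loop cannot carry it forever.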

Scenario 2: Fragment Reassembly Timeout

If a destination host receives a bunch of fragments but doesn't collect them all within the specified time, or if a few went missing in transit, it gives up waiting.

The code is in ip6_expire_frag_queue() (net/ipv6/reassembly.c):

void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
                           struct inet_frags *frags)
{
    ...
    icmpv6_send(fq->q.fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
    ...
}

At this point, it tells the sender: "I'm giving up; this packet's reassembly timed out."

Scenario 3: Port Unreachable (UDP Scenario)

This logic is exactly the same as in ICMPv4. If you send a packet to a closed UDP port, the kernel looks up the Socket; if it can't find one and the checksum is correct, it sends a Port Unreachable back.

The code is in __udp6_lib_rcv():

sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
if (sk != NULL) {
    ...
}
...
if (udp_lib_checksum_complete(skb))
    goto discard;

UDP6_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto == IPPROTO_UDPLITE);
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0);

Scenario 4: Packet Too Big — The Key to PMTU

This is one of the biggest differences between IPv6 and IPv4. In IPv4, if you send a packet that is too large, the router will fragment it (unless the DF bit is set to 1). But in IPv6, routers are forbidden from fragmenting. Fragmentation is the sender's own responsibility.

So, when an IPv6 router finds a packet larger than the outgoing MTU, its only option is to drop it and send an ICMPV6_PKT_TOOBIG message to the sender, telling it: "The road is narrow, the car is too big; bring a smaller one next time (MTU is xxx)."

Note the difference from v4 here:

  • IPv4: Sends ICMP_DEST_UNREACH + ICMP_FRAG_NEEDED.
  • IPv6: Sends the independent ICMPV6_PKT_TOOBIG type.

The code is in ip6_forward():

if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
    /* Again, force OUTPUT device used as source address */
    skb->dev = dst->dev;
    icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
    IP6_INC_STATS_BH(net, ip6_dst_idev(dst), IPSTATS_MIB_INTOOBIGERRORS);
    IP6_INC_STATS_BH(net, ip6_dst_idev(dst), IPSTATS_MIB_FRAGFAILS);
    kfree_skb(skb);
    return -EMSGSIZE;
}

The mtu variable is passed as the info parameter to icmpv6_send, and ultimately gets filled into the MTU field of the ICMPv6 message. When the sender receives this value, it adjusts its Path MTU accordingly.
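On the receiving side, the adjustment might look roughly like the sketch below. It assumes the standard Packet Too Big layout (type 2, code, checksum, then a 32-bit MTU in network byte order) and two common Path MTU Discovery rules: never go below the IPv6 minimum MTU of 1280, and never raise the estimate because of such a message. toy_pmtu_update() is a hypothetical helper, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_IPV6_MIN_MTU 1280u

/* Returns the path MTU to use after seeing msg; cur_pmtu if msg is unusable. */
uint32_t toy_pmtu_update(uint32_t cur_pmtu, const uint8_t *msg, size_t len)
{
    uint32_t mtu;

    if (len < 8 || msg[0] != 2)         /* not a complete Packet Too Big */
        return cur_pmtu;

    /* Bytes 4..7 carry the MTU in network (big-endian) byte order. */
    mtu = (uint32_t)msg[4] << 24 | (uint32_t)msg[5] << 16 |
          (uint32_t)msg[6] << 8  | (uint32_t)msg[7];

    if (mtu < TOY_IPV6_MIN_MTU)         /* clamp to the IPv6 minimum */
        mtu = TOY_IPV6_MIN_MTU;
    if (mtu >= cur_pmtu)                /* never increase the estimate */
        return cur_pmtu;
    return mtu;
}
```

With a current estimate of 1500, a message advertising 1400 lowers the estimate to 1400, one advertising 1000 lowers it only to 1280, and one advertising 9000 changes nothing.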

Pre-send Checks: Don't Turn "Error Reports" into "Error Bombs"

Before actually sending anything, icmpv6_send() performs many checks, the most critical of which is preventing an "error avalanche."

Imagine a scenario where a network cable is cut, causing massive numbers of packets to fail forwarding. If the router sent an ICMP unreachable message for every single packet, the returning ICMP messages could trigger further congestion, potentially leading to an "ICMP storm."

To prevent this, icmpv6_send() supports rate limiting. It calls icmpv6_xrlim_allow() to check if messages are being sent too frequently. Under the hood, this function actually calls the same generic IPv4 inet_peer_xrlim_allow().

However, rate limiting is not always enabled.

In the following three situations, icmpv6_send() will skip the rate limit check:

  1. Sending informational messages, such as Echo Reply. Because these are usually not errors, they won't trigger a storm.
  2. PMTU Discovery related messages (ICMPV6_PKT_TOOBIG). Because this message must be delivered promptly; if it gets dropped by rate limiting, the sender will never know the correct MTU, and the connection will break completely.
  3. Messages sent to the Loopback device (the machine sending to itself). Since it all stays in memory anyway, congestion is a non-issue.
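The three exemptions and the limiter behind them can be sketched as follows. The limiter here is a generic token bucket chosen for illustration; it is not claimed to match the kernel's algorithm in detail, and struct toy_bucket and toy_xrlim_allow() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_bucket {
    uint64_t last_ms;      /* time the bucket was last refilled */
    uint32_t tokens;       /* error messages we may still send */
    uint32_t burst;        /* bucket capacity */
    uint32_t interval_ms;  /* one token is earned per interval */
};

/* Returns true if an ICMPv6 message of this type may be sent now. */
bool toy_xrlim_allow(struct toy_bucket *b, uint8_t type, bool loopback,
                     uint64_t now_ms)
{
    uint64_t earned;

    if (type & 0x80)       /* 1. informational messages are never limited */
        return true;
    if (type == 2)         /* 2. Packet Too Big must get through for PMTU */
        return true;
    if (loopback)          /* 3. loopback traffic cannot congest anything */
        return true;

    /* Refill: one token per elapsed interval, capped at the burst size. */
    earned = (now_ms - b->last_ms) / b->interval_ms;
    if (earned > 0) {
        uint64_t total = b->tokens + earned;

        b->tokens = total > b->burst ? b->burst : (uint32_t)total;
        b->last_ms += earned * b->interval_ms;
    }

    if (b->tokens == 0)
        return false;      /* over the limit: suppress this error message */
    b->tokens--;
    return true;
}
```

With a burst of 2 and an interval of one second, the third Destination Unreachable within the same second is suppressed, while an Echo Reply or a Packet Too Big sent at that moment still passes.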

In addition to rate limiting, there is a hard rule: ICMPv6 error messages must not exceed 1280 bytes (the minimum IPv6 MTU, i.e., IPV6_MIN_MTU). This ensures the error report itself can traverse the network smoothly. RFC 4443 requires that error messages include as much of the original offending packet as possible, but absolutely must not exceed the minimum MTU to do so. If the original packet is too long, the kernel will ruthlessly truncate it.
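The truncation arithmetic is simple enough to spell out. The sketch assumes the error message consists of a 40-byte IPv6 header with no extension headers, an 8-byte ICMPv6 header, and then the quoted original packet; the constant and function names are illustrative.

```c
#include <stddef.h>

#define TOY_IPV6_MIN_MTU   1280   /* same value as IPV6_MIN_MTU */
#define TOY_IPV6_HDR_LEN   40     /* fixed IPv6 header */
#define TOY_ICMPV6_HDR_LEN 8      /* type, code, checksum, 4-byte info field */

/* How many bytes of the offending packet fit into the error message. */
size_t toy_icmpv6_quote_len(size_t orig_len)
{
    size_t room = TOY_IPV6_MIN_MTU - TOY_IPV6_HDR_LEN - TOY_ICMPV6_HDR_LEN;

    return orig_len < room ? orig_len : room;   /* room is 1232 bytes */
}
```

A 100-byte offending packet is quoted in full, while a 1500-byte one is cut down to its first 1232 bytes.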

3.2.6 A Special Access Channel: ICMP Sockets (Ping Sockets)

In the past, if you wanted to ping someone, your program needed root privileges, because creating a Raw Socket (socket(PF_INET, SOCK_RAW, ...)) requires the CAP_NET_RAW capability. This is why the /bin/ping program traditionally has the setuid root bit set.

But around 2011, the Linux kernel introduced a new security mechanism: ICMP Sockets (also known as Ping Sockets).

It reuses the existing protocol number IPPROTO_ICMP, but the socket is not a Raw Socket; rather, it is a special type of Datagram Socket.

Changes in Creation Method:

  • Old way: socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) — requires root.
  • New way: socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP) — does not necessarily require root.

IPv6 is the same; you can use SOCK_DGRAM combined with IPPROTO_ICMPV6.
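From user space, trying the new way for IPv6 is a single socket() call. The wrapper below is a minimal sketch (open_ping6_socket() is a made-up name); whether the call succeeds depends on the kernel version and on the group-range setting discussed below, so the caller should expect a negative errno value on systems where unprivileged ping sockets are not permitted.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Try to open an unprivileged ICMPv6 "ping" socket.
 * Returns a socket descriptor on success, or a negative errno value
 * (for example -EACCES when the caller's group is not allowed).
 */
int open_ping6_socket(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6);

    return fd >= 0 ? fd : -errno;
}
```

A program could fall back to the old SOCK_RAW approach, or report a clear error, when this returns a negative value.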

How Is It Implemented?

The code mainly lives in net/ipv4/ping.c (the IPv6 portion actually calls the code here too, which is why it's called dual-stack).

When a user attempts to create this type of Socket, the kernel checks:

  1. Is the user's group within the range defined by /proc/sys/net/ipv4/ping_group_range?

This procfs entry defaults to 1 0, which means "no one can use it." If you want to allow users with a GID of 1000 to use Ping, you can run:

echo 1000 1000 > /proc/sys/net/ipv4/ping_group_range

If you want to allow everyone to use it (the range then also covers GID 0, root's group), you can set the lower limit to 0 and the upper limit to the maximum value:

echo 0 2147483647 > /proc/sys/net/ipv4/ping_group_range
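The range semantics, including why the default of 1 0 admits nobody, can be modeled in a few lines. This is a simplified sketch: it tests a single GID against the two numbers, whereas the real check is made against the calling process's group credentials. toy_ping_group_allowed() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * range is the text of ping_group_range, e.g. "1 0" or "1000 1000".
 * Returns true if gid lies within [low, high], both ends inclusive.
 */
bool toy_ping_group_allowed(const char *range, unsigned long gid)
{
    unsigned long low, high;

    if (sscanf(range, "%lu %lu", &low, &high) != 2)
        return false;               /* unparsable: allow nobody */
    return low <= gid && gid <= high;
}
```

Because the default range has its lower bound above its upper bound, no GID can satisfy both comparisons, which is exactly the "no one can use it" behavior described above.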

Security Checks

The ping_supported() function ensures this Socket can only be used to send standard Echo Requests:

static inline int ping_supported(int family, int type, int code)
{
    return (family == AF_INET && type == ICMP_ECHO && code == 0) ||
           (family == AF_INET6 && type == ICMPV6_ECHO_REQUEST && code == 0);
}

In other words, you cannot use this mechanism to send ICMP Redirect or Destination Unreachable messages; it can only be used to Ping. This gives ordinary users the right to diagnose the network without handing them a weapon for DoS attacks.

This is the secret behind why many modern Linux distributions can achieve "rootless Ping."

3.2.7 Chapter Echoes

Looking back at ICMPv6, you'll find it is far more than just a version upgrade of ICMPv4.

In IPv6's design philosophy, complexity is carefully managed. ARP was eliminated, IGMP was merged, and all neighbor interaction logic was unified into the ICMPv6 layer. This makes the protocol stack itself "thinner"—you only need to maintain a transport layer (UDP/TCP) and a network layer (IPv6), with all the miscellaneous chores handed off to ICMPv6.

But precisely because of this consolidation, the ICMPv6 code has become quite complex. It has to handle network-layer error reporting (like time exceeded, unreachable), link-layer interactions (NDP), and multicast group management (MLD). If ICMPv6 goes down, the IPv6 network is effectively paralyzed.

Remember that "intuition" we mentioned at the beginning of this chapter—that Ping is a tool for testing connectivity? Now you should realize that in IPv6, Ping tests much more than just "whether the path is clear." It tests whether the entire ICMPv6 subsystem is alive, because without ICMPv6, your neighbor table might be empty, your routes might be wrong, and your MTU might be mismatched.

In the next chapter, we will dive into the IPv4 network layer implementation. Although we have discussed many ICMP details, those belong to the "control plane." The real "data plane"—how actual user data packets are routed, fragmented, and forwarded—is the most vast engineering expanse of the network layer.