
10.2 IKE (Internet Key Exchange)

In the previous section, we mentioned that IPsec is not just a kernel module—it's a joint operation spanning both user space and kernel space. The kernel is only responsible for "execution"—that is, taking the keys and encrypting or decrypting packets. But before execution, someone needs to negotiate the keys and establish the rules. This "diplomatic negotiation" task is exactly what IKE (Internet Key Exchange) is all about.

The main players in this negotiation are user-space daemons. In the Linux world, there are several contenders: the veteran Openswan (and its fork libreswan), the powerful strongSwan, and racoon from the BSD camp (it originated in the KAME project and reached Linux through ipsec-tools). Among these, strongSwan and libreswan are the most actively maintained options today.

Security Association (SA): The Goal of the Negotiation

Regardless of which software we use, they all ultimately do one thing: establish a Security Association (SA).

We can think of an SA as a "contract"—it specifies in detail how communication between two IP addresses should be encrypted, which algorithms to use, what the keys are, and how long this contract remains valid.

But before diving into the contract details, we need to understand how the "contract" is located. In the kernel, an SA is identified by three parameters:

  1. Destination address: Who the packet is sent to.
  2. SPI (Security Parameter Index): A 32-bit number.
  3. Protocol: ESP or AH.

The SPI is critical, but thinking of it as "a unique number" is slightly misleading: it is not globally unique. It only identifies an SA when combined with the destination address (plus the protocol, ESP or AH). It's like an office building (destination IP) with many companies (different SAs) inside, where the SPI is the specific "room number." A room number alone isn't enough to find someone; we must combine it with "which building" to pinpoint the location.

With the SA in place, the kernel knows exactly which key to use to unlock an encrypted packet. If these parameters don't match, the kernel simply drops the packet.
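The lookup described above can be sketched as a table keyed by destination address, SPI, and protocol. This is a minimal illustration, not the kernel's actual data structure: the `SecurityAssociation` and `SADatabase` names, their fields, and the sample addresses are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address

IPPROTO_ESP = 50  # IANA-assigned IP protocol number for ESP
IPPROTO_AH = 51   # IANA-assigned IP protocol number for AH


@dataclass(frozen=True)
class SecurityAssociation:
    """Hypothetical, heavily simplified SA record (the "contract")."""
    enc_alg: str
    auth_alg: str
    key: bytes
    lifetime_seconds: int


class SADatabase:
    """Toy SA table: (destination address, SPI, protocol) -> SA."""

    def __init__(self):
        self._table = {}

    def add(self, daddr, spi, proto, sa):
        self._table[(ip_address(daddr), spi, proto)] = sa

    def lookup(self, daddr, spi, proto):
        # Returns None when nothing matches; a real stack would drop the packet.
        return self._table.get((ip_address(daddr), spi, proto))


sad = SADatabase()
# The same SPI (the "room number") can exist in two different "buildings".
sad.add("192.0.2.10", 1001, IPPROTO_ESP,
        SecurityAssociation("aes-cbc", "hmac-sha256", b"\x01" * 32, 3600))
sad.add("192.0.2.20", 1001, IPPROTO_ESP,
        SecurityAssociation("aes-gcm", "none", b"\x02" * 32, 3600))
```

Looking up SPI 1001 returns a different SA depending on the destination address, and a lookup with an unknown SPI returns nothing, which corresponds to the packet being dropped.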

How Are Keys Negotiated?

The process of establishing an SA is essentially both parties aligning their parameters:

  • Which encryption algorithm to use? (AES? 3DES?)
  • Which authentication algorithm to use? (HMAC-SHA1? HMAC-SHA256?)
  • What are the actual keys?
  • How long is this key valid? (Key lifetime)

There are two ways to finalize these parameters:

  1. Manual Key Exchange: Administrators manually type commands on both sides to hardcode the keys. This is secure (as long as we don't stick the password on the monitor), but it's cumbersome and doesn't support automatic key rotation. It's rarely used today.
  2. IKE Protocol: Automatically negotiated by a daemon. This is the mainstream approach.

The pluto daemon in Openswan and the charon daemon in strongSwan are dedicated to this exact job. They talk to the peer over UDP port 500 (and UDP port 4500 when NAT traversal is in use), and once an agreement is reached, they write the negotiated SAs and policies into the kernel via the Netlink XFRM interface.
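To make the idea of "aligning parameters" concrete, here is a rough sketch of how a responder might pick one proposal from the list an initiator offers. It covers only the algorithm choice; the keys themselves come from the Diffie-Hellman exchange, not from the proposal. The `Proposal` type, the `select_proposal` function, and the algorithm name strings are hypothetical and do not correspond to any daemon's real API.

```python
from typing import NamedTuple, Optional, Sequence, Set


class Proposal(NamedTuple):
    """Hypothetical proposal: one encryption and one integrity algorithm."""
    encryption: str
    integrity: str


def select_proposal(offered: Sequence[Proposal],
                    supported_enc: Set[str],
                    supported_integ: Set[str]) -> Optional[Proposal]:
    """Return the first offered proposal the responder can accept.

    The initiator lists proposals in order of preference; the responder
    walks the list and picks the first one it supports entirely.
    """
    for proposal in offered:
        if (proposal.encryption in supported_enc
                and proposal.integrity in supported_integ):
            return proposal
    return None  # no overlap: the negotiation fails


offered = [Proposal("aes256", "hmac-sha256"), Proposal("3des", "hmac-sha1")]
chosen = select_proposal(offered, {"aes128", "3des"}, {"hmac-sha1"})
```

In this example the responder does not support the initiator's preferred `aes256` proposal, so `chosen` falls back to the second one. If there were no overlap at all, the function would return `None` and no SA would be established.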

Back to the "room number" analogy: The role of the IKE protocol is like two building administrators sitting down for a meeting and agreeing, "From now on, when we need to communicate with each other, go to room 1001 (SPI=1001) for the secret code." Once confirmed, the administrators register this information in the logbook of the gate security guard (the kernel XFRM framework).

IKEv1 vs IKEv2: The Burden of History

Although many legacy systems still use IKEv1, frankly, if it weren't for backward compatibility with those relics (like older versions of macOS's built-in client, which were based on old code), we should have fully migrated to IKEv2 long ago.

Why? Because IKEv1 is honestly a bit "clunky."

Message Exchange Efficiency: IKEv1 is too verbose. To establish an IPsec SA, IKEv1 requires 9 messages (two phases: Main Mode and Quick Mode). IKEv2 streamlines this process down to just 4 messages.

Reliability Design:

  • IKEv1: No acknowledgment mechanism. What happens if a packet is lost? It can only blindly retransmit, and there's a very annoying race condition—if both sides think they lost a packet and start retransmitting simultaneously, it can cause chaos.
  • IKEv2: Learned from past mistakes. Every request must have a response. The responsibility for retransmission falls entirely on the initiator, avoiding the risk of both sides "colliding."
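The IKEv2 request/response rule can be sketched with a small simulation. This is only an illustration under simplifying assumptions: the `Responder` class, the `send_with_retransmit` helper, and the `drop_pattern` parameter are invented for the example, and the cache here keeps every response, whereas a real responder only needs to remember the most recent ones.

```python
class Responder:
    """Toy responder: answers each message ID once and replays on repeats."""

    def __init__(self):
        self.processed = 0  # how many distinct requests were actually handled
        self._cache = {}    # message ID -> response that was already sent

    def handle(self, msg_id, payload):
        if msg_id in self._cache:
            # Retransmitted request: resend the stored response unchanged.
            return self._cache[msg_id]
        self.processed += 1
        response = f"ack:{payload}"
        self._cache[msg_id] = response
        return response


def send_with_retransmit(responder, msg_id, payload, drop_pattern, max_tries=5):
    """Initiator side: resend the same request until a response arrives.

    drop_pattern is an iterable of booleans; True means the response to
    that attempt was lost on the network (a stand-in for a timeout).
    """
    drops = iter(drop_pattern)
    for attempt in range(1, max_tries + 1):
        response = responder.handle(msg_id, payload)
        if next(drops, False):
            continue  # response lost: time out and retransmit
        return response, attempt
    raise TimeoutError("peer did not answer")


responder = Responder()
response, attempts = send_with_retransmit(
    responder, msg_id=1, payload="hello", drop_pattern=[True, True, False])
```

Only the initiator ever retransmits. The responder never resends on its own initiative, so the two sides cannot collide, and a repeated request is answered from the cache instead of being processed twice.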

Major Functional Differences: IKEv2 didn't just simplify the process; it also neatly resolved many issues that IKEv1 handled awkwardly:

  • NAT Traversal: IKEv1 only gained NAT-T (NAT Traversal) through a separately specified extension (RFC 3947), whereas IKEv2 has this support built into the core protocol.
  • Traffic Selectors: In IKEv1, the subnet configurations on both sides had to match exactly—not a single difference allowed. IKEv2 supports automatic narrowing; as long as one side's subnet is a subset of the other's, it works. This makes configuration much less of a headache.
  • EAP Authentication: IKEv1 uses XAUTH for user authentication, which has questionable security. IKEv2 supports EAP (Extensible Authentication Protocol), allowing us to first verify the server's identity with a certificate, and then perform user authentication with a relatively weaker password (like EAP-MSCHAPv2)—this ensures we're connecting to the real server while still allowing users to use simple passwords.
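The traffic selector narrowing mentioned in the list above can be illustrated with Python's standard `ipaddress` module. This is a simplified sketch: real IKEv2 traffic selectors are address ranges that also carry protocol and port information, and the `narrow` function is a hypothetical helper that handles only the CIDR subnet case.

```python
from ipaddress import ip_network


def narrow(proposed: str, local: str):
    """Return the subnet both sides can agree on, or None if they are disjoint.

    Two CIDR networks are either nested or disjoint, so the agreed
    selector is simply whichever network is contained in the other.
    """
    a, b = ip_network(proposed), ip_network(local)
    if a.version != b.version:
        return None  # IPv4 and IPv6 selectors never overlap
    if a.subnet_of(b):
        return a
    if b.subnet_of(a):
        return b
    return None


agreed = narrow("10.0.0.0/8", "10.1.0.0/16")
```

Here one side proposes `10.0.0.0/8`, the other is configured for `10.1.0.0/16`, and `agreed` ends up as the narrower of the two. Under IKEv1's exact-match rule, the same pair of configurations would simply fail.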

⚠️ Warning: Avoid using IKEv1's Aggressive Mode whenever possible. Although it's convenient to configure (no need to input an ID), when used with a PSK it sends the response hash over the network in plaintext, so an eavesdropper can capture it and run an offline dictionary attack against the PSK. strongSwan even jokingly refers to a daemon using this mode as weakSwan.

The Two Phases of IKE

Since the protocol is still in use, we need to look at how it actually works. We'll focus primarily on IKEv1 here (because its phase division is clearer—once we understand it, IKEv2's merged mode is easy to grasp).

Phase 1: Main Mode. This is the trust-building phase. Both parties verify each other's identity here (who are you?) and compute a shared session key using the Diffie-Hellman algorithm.

  • Authentication can be based on RSA/ECDSA certificates or on a PSK (Pre-Shared Key). While PSK is simple, the "pre-shared" nature has a cost: if someone changes the secret on one side without syncing it, the link breaks, and if the secret is too weak, the security of the entire link is undermined.
  • If Phase 1 succeeds, both parties establish an ISAKMP SA (also called an IKE SA). This is a management channel; all subsequent negotiations travel over this channel and are protected by it.
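The Diffie-Hellman step in Phase 1 can be sketched with plain modular exponentiation. This is a toy illustration only: the modulus and generator below are assumptions chosen for brevity and are not one of the standardized IKE groups, and the function names are hypothetical. It shows the core property that both sides derive the same secret while only public values cross the network.

```python
import secrets

# Toy parameters, NOT a standardized IKE group and far too weak for real use.
P = 2**127 - 1  # a Mersenne prime, used here only as a convenient modulus
G = 3           # assumed generator for the illustration


def dh_keypair():
    """Pick a random private exponent and derive the public value."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def dh_shared(private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, private, P)


# Each side generates a key pair and sends only the public half.
initiator_priv, initiator_pub = dh_keypair()
responder_priv, responder_pub = dh_keypair()

initiator_secret = dh_shared(initiator_priv, responder_pub)
responder_secret = dh_shared(responder_priv, initiator_pub)
```

Both `initiator_secret` and `responder_secret` hold the same value, even though neither private exponent was ever transmitted. Plain Diffie-Hellman does not prove who the peer is, which is why Phase 1 pairs it with certificate or PSK authentication.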

Phase 2: Quick Mode. With the management channel in place, it's time to get down to business. In this phase, both parties negotiate the specific IPsec SA parameters used for transmitting data (encryption algorithms, keys, etc.).

  • In IKEv2, this phase is merged into the IKE_AUTH exchange, establishing the first CHILD_SA.

The reason IKEv1 needs 9 messages (6 in Phase 1, 3 in Phase 2) is that it must secretly negotiate keys on an insecure network (DH exchange), mutually verify identities (authentication), and prevent Man-in-the-Middle (MITM) attacks. Every step must be taken with extreme caution.


IPsec and Cryptography

While the daemon is busy shaking hands in user space, the kernel is also doing extremely heavy lifting—encryption and decryption. The backbone supporting this is the Linux Kernel Crypto API.

Two Eras of Stacks

In the history of the Linux kernel, there have been two main IPsec implementations:

  1. KLIPS: An ancient implementation that predates netfilter. It supported OCF (Open Cryptography Framework), which had the advantage of supporting asynchronous calls.
  2. Netkey (XFRM): The "native" implementation introduced starting with the 2.6 kernel, developed by Alexey Kuznetsov and David S. Miller. It directly uses the kernel's Crypto API.

Modern kernels default to using Netkey (the XFRM framework). Although the early Crypto API mostly used synchronous calls (blocking while waiting for encryption results), today's Crypto API also supports asynchronous operation, which is what makes hardware offload and multi-core parallelism possible.

Asynchronous Cryptography and Hardware Acceleration

Encryption is a CPU-intensive task. If we're handling gigabit VPN traffic, the CPU could max out just computing hashes. This is where hardware acceleration cards come into play.

Hardware devices typically can't block and wait—they issue an encryption request and move on to other tasks, relying on an interrupt to notify the CPU when the hardware finishes. Therefore, they must use an asynchronous API.

  • acrypto: The kernel has an asynchronous cryptography layer specifically designed to handle this.
  • pcrypt: A powerful tool for multi-core environments. It can distribute encryption requests across multiple CPU cores in parallel while guaranteeing in-order completion (this is critical for IPsec because IPsec has an anti-replay mechanism; out-of-order delivery causes packet drops). In certain scenarios, enabling pcrypt can boost IPsec performance several times over.
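To see why in-order completion matters, here is a rough sketch of a sliding-window anti-replay check of the kind IPsec receivers apply to sequence numbers. The `ReplayWindow` class and the tiny window size are assumptions made for illustration; this is not the kernel's implementation.

```python
class ReplayWindow:
    """Toy sliding-window replay check over packet sequence numbers."""

    def __init__(self, size=32):
        self.size = size
        self.top = 0     # highest sequence number accepted so far
        self.bitmap = 0  # bit i set means (top - i) was already received

    def accept(self, seq):
        if seq == 0:
            return False  # sequence numbers start at 1
        if seq > self.top:
            # New highest packet: slide the window forward.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # too far behind the window: dropped
        if self.bitmap & (1 << offset):
            return False  # already seen: replay, dropped
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow(size=4)
results = [window.accept(seq) for seq in (1, 3, 2, 2, 10, 6, 7)]
```

In this run, packet 2 arriving slightly late is still accepted, its duplicate is rejected, and packet 6 is dropped because packet 10 has already pushed the window past it. If parallel encryption reordered packets by more than the window size, legitimate traffic would be discarded in exactly the same way.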

AEAD: The Pinnacle of Efficiency

Starting with the Linux 2.6.25 kernel (2008), the XFRM framework began supporting AEAD (Authenticated Encryption with Associated Data) algorithms, the most typical representative being AES-GCM.

Why is this important? In the past, encryption and authentication were calculated separately. We would run AES encryption first, then compute HMAC-SHA1 authentication. This meant the data had to be scanned twice. With an AEAD algorithm like AES-GCM, a single operation simultaneously completes both encryption and integrity verification. If our CPU supports the AES-NI instruction set (basically standard on Intel and AMD server-grade CPUs), this operation is nearly "free"—extremely fast with minimal CPU overhead.

If we're building a high-performance VPN, we must check our algorithm configuration. If we're still using 3DES + HMAC-SHA1, it's like putting 87-octane gas in a Ferrari—completely wasting the kernel's capabilities.


In the next section, we'll dive into the kernel and see how, once the user-space daemon hands over the keys, the kernel's XFRM framework takes over the show.