14.12 PPPoE Header — Pinning the Protocol onto Ethernet

In the previous section, we walked through the two phases of PPPoE — Discovery and Session — like watching two people shake hands and greet each other before starting a conversation.

But that was just the protocol-level flowchart. Inside the kernel, those "handshakes" and "conversations" ultimately have to become bits packed into Ethernet frames.

In this section, we shift our perspective from "protocol flow" to "kernel implementation." We look at how the kernel defines that 6-byte header, how it processes these packets, and most importantly — when the user-space pppd daemon makes a call, how the kernel strings it all together.

14.12.1 That 6-Byte Header: pppoe_hdr

Let's set aside the complex processing logic for now. The first step for any protocol is defining its "ID card." In the Linux kernel, PPPoE's ID card looks like this:

struct pppoe_hdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8 ver : 4;
    __u8 type : 4;
#elif defined(__BIG_ENDIAN_BITFIELD)
    __u8 type : 4;
    __u8 ver : 4;
#else
#error "Please fix <asm/byteorder.h>"
#endif
    __u8 code;
    __be16 sid;
    __be16 length;
    struct pppoe_tag tag[0];
} __packed;

This structure is short, but every single bit in it has a purpose.

Let's break it down:

  1. ver (4 bits): Version number. RFC 2516 explicitly states this must be 1.
  2. type (4 bits): Type number. Likewise, RFC 2516 mandates this must be 1.
    • Note the bitfield handling here: ver and type share a single byte. The kernel uses conditional compilation (__LITTLE_ENDIAN_BITFIELD vs __BIG_ENDIAN_BITFIELD) because the order in which a compiler allocates bitfields within a byte depends on the target architecture, a common pitfall when writing cross-platform code.
  3. code (1 byte): The packet type, as covered in the previous section: 0x09 (PADI), 0x07 (PADO), 0x19 (PADR), 0x65 (PADS), 0xa7 (PADT). Session-phase frames use 0x00.
  4. sid (2 bytes): Session ID. Once the Discovery phase ends and an ID is assigned, every subsequent packet must carry this "pass."
  5. length (2 bytes): This one is easy to misread. It counts only the PPPoE payload; the 6-byte PPPoE header and the Ethernet header in front of it are both excluded.
  6. tag[0]: A zero-length array (the older GNU C spelling of a flexible array member). It occupies no space in the struct and simply marks where the TLV (Type-Length-Value) tags begin. This is where information is exchanged during the Discovery phase, such as "my AC-Name" or "the Service-Name I want."
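To make the TLV layout concrete, here is a minimal user-space sketch of walking the tags in a Discovery payload. The tag type values come from RFC 2516; the helper names rd_be16 and find_tag are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag types defined by RFC 2516. */
enum {
    TAG_END_OF_LIST  = 0x0000,
    TAG_SERVICE_NAME = 0x0101,
    TAG_AC_NAME      = 0x0102,
};

/* Read a 16-bit big-endian value regardless of host endianness. */
static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: find the first tag of type `want` in a Discovery
 * payload of `len` bytes. Returns a pointer to the tag value and stores
 * its length in *vlen, or returns NULL if the tag is absent or the
 * payload is truncated.
 */
const uint8_t *find_tag(const uint8_t *payload, size_t len,
                        uint16_t want, uint16_t *vlen)
{
    size_t off = 0;

    assert(vlen != NULL);
    while (len - off >= 4) {            /* room for type + length */
        uint16_t type = rd_be16(payload + off);
        uint16_t tlen = rd_be16(payload + off + 2);

        off += 4;
        if (tlen > len - off)
            return NULL;                /* value runs past the buffer */
        if (type == want) {
            *vlen = tlen;
            return payload + off;
        }
        if (type == TAG_END_OF_LIST)
            return NULL;
        off += tlen;                    /* skip to the next tag */
    }
    return NULL;
}
```

The bounds check before every access matters here: the tag lengths come straight off the wire and cannot be trusted.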

To give you a mental picture, this structure in memory (from a packet capture perspective) looks roughly like this:

  • First 4 bytes: Ver/Type + Code + Session ID.
  • Next 2 bytes: Length.
  • Beyond that: Payload or Tags.
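If you want to poke at a captured frame yourself, the layout above can be decoded with plain shifts and masks, which sidesteps the bitfield question entirely. The following is a user-space sketch, not kernel code; struct pppoe_hdr_view and pppoe_decode are names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPPOE_HDR_LEN 6   /* ver/type (1) + code (1) + sid (2) + length (2) */

/* Host-order view of the header; illustrative only. */
struct pppoe_hdr_view {
    uint8_t  ver;
    uint8_t  type;
    uint8_t  code;
    uint16_t sid;
    uint16_t length;      /* payload bytes that follow the header */
};

/*
 * Decode the PPPoE header at the start of buf. Returns 0 on success, or
 * -1 if the buffer is too short, ver/type are not 1, or the declared
 * payload length exceeds what the buffer actually holds.
 */
int pppoe_decode(const uint8_t *buf, size_t len, struct pppoe_hdr_view *out)
{
    assert(out != NULL);
    if (len < PPPOE_HDR_LEN)
        return -1;

    out->ver    = buf[0] >> 4;       /* RFC 2516: VER is the first nibble */
    out->type   = buf[0] & 0x0f;     /* TYPE is the second nibble */
    out->code   = buf[1];
    out->sid    = (uint16_t)((buf[2] << 8) | buf[3]);
    out->length = (uint16_t)((buf[4] << 8) | buf[5]);

    if (out->ver != 1 || out->type != 1)
        return -1;
    if (out->length > len - PPPOE_HDR_LEN)
        return -1;
    return 0;
}
```

Since ver and type are both 1, the first byte of every valid PPPoE frame is 0x11, which makes it easy to spot in a hex dump.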

14.12.2 Kernel Boot Preparation: pppoe_init()

With the protocol defined, the kernel needs to know: "when an Ethernet frame arrives with a Type of 0x8863 or 0x8864, who should handle it?"

That's the job of pppoe_init(), located in drivers/net/ppp/pppoe.c.

The core action is registering two protocol handlers:

static struct packet_type pppoes_ptype __read_mostly = {
    .type = cpu_to_be16(ETH_P_PPP_SES), // 0x8864
    .func = pppoe_rcv, // handler for Session frames
};

static struct packet_type pppoed_ptype __read_mostly = {
    .type = cpu_to_be16(ETH_P_PPP_DISC), // 0x8863
    .func = pppoe_disc_rcv, // handler for Discovery frames
};

static int __init pppoe_init(void)
{
    int err;

    // Register the protocol handlers with the kernel
    dev_add_pack(&pppoes_ptype);
    dev_add_pack(&pppoed_ptype);

    // ... other initialization code ...

    return 0;
}

There's a detail worth savoring here: why two separate handlers?

Because the two phases have nothing in common once a frame reaches the kernel. Discovery frames belong to no established session: the client has no Session ID yet and is simply broadcasting PADI to find out who is listening. Session frames carry a Session ID that maps to an established link, so the processing logic is completely different.

dev_add_pack() is an old friend we've seen in previous chapters. It hooks these two handlers into the stack's protocol handler lists. Later, when a frame arrives and the receive path sees that its EtherType is ETH_P_PPP_DISC, the frame goes straight to pppoe_disc_rcv.
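The dispatch idea can be sketched in user space as a small table keyed by EtherType. This is only a toy model of what dev_add_pack() arranges, with a fixed-size array instead of the kernel's lists; add_handler, deliver, and the toy_*_rcv functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_PPP_DISC 0x8863
#define ETH_P_PPP_SES  0x8864
#define ETH_HDR_LEN    14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define MAX_HANDLERS   8

typedef int (*rx_func)(const uint8_t *frame, size_t len);

struct handler {
    uint16_t type;             /* EtherType in host order */
    rx_func  func;
};

static struct handler handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Toy counterpart of dev_add_pack(): remember who handles which type. */
int add_handler(uint16_t type, rx_func func)
{
    if (nr_handlers == MAX_HANDLERS)
        return -1;
    handlers[nr_handlers].type = type;
    handlers[nr_handlers].func = func;
    nr_handlers++;
    return 0;
}

/* Look at the EtherType and hand the frame to the matching handler. */
int deliver(const uint8_t *frame, size_t len)
{
    uint16_t type;
    size_t i;

    if (len < ETH_HDR_LEN)
        return -1;
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    for (i = 0; i < nr_handlers; i++)
        if (handlers[i].type == type)
            return handlers[i].func(frame, len);
    return -1;                 /* nobody registered for this type */
}

/* Stand-ins for pppoe_disc_rcv and pppoe_rcv; they only identify themselves. */
int toy_disc_rcv(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return 1;
}

int toy_ses_rcv(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return 2;
}
```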

Beyond that, pppoe_init() also handles two miscellaneous tasks:

  1. Exporting a procfs interface: In /proc/net/pppoe, you'll see the current session list (Session ID, MAC address, device name).
  2. Registering a notification chain: pppoe_init() calls register_netdevice_notifier(&pppoe_notifier). This way, if a NIC is suddenly unplugged or brought down, the PPPoE module is notified immediately and tears down the affected sessions instead of waiting around.
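The notifier idea is easier to see in miniature. Below is a toy user-space model, assuming a fixed array of callbacks and a three-entry session list; none of the toy_* names exist in the kernel, and the real notifier chain carries more event types than the two shown here.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_DEV_UP, TOY_DEV_DOWN };

typedef void (*toy_notifier_fn)(enum toy_event ev, int ifindex);

#define TOY_MAX_NOTIFIERS 4
static toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_chain_len;

/* Toy counterpart of register_netdevice_notifier(). */
int toy_register_notifier(toy_notifier_fn fn)
{
    if (toy_chain_len == TOY_MAX_NOTIFIERS)
        return -1;
    toy_chain[toy_chain_len++] = fn;
    return 0;
}

/* Called by the "device layer" whenever an interface changes state. */
void toy_notify(enum toy_event ev, int ifindex)
{
    size_t i;

    for (i = 0; i < toy_chain_len; i++)
        toy_chain[i](ev, ifindex);
}

/* A pretend session list: two sessions on ifindex 2, one on ifindex 3. */
struct toy_sess {
    int ifindex;
    int alive;
};

struct toy_sess toy_sessions[3] = { {2, 1}, {2, 1}, {3, 1} };

/* The PPPoE side of the bargain: drop every session on a dead device. */
void toy_pppoe_event(enum toy_event ev, int ifindex)
{
    size_t i;

    if (ev != TOY_DEV_DOWN)
        return;
    for (i = 0; i < 3; i++)
        if (toy_sessions[i].ifindex == ifindex)
            toy_sessions[i].alive = 0;
}
```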

14.12.3 PPPoX Sockets: The Generic Wrapper Layer

This is a design that can easily make your head spin.

PPPoE runs over Ethernet. But to seamlessly interface with the PPP protocol stack, the Linux kernel introduced something called PPPoX (PPP over Anything). It's a generic Socket family (AF_PPPOX) that supports not only PPPoE but also other protocols like PPTP.

This generic structure looks like this:

struct pppox_sock {
    /* struct sock must be the first member of pppox_sock */
    struct sock sk;
    struct ppp_channel chan;
    struct pppox_sock *next; /* for hash table */

    union {
        struct pppoe_opt pppoe;
        struct pptp_opt pptp;
    } proto;
    __be16 num;
};

There are a few key points here:

  1. struct sock sk must be the first field. This is a common trick in the kernel networking stack, allowing a struct sock * pointer to be directly cast to a struct pppox_sock * for use.
  2. union { ... } proto: This is where the generic nature comes in. If it's PPPoE, it uses proto.pppoe; if it's PPTP, it uses proto.pptp.
  3. struct ppp_channel chan: This is the channel to the PPP core layer. PPPoE is only responsible for transporting the packets; the actual PPP negotiation (LCP, IPCP, etc.) is handed off to the PPP core via this channel.
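The "first member" trick from point 1 is plain C and can be tried outside the kernel. Here is a minimal sketch with made-up toy_* types standing in for struct sock and struct pppox_sock:

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock {
    int state;
};

struct toy_channel {
    int mtu;
};

struct toy_pppox_sock {
    struct toy_sock    sk;     /* must stay first, see the assert below */
    struct toy_channel chan;
    unsigned short     sid;
};

/* Fail the build if someone moves sk away from offset 0. */
_Static_assert(offsetof(struct toy_pppox_sock, sk) == 0,
               "sk must be the first member");

/*
 * Toy counterpart of pppox_sk(): a pointer to the first member is also a
 * pointer to the enclosing struct, so the cast is well defined.
 */
struct toy_pppox_sock *toy_pppox_sk(struct toy_sock *sk)
{
    return (struct toy_pppox_sock *)sk;
}
```

Generic socket code only ever sees the struct toy_sock pointer; protocol code casts it back when it needs the extra fields.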

When we discuss PPPoE, what we care about is proto.pppoe. It contains a pppoe_opt structure, and the most critical field inside is pa (pppoe_addr), which holds the "call record" for our session:

struct pppoe_addr {
    sid_t sid; /* Session identifier */
    unsigned char remote[ETH_ALEN]; /* Remote address (the peer's MAC) */
    char dev[IFNAMSIZ]; /* Local device to use (e.g. eth0) */
};

These three fields uniquely identify a PPPoE session: which NIC you're on, who you're talking to (MAC), and which ID you're using.
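As a rough illustration, this is how a user-space program might fill such a record once Discovery has produced a Session ID and a peer MAC. The sketch uses a local mirror struct (toy_pppoe_addr) rather than the real header so that it compiles anywhere; the sizes 6 and 16 match ETH_ALEN and IFNAMSIZ on Linux, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_IFNAMSIZ 16

/* Local mirror of struct pppoe_addr, for illustration only. */
struct toy_pppoe_addr {
    uint16_t      sid;                  /* kept exactly as received (network order) */
    unsigned char remote[TOY_ETH_ALEN]; /* the peer's MAC */
    char          dev[TOY_IFNAMSIZ];    /* local interface name */
};

/* Fill the record; returns -1 if the interface name is empty or too long. */
int toy_fill_addr(struct toy_pppoe_addr *pa, uint16_t sid_be,
                  const unsigned char *mac, const char *ifname)
{
    size_t n = strlen(ifname);

    if (n == 0 || n >= TOY_IFNAMSIZ)
        return -1;
    memset(pa, 0, sizeof *pa);
    pa->sid = sid_be;
    memcpy(pa->remote, mac, TOY_ETH_ALEN);
    memcpy(pa->dev, ifname, n);         /* memset above left the NUL in place */
    return 0;
}

/* Two records name the same session only if all three fields agree. */
int toy_same_session(const struct toy_pppoe_addr *a,
                     const struct toy_pppoe_addr *b)
{
    return a->sid == b->sid &&
           memcmp(a->remote, b->remote, TOY_ETH_ALEN) == 0 &&
           strcmp(a->dev, b->dev) == 0;
}
```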


14.12.4 User-Space Connection: From socket() to connect()

Everything above was just laying the groundwork with kernel data structures. The real meat is how user space uses these things.

Typically, we don't write our own code to call these Sockets. Instead, we use pppd (PPP daemon) paired with the rp-pppoe plugin. When you run a dial-up script, the flow looks like this:

Step 1: Create the Socket (socket())

pppd calls socket(AF_PPPOX, SOCK_STREAM, PX_PROTO_OE).

This triggers the pppoe_create() method inside the kernel. Its main job is to allocate memory and attach a series of callback functions:

static int pppoe_create(struct net *net, struct socket *sock)
{
    struct sock *sk;

    sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto);
    if (!sk)
        return -ENOMEM;

    sock_init_data(sock, sk);

    sock->state = SS_UNCONNECTED;
    sock->ops = &pppoe_ops; // <--- key step: attach the operations table

    sk->sk_backlog_rcv = pppoe_rcv_core;
    sk->sk_state = PPPOX_NONE;
    // ... set family, protocol, etc. ...

    return 0;
}

Note this line: sock->ops = &pppoe_ops;. pppoe_ops is a massive function pointer structure that defines this Socket's behavior:

static const struct proto_ops pppoe_ops = {
    .family = AF_PPPOX,
    .owner = THIS_MODULE,
    .release = pppoe_release,
    .bind = sock_no_bind, // PPPoE has no use for bind
    .connect = pppoe_connect, // <--- called on connect()
    .sendmsg = pppoe_sendmsg,
    .recvmsg = pppoe_recvmsg,
    .ioctl = pppox_ioctl,
    // ...
};
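The mechanism behind an operations table like this is an ordinary struct of function pointers. A stripped-down user-space sketch, with toy_* names invented for illustration and only two operations, might look like this:

```c
#include <assert.h>
#include <errno.h>

struct toy_socket;

/* A two-entry stand-in for struct proto_ops. */
struct toy_ops {
    int (*bind)(struct toy_socket *s);
    int (*connect)(struct toy_socket *s);
};

struct toy_socket {
    const struct toy_ops *ops;
    int connected;
};

/* Mirrors the role of sock_no_bind: the operation is not supported. */
static int toy_no_bind(struct toy_socket *s)
{
    (void)s;
    return -EOPNOTSUPP;
}

static int toy_pppoe_connect(struct toy_socket *s)
{
    s->connected = 1;
    return 0;
}

static const struct toy_ops toy_pppoe_ops = {
    .bind    = toy_no_bind,
    .connect = toy_pppoe_connect,
};

/* What the create step boils down to: attach the table. */
void toy_create(struct toy_socket *s)
{
    s->ops = &toy_pppoe_ops;
    s->connected = 0;
}

/* Generic "system call" entry points that know nothing about PPPoE. */
int toy_sys_bind(struct toy_socket *s)
{
    return s->ops->bind(s);
}

int toy_sys_connect(struct toy_socket *s)
{
    return s->ops->connect(s);
}
```

The generic entry points never mention PPPoE; which code runs is decided entirely by the table attached at creation time.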

Step 2: Initiate the Connection (connect())

With the Socket created, user space then calls connect(), passing the information obtained during the Discovery phase (Session ID, peer MAC, interface name) to the kernel.

This triggers pppoe_connect() in the kernel. There's a substantial block of logic here, so let's break it down piece by piece.

1. Preventing Duplicate Connections

struct sock *sk = sock->sk;
struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
struct pppox_sock *po = pppox_sk(sk);
// ...

// If the socket is already connected and the Session ID is nonzero
// (i.e. this is a Session-stage request), stop here; the caller
// gets -EBUSY.
if ((sk->sk_state & PPPOX_CONNECTED) &&
    stage_session(sp->sa_addr.pppoe.sid))
    goto end;
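Reduced to its logic, the check is a two-condition test. In the sketch below, toy_stage_session follows the kernel's idea that a nonzero Session ID means "Session stage"; the state constants and function names are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative state bits; not copied from the kernel headers. */
enum {
    TOY_PPPOX_NONE      = 0,
    TOY_PPPOX_CONNECTED = 1,
};

/* A nonzero Session ID means Discovery is over. */
static int toy_stage_session(uint16_t sid)
{
    return sid != 0;
}

/*
 * Returns -EBUSY when a socket that is already connected is asked to
 * connect to a session again, and 0 when the request may proceed.
 */
int toy_connect_check(int sk_state, uint16_t sid)
{
    if ((sk_state & TOY_PPPOX_CONNECTED) && toy_stage_session(sid))
        return -EBUSY;
    return 0;
}
```

Note that a connected socket with sid 0 passes this particular check: in this sketch only the connected-plus-nonzero-sid combination is rejected.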

2. Binding to a Network Interface

Since this is PPPoE, it must run on a specific physical (or virtual) device.

if (stage_session(sp->sa_addr.pppoe.sid)) {
    // Look up the net_device by name
    dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev);
    if (!dev)
        goto err_put;

    po->pppoe_dev = dev;
    po->pppoe_ifindex = dev->ifindex;
    // ...

3. State Check

There's a very practical check here:

// The NIC must be up; otherwise there is nothing to talk over
if (!(dev->flags & IFF_UP)) {
    goto err_put;
}

If the NIC isn't up, you can't dial in.

4. Inserting into the Hash Table

This is a classic operation in network programming for quickly looking up which session a packet belongs to.

write_lock_bh(&pn->hash_lock);

// Insert into the hash table, keyed by (sid, remote MAC, ifindex).
// A duplicate key yields -EALREADY.
error = __set_item(pn, po);
write_unlock_bh(&pn->hash_lock);

The kernel keeps this table per network namespace (that is what pn points to). Later, when a packet with sid=123 arrives, a quick table lookup tells it which Socket should handle it.
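Here is a small user-space sketch of such a table, using chained buckets and the same three-part key. The hash function is a simple XOR fold chosen for illustration and is not claimed to match the kernel's; locking is left out entirely, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_SIZE 16

struct toy_session {
    uint16_t sid;
    uint8_t  remote[6];
    int      ifindex;
    struct toy_session *next;   /* bucket chain */
};

static struct toy_session *toy_table[TOY_HASH_SIZE];

/* Illustrative hash: fold the Session ID and MAC bytes together. */
static unsigned toy_hash(uint16_t sid, const uint8_t *mac)
{
    unsigned h = sid ^ (sid >> 8);
    int i;

    for (i = 0; i < 6; i++)
        h ^= mac[i];
    return h % TOY_HASH_SIZE;
}

/* Find the session matching (sid, mac, ifindex), or NULL. */
struct toy_session *toy_get_item(uint16_t sid, const uint8_t *mac, int ifindex)
{
    struct toy_session *s;

    for (s = toy_table[toy_hash(sid, mac)]; s; s = s->next)
        if (s->sid == sid && s->ifindex == ifindex &&
            memcmp(s->remote, mac, 6) == 0)
            return s;
    return NULL;
}

/* Insert a session; a duplicate key is refused with -EALREADY. */
int toy_set_item(struct toy_session *po)
{
    unsigned h = toy_hash(po->sid, po->remote);

    if (toy_get_item(po->sid, po->remote, po->ifindex))
        return -EALREADY;
    po->next = toy_table[h];
    toy_table[h] = po;
    return 0;
}
```

Including ifindex in the key means the same (sid, MAC) pair can exist on two different interfaces without colliding.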

5. Registering the PPP Channel

This is the final step, and the one that bridges the "Ethernet driver" with the "PPP protocol stack."

// Reserve header room on the channel (PPPoE header plus Ethernet header)
po->chan.hdrlen = (sizeof(struct pppoe_hdr) + dev->hard_header_len);

// Set the MTU: the device MTU minus the PPPoE header length
po->chan.mtu = dev->mtu - sizeof(struct pppoe_hdr);

po->chan.private = sk;
po->chan.ops = &pppoe_chan_ops;

// Register this channel with the PPP core
error = ppp_register_net_channel(dev_net(dev), &po->chan);

Once ppp_register_net_channel succeeds, this Socket is officially connected to the PPP framework. From then on, IP packets coming down from upper layers are encapsulated by PPP and then, through the pppoe_chan_ops callback, ultimately turned into Ethernet frames and sent out.
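The header-room and MTU arithmetic from the snippet above is worth running once with real numbers. The sketch below assumes a standard Ethernet device (14-byte link header, 1500-byte MTU); the toy_* names are made up, and the extra 2-byte subtraction is the PPP protocol field that travels inside the PPPoE payload, which is where the familiar 1492 limit from RFC 2516 comes from.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PPPOE_HDR_LEN 6    /* sizeof(struct pppoe_hdr) */
#define TOY_ETH_HLEN      14   /* typical hard_header_len on Ethernet */
#define TOY_PPP_PROTO_LEN 2    /* PPP protocol field inside the payload */

/* Header room the channel asks for: PPPoE header plus link-layer header. */
size_t toy_chan_hdrlen(size_t hard_header_len)
{
    return TOY_PPPOE_HDR_LEN + hard_header_len;
}

/* Channel MTU as computed in the snippet: device MTU minus PPPoE header. */
size_t toy_chan_mtu(size_t dev_mtu)
{
    assert(dev_mtu > TOY_PPPOE_HDR_LEN + TOY_PPP_PROTO_LEN);
    return dev_mtu - TOY_PPPOE_HDR_LEN;
}

/* What is left for the IP packet once the PPP protocol field is counted. */
size_t toy_ip_mtu(size_t dev_mtu)
{
    return toy_chan_mtu(dev_mtu) - TOY_PPP_PROTO_LEN;
}
```

With a 1500-byte device MTU this gives 20 bytes of header room, a channel MTU of 1494, and 1492 bytes for the IP packet.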


14.12.5 Hands-On Verification: Let's Run It

With the theory out of the way, let's see how to actually operate this in practice.

We typically use the rp-pppoe open-source project (or directly use the plugin from pppd).

Server-Side Example

Suppose you want to simulate a carrier's equipment (an AC) on a Linux machine. You can start the PPPoE server like this:

pppoe-server -I p3p1 -R 192.168.3.101 -L 192.168.3.210 -N 200
  • -I p3p1: Listen on the p3p1 network interface.
  • -L 192.168.3.210: The local IP address, i.e. the server's own end of each PPP link.
  • -R 192.168.3.101: The starting value of the remote IP assigned to the client.
  • -N 200: Allow up to 200 concurrent sessions.

Client Side (User Side)

The client is typically invoked through a pppd configuration file. The config file looks roughly like this:

plugin rp-pppoe.so
nic-eth0
user "myname@isp"
password "mypassword"

When pppd starts, it calls the PPPOEConnectDevice() function within the rp-pppoe.so plugin. This function completes the Discovery phase in user space (sending PADI, receiving PADO...). Once it obtains the Session ID, it calls the socket() and connect() system calls we just analyzed, passing the baton of the connection to the kernel.

From that point on, the kernel's PPPoE module takes over the data channel.


14.12.6 Wrapping Up

Looking back, in this section we walked through the entire process from structure definition to Socket creation.

If the previous section was like reading a "script" (the protocol flow), this section was like watching the "stage being built" (the kernel implementation).

  • struct pppoe_hdr defines the language of the conversation.
  • pppoe_init hooks up the send and receive handlers in the background.
  • pppoe_connect turns user-space requests into hash table entries and Channel registrations inside the kernel.

This is the panoramic view of PPPoE in the Linux kernel. Although fiber-to-the-home is now ubiquitous and PPPoE is used less than before, its design of "stuffing a point-to-point protocol into an Ethernet broadcast network" remains a brilliantly elegant engineering case study.

In the next chapter, we'll turn our attention to a heavyweight of mobile networking, the Android network stack, and see how these mechanisms are repackaged and used on smartphones.