11.1 Sockets

A guiding principle has run throughout Unix history: "everything is a file."

Following this philosophy, network communication should feel as natural as reading and writing a local file: open(), read(), write(), close(). But when you actually dive into the network subsystem implementation, you'll find that the kernel goes to extraordinarily complex lengths behind the scenes to maintain this "file" illusion.
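To make the "file" illusion concrete, here is a minimal sketch, assuming a Unix-like system, that drives a socket through the generic file-descriptor calls instead of the socket-specific ones. Python's os.write() and os.read() are thin wrappers around write() and read(); the helper name file_style_roundtrip is made up for this illustration. The one call with no file-style equivalent is creation: a socket comes from socket() (here, socketpair()), not open().

```python
import os
import socket


def file_style_roundtrip(payload: bytes) -> bytes:
    """Send a small payload through a socket using plain fd read/write calls."""
    # socketpair() returns two already-connected sockets (AF_UNIX on Unix-like systems).
    a, b = socket.socketpair()
    try:
        # fileno() exposes the underlying file descriptor, the same kind of
        # integer that open() would return for a regular file.
        os.write(a.fileno(), payload)
        # For a small payload written in one call, one read is enough here.
        return os.read(b.fileno(), len(payload))
    finally:
        a.close()
        b.close()
```

The descriptor behaves like any other, which is exactly the illusion the kernel works to maintain.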

What's even more confusing is that to implement this interface, the kernel maintains two structs with strikingly similar names but completely different responsibilities. Why split a simple communication endpoint in half? This isn't just historical baggage. It's the first real hurdle to understanding the entire Linux network stack.

In this section, we'll start by examining how this interface is designed from the user's perspective, then peel back the layer to reveal the "twin" puzzle hidden inside the kernel.

Socket Types

The Socket API supports multiple communication types. We'll briefly survey them here, focusing on the ones we'll explore in depth in this chapter.

Stream Sockets (SOCK_STREAM) Think of this as making a phone call. First, you dial to establish a connection, much like TCP's three-way handshake—if the other party doesn't pick up (busy, powered off), the call won't go through. Once connected, the channel is exclusive and bidirectional. Whatever you say, the other party hears exactly that, and the voice (byte stream) arrives in order with no dropped words. TCP is the poster child for this mechanism. If you can't tolerate data corruption or out-of-order delivery, this is the safe choice.

Datagram Sockets (SOCK_DGRAM) This is like sending a postcard. You drop the postcard in the mailbox and walk away. Neither you nor the post office guarantees when it will arrive, or even that it will arrive at all—it might blow away halfway, arrive three days late, or the recipient might receive two identical copies. The upside is you don't need to call ahead and ask "can I send you a letter now?"; you just send it. UDP follows this model. If you're writing real-time audio/video or games where you can tolerate packet loss but are extremely sensitive to latency, postcards are your go-to.

But remember, the postcard analogy has one imprecise aspect: in a highly reliable LAN environment, UDP rarely drops packets, making it look like a dependable courier service. Don't be fooled by this appearance—once you cross the public internet, it instantly reverts to being a postcard that might get lost.
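As a rough illustration of the postcard model, the sketch below sends two UDP datagrams over the loopback interface with no connection setup at all. The function name and the two-second timeout are choices made for this example only. Loopback is about as friendly as a network gets, so both datagrams normally arrive, but nothing in the API promises it, which is why the receiver sets a timeout instead of blocking forever.

```python
import socket


def send_two_datagrams() -> list:
    """Fire two datagrams at a local receiver and collect whatever arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the kernel to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # a lost datagram raises socket.timeout
        addr = receiver.getsockname()

        # No handshake: each sendto() is an independent "postcard".
        sender.sendto(b"first", addr)
        sender.sendto(b"second", addr)

        # Each recvfrom() returns exactly one datagram, never a merged stream.
        return [receiver.recvfrom(2048)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()
```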

Raw Sockets (SOCK_RAW) This is a "god mode" interface. It lets you bypass the transport layer and talk directly to the IP layer. If you want to construct your own IP headers, implement your own protocol, or write a tool like ping (which requires ICMP access), this is what you need. Naturally, this power is restricted: on Linux, creating a raw socket requires the CAP_NET_RAW capability, which in practice usually means running as root. After all, the kernel doesn't want you arbitrarily spoofing a source IP to cause trouble.
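The privilege check is easy to observe from userspace. The following sketch simply attempts to create a raw ICMP socket and reports what happened; try_open_raw_icmp is a hypothetical helper name. On Linux, a process without CAP_NET_RAW would typically see the call fail with a permission error, but the exact errno depends on the environment, so the code returns it rather than assuming one.

```python
import socket


def try_open_raw_icmp():
    """Return None if a raw ICMP socket could be created, else the errno."""
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    except OSError as exc:
        # Unprivileged processes usually land here (PermissionError is an OSError).
        return exc.errno
    s.close()
    return None
```

Running it once as a regular user and once as root shows the difference.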

Other Types Linux also supports a few more specialized types:

  • SOCK_RDM: Provides reliable message delivery, primarily used with TIPC (Transparent Inter-Process Communication).
  • SOCK_SEQPACKET: Similar to SOCK_STREAM in being connection-oriented, but it preserves record boundaries.
  • SOCK_DCCP: Datagram Congestion Control Protocol, a transport protocol combining characteristics of both TCP and UDP.
  • SOCK_PACKET: Considered obsolete in the AF_INET protocol family.
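To see what "preserves record boundaries" means for SOCK_SEQPACKET, here is a small sketch that assumes Linux, where AF_UNIX supports this socket type. The function name is invented for the example. Two records are sent back to back, and each recv() hands back one whole record even though the receive buffer is large enough to hold both.

```python
import socket


def seqpacket_records() -> list:
    """Send two records over a SOCK_SEQPACKET pair and read them back."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        a.send(b"ab")
        a.send(b"cd")
        # Each recv() returns a single record; the kernel does not merge
        # "ab" and "cd" the way a byte stream is allowed to.
        return [b.recv(1024), b.recv(1024)]
    finally:
        a.close()
        b.close()
```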

Core Socket API Methods

Userspace programmers work with these functions every day. But from the kernel's perspective, they aren't just function calls—they're entry points into a complex kernel subsystem:

  • socket(): Creates a new Socket. This is far more than a simple malloc() of a block of memory; behind the scenes, it involves protocol family lookup and initialization.
  • bind(): Gives the Socket an "identity" (local port and IP address).
  • send() / recv(): The cornerstones of sending and receiving data.
  • listen(): Puts the Socket into "listening" mode, ready to accept connection requests. Note that only "phone calls" (TCP) need this; "postcards" (UDP) don't.
  • accept(): Pulls a connection from the wait queue and returns a new Socket descriptor. This is the most critical step on the TCP server side.
  • connect(): Initiates a connection. For TCP, this kicks off the three-way handshake; for UDP, it simply sets a default destination address (writing a fixed recipient on the postcard), making it convenient to use send() directly afterward.
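For orientation only, the calls listed above can be strung together as follows. This is a minimal sketch over the loopback interface, not a production pattern: the helper names, buffer sizes, and timeout are assumptions for the example, and error handling is left out. Python's socket methods map one-to-one onto the system calls of the same names.

```python
import socket

LOOPBACK = "127.0.0.1"


def tcp_roundtrip(payload: bytes) -> bytes:
    """socket -> bind -> listen -> connect -> accept -> send -> recv."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((LOOPBACK, 0))   # port 0: let the kernel choose
        server.listen(1)
        # The kernel completes the handshake and queues the connection
        # before accept() is ever called.
        client.connect(server.getsockname())
        conn, _peer = server.accept()  # a new socket for this connection
        try:
            client.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = conn.recv(4096)
                if not chunk:
                    break
                received += chunk
            return received
        finally:
            conn.close()
    finally:
        client.close()
        server.close()


def udp_connected_send(payload: bytes) -> bytes:
    """connect() on UDP only records a default destination for send()."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((LOOPBACK, 0))
        receiver.settimeout(2.0)
        sender.connect(receiver.getsockname())  # no packets are exchanged here
        sender.send(payload)                    # no address argument needed
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```

Note that the UDP path never calls listen() or accept(), matching the "postcard" description.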

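In the kernel, the handlers for these system calls live in net/socket.c. On some architectures (32-bit x86, for example) they are also reachable through a single multiplexing system call, socketcall(), which dispatches to the right handler based on a call number; on others, such as x86-64, each operation has its own system call entry.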

The focus of this book is kernel network implementation, not a userspace API usage guide. So we won't dive into how to write a connect() loop or how to handle EINTR error codes.

What Is a Socket Inside the Kernel?

Merely knowing how to call these APIs is a far cry from truly understanding the network subsystem.

The real trouble hides inside the kernel. When you call socket() in userspace, the kernel doesn't just create an inode like it does when opening a regular file. Instead, it orchestrates a "split": it simultaneously creates two things—struct socket and struct sock.

These two names look like twins, but they belong to entirely different species. Why design it this way? Couldn't we just merge them?

If you feel this design is a bit odd, your intuition is correct. Behind it lies the central layering philosophy of the Linux network stack. In the next section, we'll dive into the source code, put this pair of "twins" on the dissection table, and along the way, take a look at struct msghdr, the "suitcase" for exchanging data between userspace and the kernel.