
11.7 Quick Reference Manual

We've read through the code, studied the protocols — now let's assemble all the scattered pieces back into a single blueprint.

This section is essentially a "cheat sheet." The next time you're wandering around the kernel, trying to figure out how a packet actually gets pulled from the NIC into a socket buffer, this sheet will help you quickly pinpoint which function is doing the heavy lifting.

We've organized these methods by protocol, laying out the most important macros and tables for easy reference.


Core Method Quick Reference

This isn't an exhaustive list — it's the "axles" that truly drive the protocol machinery.

Generic Socket Operations

These handle the socket's own lifecycle and generic attributes. Whether it's TCP, UDP, or DCCP, they all pass through these gates.

  • int sock_create(int family, int type, int protocol, struct socket **res); This is the kernel-side projection of the user-space socket() system call. It does two things: a sanity check (verifying the family and type are valid), and calling sock_alloc() to allocate a struct socket, followed by invoking the protocol family's create method. For IPv4, that's inet_create(). Without this step, nothing else that follows is possible.

  • int sock_map_fd(struct socket *sock, int flags); We have a socket structure, but we need to turn it into a file descriptor to return to the user. This method allocates an fd and fills in the file entry. It's the glue layer for the "everything is a file" philosophy.

  • void sock_hold(struct sock *sk); / void sock_put(struct sock *sk); The lifecycle of kernel objects relies entirely on reference counting. sock_hold "pins it" — increments the count, preventing release during use. sock_put "lets go" — decrements the count, and when it hits zero, triggers release. This is the safety latch in concurrent environments: one missing put means a leak; one missing hold means a panic.

  • bool sock_flag(const struct sock *sk, enum sock_flags flag); struct sock has a pile of flag bits (e.g., SOCK_DEAD indicating the connection is dead). This function checks whether a specific bit is set. Simple, but the most direct way to check connection state.
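The hold/put discipline described above can be sketched in user space. This is a minimal model, not kernel code: struct obj, obj_new, obj_hold and obj_put are hypothetical names standing in for struct sock, its allocation, sock_hold and sock_put, and C11 atomics stand in for the kernel's own reference counter type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock; only the counter matters here. */
struct obj {
    atomic_int refcnt;
    int *freed_flag;   /* set to 1 when the object is released (demo only) */
};

struct obj *obj_new(int *freed_flag)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refcnt, 1);   /* the creator owns the first reference */
    o->freed_flag = freed_flag;
    return o;
}

/* "Pin it": take one more reference. */
void obj_hold(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* "Let go": drop one reference; the last one out frees the object. */
void obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refcnt, 1);

    assert(old > 0);              /* a put without a matching hold is a bug */
    if (old == 1) {
        *o->freed_flag = 1;
        free(o);
    }
}
```

A caller that takes one extra reference must drop two in total (its own plus the creator's) before the object is actually released.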

IP Layer Helpers

  • int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc); This is the parser for "non-ordinary data" passed via ancillary data from user space. When you use sendmsg with control information attached (like specifying a source IP or interface), the kernel parses this msghdr into an ipcm_cookie structure. The latter is the "instruction set" that the IP layer actually understands.
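From the user-space side, the ancillary data that ip_cmsg_send() parses is built with the CMSG_* macros. The sketch below assumes Linux with glibc and uses IP_PKTINFO to request a source address and outgoing interface; attach_pktinfo is a hypothetical helper name, and the address and interface index passed to it are up to the caller.

```c
#define _GNU_SOURCE            /* exposes struct in_pktinfo on glibc */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: attach one IP_PKTINFO control message to msg.
 * cbuf must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 on bad input.
 */
int attach_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuf_len,
                   const char *src_ip, int ifindex)
{
    struct in_pktinfo pi;
    struct cmsghdr *cm;

    assert(msg != NULL && cbuf != NULL);
    if (cbuf_len < CMSG_SPACE(sizeof(pi)))
        return -1;

    memset(&pi, 0, sizeof(pi));
    pi.ipi_ifindex = ifindex;                       /* outgoing interface */
    if (inet_pton(AF_INET, src_ip, &pi.ipi_spec_dst) != 1)
        return -1;                                  /* requested source IP */

    memset(cbuf, 0, cbuf_len);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
    return 0;
}
```

Passing the resulting msghdr to sendmsg() on a UDP socket is what sends this control block down the path the text describes.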

TCP (Transmission Control Protocol)

TCP is the most complex piece, and its function signatures are the longest.

  • struct tcp_sock *tcp_sk(const struct sock *sk); As mentioned earlier, the kernel has a strict inheritance hierarchy. struct sock is the generic base class, struct tcp_sock is the TCP-specific derived class. Because struct tcp_sock embeds the generic sock at its very start, this helper can turn a generic sock pointer into a pointer to the enclosing tcp_sock (a plain cast in older kernels, a container_of-style macro in newer ones). With it, you can access all TCP-private parameters (congestion window, congestion state, etc.).

  • void tcp_init_sock(struct sock *sk); When a socket is first created, it needs initialization. TCP has many private variables (delayed ACK timer, keepalive timer, etc.), and this method sets them to their initial values.

  • struct tcphdr *tcp_hdr(const struct sk_buff *skb); An sk_buff carries the payload together with the IP and TCP headers. This helper skips past the network-layer header and returns a pointer to the start of the TCP header. Don't calculate offsets yourself; use this helper.

  • int tcp_v4_rcv(struct sk_buff *skb); This is the main entry point for TCP packets over IPv4. After the network layer (L3) receives a TCP packet (IP header protocol number is 6), it eventually hands it to this function. It handles checksum verification, looks up the socket hash table, and either queues the packet for reception or feeds it into the state machine.

  • int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size); User-space send() or write() eventually lands here. It segments user-space data, manages the send queue, triggers retransmission timers, and when necessary pushes packets down to the IP layer.
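The "base class inside a derived class" trick behind tcp_sk() can be shown with a small user-space model. Everything here is hypothetical (model_sock, model_tcp_sock, model_container_of, model_tcp_sk); the real struct tcp_sock nests several layers before it reaches struct sock, but the pointer arithmetic is the same idea.

```c
#include <stddef.h>

/* Hypothetical generic "base class". */
struct model_sock {
    int sk_state;
};

/* Hypothetical TCP "derived class": the base is embedded as a member. */
struct model_tcp_sock {
    struct model_sock sk;
    unsigned int snd_cwnd;     /* a TCP-private field */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Downcast: generic sock pointer to TCP-specific pointer. */
struct model_tcp_sock *model_tcp_sk(struct model_sock *sk)
{
    return model_container_of(sk, struct model_tcp_sock, sk);
}
```

The downcast is only valid when the generic pointer really does live inside a TCP socket. The language cannot check that, so the caller has to know it from context.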

UDP (User Datagram Protocol)

UDP is much cleaner.

  • struct udphdr *udp_hdr(const struct sk_buff *skb); Same as tcp_hdr, but points to the UDP header.

  • int udp_rcv(struct sk_buff *skb); UDP's main receive entry point. While simpler than TCP (no connection state), it still does checksum verification, finds the corresponding receiver in the socket hash table, and tosses the skb into sk_receive_queue.

  • int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); UDP's send entry point. This handles the UDP_CORK option — if set, data accumulates; otherwise it constructs a UDP header and hands it to the IP layer directly. It also handles the MSG_MORE flag we discussed earlier.
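To see what udp_hdr() saves you from, here is a user-space sketch that finds the UDP header in a raw IPv4 packet by hand. In the kernel the transport-header offset is already recorded in the sk_buff, so none of this arithmetic is needed there; parse_udp and struct udp_fields are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the first three UDP header fields, host order. */
struct udp_fields {
    uint16_t sport;
    uint16_t dport;
    uint16_t len;
};

/* Read a big-endian 16-bit value without alignment assumptions. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Locate the UDP header inside a raw IPv4 packet.
 * Returns 0 on success, -1 if the buffer is not a plausible IPv4/UDP packet.
 */
int parse_udp(const uint8_t *pkt, size_t n, struct udp_fields *out)
{
    size_t ihl;
    const uint8_t *uh;

    if (n < 20 || (pkt[0] >> 4) != 4)      /* too short, or not IPv4 */
        return -1;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;     /* IP header length in bytes */
    if (ihl < 20 || n < ihl + 8)           /* room for the 8-byte UDP header? */
        return -1;
    if (pkt[9] != 17)                      /* protocol field: 17 is UDP */
        return -1;

    uh = pkt + ihl;                        /* this is what udp_hdr() hands you */
    out->sport = be16(uh);
    out->dport = be16(uh + 2);
    out->len = be16(uh + 4);
    return 0;
}
```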

SCTP (Stream Control Transmission Protocol)

SCTP's object model is more complex, with Associations and Endpoints.

  • struct sctp_sock *sctp_sk(const struct sock *sk); Like tcp_sk, retrieves the SCTP-specific sctp_sock structure from a generic sock.

  • struct sctp_association *sctp_association_new(...); void sctp_association_free(struct sctp_association *asoc); The core of SCTP is the "association." These two functions create and destroy associations. sctp_association_new initializes the state machine and prepares the Transport Control Block (TCB).

  • void sctp_chunk_hold(struct sctp_chunk *ch); / void sctp_chunk_put(struct sctp_chunk *ch); The basic unit of data in SCTP is called a Chunk. These two functions manage Chunk reference counts. When put reaches 0, sctp_chunk_destroy() is called to actually free the memory.

  • struct sctphdr *sctp_hdr(const struct sk_buff *skb); Locates the SCTP header.

  • int sctp_rcv(struct sk_buff *skb); SCTP's receive orchestrator. It handles not just data but various control Chunks (like INIT, SHUTDOWN). Because SCTP supports multi-homing, the lookup logic here is more involved than TCP.

  • int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t msg_len); SCTP's send interface. It decides which destination address to send to (multi-homing selection), splits data into DATA Chunks, then hands them to the IP layer.

DCCP (Datagram Congestion Control Protocol)

As a latecomer, DCCP's interface naming conventions closely resemble TCP's.

  • static int dccp_v4_rcv(struct sk_buff *skb); DCCP's receive function over IPv4. Although it's a datagram protocol, it introduces TCP-like handshakes and state machines, so this function also has state machine processing logic.

  • int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len); DCCP's send entry point. It needs to be unreliable like UDP while managing congestion control like TCP (CCID). It decides how to send packets based on CCID characteristics.


Core Macros

There aren't many macros worth memorizing, but one is characteristic of SCTP:

  • sctp_chunk_is_data(chunk) SCTP is a protocol where control messages and data messages are interleaved. This macro tells you at a glance whether what you're holding is a data chunk. If 1, it's Payload Data (SCTP_CID_DATA); otherwise it's a control chunk like Init, Ack, etc.

Key Data Tables

These tables are the "vocabulary" of the protocols.

Table 11-1: TCP vs. UDP proto_ops Comparison

This table shows the user-facing (POSIX interface) operation structure proto_ops. Note the differences between TCP and UDP.

| proto_ops callback | TCP implementation (inet_stream_ops) | UDP implementation (inet_dgram_ops) |
|---|---|---|
| release | inet_release | inet_release |
| bind | inet_bind | inet_bind |
| connect | inet_stream_connect | inet_dgram_connect |
| accept | inet_accept | sock_no_accept |
| listen | inet_listen | sock_no_listen |
| sendmsg | inet_sendmsg | inet_sendmsg |
| recvmsg | inet_recvmsg | inet_recvmsg |
| poll | tcp_poll | udp_poll |

Key point: Note accept and listen. UDP is connectionless, so it has no concept of listen or accept (just use recvfrom directly). If you try to call listen() on a UDP socket, the kernel returns EOPNOTSUPP (operation not supported), because the listen pointer in inet_dgram_ops points to sock_no_listen. The same goes for accept(), which is wired to sock_no_accept.
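The dispatch-table idea behind this comparison fits in a few lines. This is a user-space model with hypothetical names (model_ops, model_socket, model_listen and friends), not the kernel's struct proto_ops. It only shows how a "not supported" stub slots into the same function-pointer field that a real implementation would occupy.

```c
#include <errno.h>
#include <stddef.h>

struct model_socket;

/* Hypothetical, cut-down operations table with a single callback. */
struct model_ops {
    const char *name;
    int (*listen)(struct model_socket *s, int backlog);
};

struct model_socket {
    const struct model_ops *ops;
    int listening;
};

/* Stream flavour: listening is meaningful. */
int model_stream_listen(struct model_socket *s, int backlog)
{
    (void)backlog;
    s->listening = 1;
    return 0;
}

/* Datagram flavour: a stub in the spirit of sock_no_listen. */
int model_no_listen(struct model_socket *s, int backlog)
{
    (void)s;
    (void)backlog;
    return -EOPNOTSUPP;
}

const struct model_ops model_stream_ops = { "stream", model_stream_listen };
const struct model_ops model_dgram_ops = { "dgram", model_no_listen };

/* The generic entry point never checks the protocol; it just dispatches. */
int model_listen(struct model_socket *s, int backlog)
{
    return s->ops->listen(s, backlog);
}
```

The generic layer stays free of "if TCP ... else if UDP ..." branches. Unsupported operations are expressed as data in the table rather than as special cases in the caller.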

Table 11-2: SCTP Chunk Types

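Strictly speaking, SCTP still sends packets, but each packet is a common header followed by one or more Chunks, and the Chunk is the unit the protocol logic works on. Here's the dictionary.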

| Chunk Type | Linux Kernel Symbol | Value | Description |
|---|---|---|---|
| Payload Data | SCTP_CID_DATA | 0 | The actual data payload. |
| Initiation | SCTP_CID_INIT | 1 | First step of the handshake, establishing the association. |
| Initiation Ack | SCTP_CID_INIT_ACK | 2 | Second step of the handshake, confirming the connection. |
| SACK | SCTP_CID_SACK | 3 | Selective acknowledgment, telling the peer which packets were received. |
| Heartbeat | SCTP_CID_HEARTBEAT | 4 | Heartbeat packet, detecting link liveness (essential for multi-homing). |
| Abort | SCTP_CID_ABORT | 6 | Immediately terminate the association (the aggressive option). |
| Shutdown | SCTP_CID_SHUTDOWN | 7 | Graceful close. |
| Cookie Echo | SCTP_CID_COOKIE_ECHO | 10 | Send the state Cookie back to the server, preventing SYN flood. |
| Cookie Ack | SCTP_CID_COOKIE_ACK | 11 | Cookie received, handshake complete. |
| ASCONF | SCTP_CID_ASCONF | 0xC1 | Dynamically modify IP addresses (multi-homing feature). |

Note: The values in this table are not contiguous. Some core chunk types are simply omitted here (HEARTBEAT_ACK, for example, is 5), and extension chunks such as FWD_TSN (0xC0) and ASCONF (0xC1) sit far away from the core range.
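Since one SCTP packet can bundle several chunks, receive-side code has to walk them one by one. The sketch below assumes the chunk layout from the SCTP specification (1-byte type, 1-byte flags, 2-byte big-endian length that includes the 4-byte chunk header, each chunk padded to a 4-byte boundary). sctp_count_chunks is a hypothetical helper, and the buffer it takes is the chunk area after the packet's common header.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SCTP_CID_DATA 0   /* same value as SCTP_CID_DATA in Table 11-2 */

/*
 * Walk the chunk area of an SCTP packet.
 * Returns the number of chunks found, or -1 if a chunk header is malformed.
 * *data_chunks receives how many of them were DATA chunks.
 */
int sctp_count_chunks(const uint8_t *buf, size_t n, int *data_chunks)
{
    size_t off = 0;
    int total = 0;

    *data_chunks = 0;
    while (off < n) {
        uint8_t type;
        size_t len;

        if (n - off < 4)                 /* not even a full chunk header */
            return -1;
        type = buf[off];
        len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
        if (len < 4 || len > n - off)    /* length must cover the header and fit */
            return -1;

        if (type == MODEL_SCTP_CID_DATA) /* the sctp_chunk_is_data() question */
            (*data_chunks)++;
        total++;

        /* Advance to the next 4-byte boundary; tolerate missing final padding. */
        off += (len + 3) & ~(size_t)3;
    }
    return total;
}
```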

Table 11-3: DCCP Packet Types

DCCP's packet types define who it is and what it's doing.

| Linux Symbol | Description |
|---|---|
| DCCP_PKT_REQUEST | Request packet from the client (similar to TCP SYN). |
| DCCP_PKT_RESPONSE | Server's response (similar to TCP SYN/ACK). |
| DCCP_PKT_DATA | Pure data packet. |
| DCCP_PKT_DATAACK | Most common. Data packet with piggybacked ACK; DCCP encourages using this. |
| DCCP_PKT_ACK | Pure ACK, confirming receipt. |
| DCCP_PKT_CLOSEREQ | Server asks the client to close the connection (only loosely comparable to a TCP FIN). |
| DCCP_PKT_CLOSE | Close the connection. |
| DCCP_PKT_RESET | Reset the connection (abnormal termination or rejection). |
| DCCP_PKT_SYNC | Sequence number synchronization after heavy packet loss. |
| DCCP_PKT_SYNCACK | Acknowledgment for SYNC. |

Notice that DCCP has "Control" in its name — it genuinely tries to maintain a decent connection state, even though it doesn't guarantee data delivery.
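As a compact companion to Table 11-3, here is a sketch that maps the 4-bit packet type to a name and reports whether that type carries an acknowledgement number. The numeric values follow the DCCP specification (Request is 0, SyncAck is 9), which is also the order of the kernel's enum dccp_pkt_type. The MODEL_ names and both helper functions are hypothetical.

```c
/* Packet type values as assigned by the DCCP specification. */
enum model_dccp_pkt {
    MODEL_DCCP_REQUEST = 0,
    MODEL_DCCP_RESPONSE,
    MODEL_DCCP_DATA,
    MODEL_DCCP_ACK,
    MODEL_DCCP_DATAACK,
    MODEL_DCCP_CLOSEREQ,
    MODEL_DCCP_CLOSE,
    MODEL_DCCP_RESET,
    MODEL_DCCP_SYNC,
    MODEL_DCCP_SYNCACK
};

/* Human-readable name for a packet type, or "INVALID" if out of range. */
const char *model_dccp_name(int type)
{
    static const char *const names[] = {
        "REQUEST", "RESPONSE", "DATA", "ACK", "DATAACK",
        "CLOSEREQ", "CLOSE", "RESET", "SYNC", "SYNCACK"
    };

    if (type < 0 || type > MODEL_DCCP_SYNCACK)
        return "INVALID";
    return names[type];
}

/*
 * Every packet type except Request and Data carries an acknowledgement
 * number. Returns 1 if it does, 0 otherwise (including invalid types).
 */
int model_dccp_has_ack(int type)
{
    if (type < 0 || type > MODEL_DCCP_SYNCACK)
        return 0;
    return type != MODEL_DCCP_REQUEST && type != MODEL_DCCP_DATA;
}
```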


Chapter Echo

Looking back at this chapter, we dismantled everything from the familiar Socket API down to the protocol stack's lowest levels.

Actually, the design of the kernel networking subsystem isn't mysterious — it fundamentally deals with "differences."

  • The Socket layer (struct socket) handles the "file difference" — disguising network packets as file reads and writes.
  • The Transport layer (struct sock, TCP/UDP) handles the "transport model difference" — some flow like water (TCP), some get tossed like envelopes (UDP), and some want it both ways (SCTP/DCCP).

Remember the dilemma we encountered in the DCCP section? Why does the more advanced DCCP have almost no real-world adoption, while the aging UDP is everywhere? The answer isn't in the code — it's in the ecosystem. NAT devices don't recognize DCCP, ISPs don't support DCCP, trapping DCCP on isolated islands within internal networks. This proves an engineering truth once again: the best protocol is often not the theoretically most perfect one, but the one that can punch through existing infrastructure. That's why UDP remains irreplaceable to this day, and it's the fundamental reason QUIC ultimately chose to build on top of UDP.

In the next chapter, we'll cross the boundary, leaving the host itself to see how neighbors greet each other — that's the territory of ARP and ND, the world of Layer 2 (Link Layer).


Exercises

Exercise 1: Understanding

Question: In the Linux kernel implementation, struct socket and struct sock are two core data structures. Is the following statement correct: struct socket is primarily responsible for interacting with user space (providing the file interface), while struct sock is primarily responsible for interacting with the network layer (L3)? Briefly describe the main differences between the two.

Answer and Analysis

Answer: Correct.

Analysis: As described in the "Creating Sockets" section, struct socket provides the interface to user space (containing the file pointer file and operation callbacks ops), while struct sock represents the network layer (L3) socket interface, containing queues (like sk_receive_queue), buffer sizes, and protocol-related callbacks. The kernel uses this separation to decouple the filesystem view from the network protocol stack view.

Exercise 2: Application

Question: When sending data over UDP, if you want to accumulate data from multiple send calls and assemble them into a large packet in the kernel (to reduce fragmentation), what two mechanisms can you use? Briefly explain their principles based on the kernel code implementation (such as udp_sendmsg).

Answer and Analysis

Answer: Use the UDP_CORK socket option or set the MSG_MORE flag in the sendmsg call.

Analysis: As described in the "Sending Packets with UDP" section, the kernel code checks corkreq = up->corkflag || msg->msg_flags&MSG_MORE. When corkreq is true, the kernel holds the socket lock and calls ip_append_data to cache data into sk_write_queue, until the option is canceled or data is sent without MSG_MORE, finally sending everything via udp_push_pending_frames. This allows the application layer to merge multiple logical data blocks into a single physical packet.
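The MSG_MORE half of this answer is easy to try from user space. The sketch below assumes Linux and a working loopback interface; corked_datagram_len is a hypothetical helper that sends two 5-byte pieces, the first with MSG_MORE, and reports the size of the single datagram the receiver sees.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send "hello" with MSG_MORE, then "world" without it, over loopback UDP.
 * Returns the length of the first datagram received, or -1 on any failure
 * (for example when loopback networking is unavailable).
 */
int corked_datagram_len(void)
{
    int rx, tx, result = -1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    char buf[64];

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;
    /* Bound the wait so a lost datagram cannot hang the caller. */
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto out;
    if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    if (send(tx, "hello", 5, MSG_MORE) != 5) /* held back: more is coming */
        goto out;
    if (send(tx, "world", 5, 0) != 5)        /* no MSG_MORE: flush it all */
        goto out;

    result = (int)recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return result;
}
```

If the two pieces are merged as described, the receiver gets one 10-byte datagram rather than two 5-byte ones. Setting UDP_CORK with setsockopt() and clearing it afterwards is the other way to get the same effect.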

Exercise 3: Thinking

Question: Why does SCTP (Stream Control Transmission Protocol) use a four-way handshake instead of a three-way handshake like TCP? Analyze the design intent from a security perspective.

Answer and Analysis

Answer: To resist SYN-flood-style resource exhaustion attacks (for SCTP, a flood of forged INIT requests).

Analysis: While this chapter focuses mainly on SCTP's implementation details (like Chunks and Associations), as mentioned in the "Setting Up an SCTP Association" section, SCTP uses a State Cookie mechanism. In the four-way handshake, the server allocates no per-association resources when the INIT arrives. Instead it packs the information it would need into a signed State Cookie and returns it in the INIT-ACK. The client must echo this cookie back (COOKIE-ECHO). Only after verifying it does the server establish the association and reply with COOKIE-ACK. This mechanism prevents attackers from exhausting server resources by forging massive numbers of INIT requests, making it more resilient than TCP's three-way handshake.
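The stateless-cookie idea can be sketched as follows. This is a toy model under loud assumptions: the struct layout, the function names, and the keyed hash are all hypothetical, and the FNV-1a-style mix used here is not cryptographically secure. A real implementation signs the cookie with a proper MAC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cookie: everything the server needs, plus a signature. */
struct model_cookie {
    uint32_t peer_addr;
    uint16_t peer_port;
    uint32_t issued_at;   /* seconds, server clock */
    uint64_t mac;
};

/* Toy keyed hash (FNV-1a-style mix seeded with the secret). NOT secure. */
static uint64_t toy_mac(uint64_t secret, uint32_t addr, uint16_t port,
                        uint32_t ts)
{
    uint8_t in[10];
    uint64_t h = 0xcbf29ce484222325ULL ^ secret;
    size_t i;

    for (i = 0; i < 4; i++)
        in[i] = (uint8_t)(addr >> (8 * i));
    for (i = 0; i < 2; i++)
        in[4 + i] = (uint8_t)(port >> (8 * i));
    for (i = 0; i < 4; i++)
        in[6 + i] = (uint8_t)(ts >> (8 * i));

    for (i = 0; i < sizeof(in); i++) {
        h ^= in[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* On INIT: build the cookie and forget about the client. No state is kept. */
struct model_cookie cookie_make(uint64_t secret, uint32_t addr, uint16_t port,
                                uint32_t now)
{
    struct model_cookie c = { addr, port, now, 0 };

    c.mac = toy_mac(secret, addr, port, now);
    return c;
}

/* On COOKIE-ECHO: accept only a fresh cookie that we signed ourselves. */
int cookie_verify(uint64_t secret, const struct model_cookie *c,
                  uint32_t now, uint32_t lifetime)
{
    if (now - c->issued_at > lifetime)
        return 0;                          /* expired */
    return c->mac == toy_mac(secret, c->peer_addr, c->peer_port,
                             c->issued_at);
}
```

The server commits memory only after cookie_verify succeeds, which is exactly the point at which a forged source address stops working for the attacker.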

Exercise 4: Understanding

Question: When the kernel receives a UDP packet (__udp4_lib_rcv), if no matching socket is found through the hash table (i.e., no program is listening on the target port), and the packet's checksum is correct, what type of ICMP message does the kernel send back to the sender?

Answer and Analysis

Answer: ICMP Destination Unreachable (specifically Port Unreachable).

Analysis: As described in the "Receiving Packets from the Network Layer (L3) with UDP" section, when socket lookup fails (sk == NULL), it means no local application is listening on that port. If the checksum is correct, the kernel should send an ICMP Destination Unreachable message to notify the sender that the port is unreachable.
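The decision described here reduces to a tiny function. This is a simplified model with hypothetical names: in the kernel the checksum is verified at different points on the two paths, but the outcomes are the same three. The ICMP numbers are the standard ones (type 3, Destination Unreachable; code 3, Port Unreachable).

```c
/* Standard ICMP numbers; MODEL_ prefix avoids clashing with system headers. */
#define MODEL_ICMP_DEST_UNREACH 3   /* ICMP type */
#define MODEL_ICMP_PORT_UNREACH 3   /* ICMP code */

enum model_udp_rx_action {
    MODEL_UDP_RX_QUEUE,             /* deliver to the socket's receive queue */
    MODEL_UDP_RX_DROP_CSUM,         /* bad checksum: drop silently */
    MODEL_UDP_RX_ICMP_UNREACH       /* nobody listening: tell the sender */
};

/* Simplified outcome of UDP receive processing for one datagram. */
enum model_udp_rx_action model_udp_rx_decide(int socket_found, int checksum_ok)
{
    if (!checksum_ok)
        return MODEL_UDP_RX_DROP_CSUM;   /* never answer a corrupted packet */
    if (socket_found)
        return MODEL_UDP_RX_QUEUE;
    return MODEL_UDP_RX_ICMP_UNREACH;
}
```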

Exercise 5: Application

Question: DCCP (Datagram Congestion Control Protocol) combines features of both TCP and UDP. Suppose you need to develop a real-time streaming application that requires both low latency (tolerating packet loss) and avoidance of network congestion. Would you choose DCCP or UDP? Explain your reasoning based on DCCP's characteristics (such as CCID).

Answer and Analysis

Answer: You should choose DCCP.

Analysis: As described in the "DCCP" section, DCCP provides congestion control mechanisms that UDP lacks. For streaming applications, while some packet loss is tolerable (a UDP characteristic), using pure UDP could lead to congestion collapse. DCCP uses built-in congestion control algorithms selected by CCID (for example CCID 3, TCP-Friendly Rate Control, which aims for a smoother sending rate than TCP's abrupt window changes) to control the sending rate. It keeps latency low (no reliability guarantees, no retransmission delays) while preserving network fairness, making it the optimal choice for such scenarios.


Key Takeaways

The Linux kernel exposes networking functionality through the standard POSIX socket API, following the "everything is a file" philosophy. Although the user-space interface is unified, the kernel internally maintains a complex object model. This dual structure consists primarily of struct socket and struct sock: the former is the user-space-facing interface layer, associated with the filesystem for standard I/O operations; the latter is the lower-level representation facing the network protocol stack, managing send/receive queues and protocol state. The two work closely together, using struct msghdr as a container for efficient transfer of data payloads and control information between kernel and user space.

UDP is the best entry point for understanding data flow in the kernel networking stack, with its implementation embodying a balance of simplicity and efficiency. During initialization, the UDP protocol registers its handler functions into the kernel's protocol table via udp_protocol and udp_prot. When sending data, the kernel distinguishes between fast and slow paths based on whether UDP_CORK is enabled: the fast path directly constructs an sk_buff and sends it, while the slow path accumulates data into the write queue via ip_append_data for merged transmission — a design that maintains flexibility while optimizing network performance for small-packet scenarios.

TCP's kernel implementation is built on complex state machines and strict timer mechanisms to ensure transmission reliability. When creating a socket, tcp_v4_init_sock initializes four core timers including retransmission, delayed ACK, keepalive, and zero-window probe, along with buffer and initial congestion window settings. During the three-way handshake, the kernel manages half-open connections through the listening socket, generating a request_sock for each new connection request, converting it to a full child socket only after the handshake completes and placing it in the accept queue — effectively decoupling connection establishment logic from the data transmission path.

TCP's data send/receive flow incorporates deep optimizations for performance and concurrency. On the receive path, the kernel decides processing strategy based on whether the socket is locked by a user process: if unlocked, it attempts batch processing via the prequeue to reduce context switches; if locked, packets are temporarily stored in the backlog queue to avoid drops. On the send path, tcp_sendmsg not only copies data but also handles Nagle's algorithm, MSS (Maximum Segment Size) calculations, and other logic, ultimately handing the encapsulated packet to the IP layer via tcp_transmit_skb — reflecting the protocol stack's extreme pursuit of reliability and flow control.

From UDP's "fire and forget" to TCP's "watertight fit," the Linux kernel builds a highly extensible framework by separating the generic Socket layer from protocol-specific implementations (like proto_ops and proto structures). This architecture allows TCP, UDP, SCTP, and other protocols to reuse the same system call entry points (like sendmsg) while implementing differentiated low-level logic based on their own characteristics (like TCP's complex congestion control vs. UDP's simple checksums). Understanding this abstraction layer is the key to grasping how the Linux kernel networking subsystem balances standard interfaces with internal complexity.