11.6 DCCP: Datagram Congestion Control Protocol
We have finally reached the last stop on the IPv4 transport layer tour.
DCCP is a protocol born from a dilemma: it wants the speed of UDP (low latency, no retransmission) combined with the politeness of TCP (congestion control). You can think of it as "polite UDP" — it doesn't blindly flood the network regardless of congestion, nor does it halt everything to retransmit a single lost packet and ruin real-time performance.
But this analogy isn't entirely accurate: DCCP isn't a hack that patches UDP. Like TCP, it is connection-oriented and requires a three-way handshake before communication begins. In the kernel implementation, it even shares deep lineage with TCP.
Returning to the "polite UDP" analogy: you can probably see that DCCP's core strength lies in controllability. It allows applications to choose different "levels of politeness" (i.e., different congestion control algorithms, or CCIDs) based on the scenario. If you pick the wrong algorithm — for example, using CCID-4, designed for small packets, in a streaming media scenario — you might get a terrible experience, or it might not work at all.
11.6.1 DCCP Core Mechanism: Pluggable Congestion Control
DCCP's approach to congestion control is a bit more "democratic" than TCP's.
In TCP, the congestion control algorithm is hardcoded in the kernel (although Linux now supports pluggable modules, the protocol specification itself is fixed). In DCCP, however, the congestion control algorithm is negotiable and is known as a CCID (Congestion Control ID).
A DCCP connection is viewed as two "half-connections":
- Sending half-connection: The A → B data flow, where A controls the sending rate.
- Reverse half-connection: The B → A data flow (primarily ACKs), where B controls the sending rate.
This separation means both sides can use entirely different algorithms — for instance, A might use CCID-3 (TCP-Friendly Rate Control) to send video, while B uses CCID-2 (TCP-like) to send ACKs.
Currently, two CCIDs see mainstream use, plus one experimental variant:
- CCID-2 (TCP-Like): This is essentially TCP congestion control wearing a DCCP disguise. It uses slow start, a congestion window, and SACK (Selective Acknowledgment). Use case: high-bandwidth, high-latency scenarios where packet loss is acceptable but a network collapse is not.
- CCID-3 (TCP-Friendly Rate Control, TFRC): This is a rate-based smooth algorithm. Instead of drastically adjusting the window like CCID-2, it calculates a smooth sending rate. It's ideal for streaming media, where visual jitter (rate changing too quickly) is much less acceptable than dropping a few frames.
- CCID-4: A small-packet variant of CCID-3, specifically optimized for small data payloads (experimental).
The Linux kernel added DCCP support way back in version 2.6.14 (2005). This chapter only discusses the implementation principles of DCCPv4; the actual mathematical derivations for specific CCIDs belong in a different textbook.
11.6.2 DCCP Header Parsing
Like any proper protocol, DCCP has its own header.
The minimum header is only 12 bytes, but it is variable-length (ranging from 12 to 1020 bytes). Why such a huge difference? Because DCCP header options also use a TLV format (similar to SCTP or TCP Options), and the sequence number length is variable.
Note: DCCP sequence numbers increment per-packet (packet-based), unlike TCP, which increments per-byte (byte-based). This is a crucial distinction that comes into play when handling sequence number logic later.
Let's look at the kernel definition:
struct dccp_hdr {
    __be16  dccph_sport,
            dccph_dport;
    __u8    dccph_doff;
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8    dccph_cscov:4,
            dccph_ccval:4;
#elif defined(__BIG_ENDIAN_BITFIELD)
    __u8    dccph_ccval:4,
            dccph_cscov:4;
#endif
    __sum16 dccph_checksum;
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8    dccph_x:1,
            dccph_type:4,
            dccph_reserved:3;
#elif defined(__BIG_ENDIAN_BITFIELD)
    __u8    dccph_reserved:3,
            dccph_type:4,
            dccph_x:1;
#endif
    __u8    dccph_seq2;
    __be16  dccph_seq;
};
Here are a few key fields, broken down:
- dccph_sport / dccph_dport: Source and destination ports. Same concept as in TCP/UDP, nothing more to add.
- dccph_doff: Data offset. Tells the kernel "how long the DCCP header is," in 4-byte units. Since a bunch of TLV options might be appended after the header, without this field the kernel wouldn't know where the actual data starts.
- dccph_cscov (Checksum Coverage): This is a clever trick in DCCP — partial checksums. Normally, a checksum covers the entire packet. But some applications (like certain audio streams) don't care if a few bits in the data are corrupted, or they sacrifice a bit of correctness for performance. You can set cscov to a smaller value to checksum only the header, leaving the data payload to its own devices. UDP-Lite has a similar design.
- dccph_ccval: A 4-bit space for CCID algorithms to pass algorithm-specific information between sender and receiver (for example, CCID-3 uses it to carry its window counter). It is not generic.
- dccph_type: Packet type, 4 bits. For example, DCCP_PKT_DATA is a data packet, and DCCP_PKT_ACK is an acknowledgment packet.
- dccph_x (Extended Sequence Numbers): Extended sequence number flag.
  - If dccph_x = 0: the sequence number is 24 bits (dccph_seq holds the lower 16 bits, dccph_seq2 the upper 8).
  - If dccph_x = 1: the sequence number is 48 bits (a true long sequence number).
  - On an extremely fast network, a 24-bit sequence number wraps around quickly, so extended mode must be enabled.
- dccph_seq2 / dccph_seq: The sequence number fields mentioned above. Because sequence numbers increment per packet, what's stored here is simply the packet number.
Figure 11-4 shows the dccph_x=1 case, where you can see the Sequence Number portion pieced together into 48 bits.
Figure 11-5 shows the dccph_x=0 case, where the sequence number is only 24 bits, saving space.
11.6.3 Initialization: Registration and Socket Creation
The DCCP initialization flow in the kernel is practically cut from the same cloth as TCP/UDP.
Step 1: Protocol Registration
In net/dccp/ipv4.c, DCCP first defines a proto structure (for socket layer operations) and a net_protocol structure (for network layer reception):
static struct proto dccp_v4_prot = {
    .name       = "DCCP",
    .owner      = THIS_MODULE,
    .close      = dccp_close,
    .connect    = dccp_v4_connect,
    .disconnect = dccp_disconnect,
    .init       = dccp_v4_init_sock,  // key callback
    .sendmsg    = dccp_sendmsg,
    .recvmsg    = dccp_recvmsg,
    ...
};
static const struct net_protocol dccp_v4_protocol = {
    .handler     = dccp_v4_rcv,  // receive entry point
    .err_handler = dccp_v4_err,
    .no_policy   = 1,
    .netns_ok    = 1,
};
Then, in dccp_v4_init(), it hooks itself into the kernel's protocol list:
static int __init dccp_v4_init(void)
{
    int err = proto_register(&dccp_v4_prot, 1);  // register the proto
    if (err != 0)
        goto out;
    // Register with the IP layer, telling the kernel:
    // "hand IPPROTO_DCCP packets to me"
    err = inet_add_protocol(&dccp_v4_protocol, IPPROTO_DCCP);
    if (err != 0)
        goto out_proto_unregister;
    ...
}
Step 2: Socket Initialization
When user space calls socket(AF_INET, SOCK_DCCP, ...), the kernel eventually routes it to dccp_v4_init_sock():
static int dccp_v4_init_sock(struct sock *sk)
{
    static __u8 dccp_v4_ctl_sock_initialized;
    // Run the generic DCCP initialization logic
    int err = dccp_init_sock(sk, dccp_v4_ctl_sock_initialized);

    if (err == 0) {
        if (unlikely(!dccp_v4_ctl_sock_initialized))
            dccp_v4_ctl_sock_initialized = 1;
        // Install the IPv4-specific address-family operations
        inet_csk(sk)->icsk_af_ops = &dccp_ipv4_af_ops;
    }
    return err;
}
dccp_init_sock() does three main things:
- Initializes fields: Sets the socket state to DCCP_CLOSED, configures default queue lengths, and so on.
- Initializes timers: Calls dccp_init_xmit_timers(). DCCP has timers too; while simpler than TCP's, it hasn't completely shaken off the constraints of "time."
- Feature negotiation initialization: Calls dccp_feat_init(). This is a signature DCCP feature, used to negotiate the CCID and other parameters during the handshake.
11.6.4 Receiving Data: From L3 to L4
When a packet arrives from the NIC and gets its IP header stripped off, the IP layer locates dccp_v4_rcv() based on the protocol number.
The structure of this function is strikingly similar to tcp_v4_rcv(). This is no coincidence — Arnaldo Carvalho de Melo, who wrote the Linux DCCP implementation, deliberately designed it this way to reuse TCP's code logic: if it can be reused, never rewrite it.
Let's walk through the processing flow:
static int dccp_v4_rcv(struct sk_buff *skb)
{
    const struct dccp_hdr *dh;
    struct sock *sk;
    int min_cov;
First, drop the garbage packets:
    // Check that the packet is addressed to this host and that
    // its length is valid (at least 12 bytes)
    if (dccp_invalid_packet(skb))
        goto discard_it;
Then, look up the socket. It searches the hash table for the corresponding struct sock using the four-tuple:
    sk = __inet_lookup_skb(&dccp_hashinfo, skb,
                           dh->dccph_sport, dh->dccph_dport);
    if (sk == NULL) {
        // No socket found? Nobody is listening on this port,
        // or this is a stray packet
        goto no_dccp_socket;
    }
Next, handle the Minimum Checksum Coverage.
Remember dccph_cscov? If the negotiated Coverage is greater than the current packet's Coverage, it means the packet's checksum scope is insufficient, so we drop it:
    // (simplified sketch of the code)
    min_cov = dccp_sk(sk)->dccps_pcrlen;
    if (dh->dccph_cscov < min_cov) {
        // Checksum coverage is too small; drop the packet
        goto discard_it;
    }
Finally, pass the packet to the socket:
    return sk_receive_skb(sk, skb, 1);
}
After this, the data enters the socket's receive queue, waiting for user space to read it via recvmsg().
11.6.5 Sending Data: Construction and Queuing
When user space calls sendmsg() to send data, the kernel lands at dccp_sendmsg().
The core task here is to copy user-space data into a kernel-space SKB, and decide whether to send it now or later based on the CCID.
int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
                 size_t len)
{
    const struct dccp_sock *dp = dccp_sk(sk);
    const int flags = msg->msg_flags;
    const int noblock = flags & MSG_DONTWAIT;
    struct sk_buff *skb;
    int rc, size;
    long timeo;
Allocate the SKB:
    // Allocate an skb sized for the data plus reserved header room
    skb = sock_alloc_send_skb(sk, size, noblock, &rc);
    lock_sock(sk);  // take the socket lock to guard against concurrency
    if (skb == NULL)
        goto out_release;

    skb_reserve(skb, sk->sk_prot->max_header);  // reserve header room
Copy the data:
    // Copy the user-space data (msg->msg_iov) into the skb's data area
    rc = memcpy_fromiovec(skb_put(skb, len), msg->msg_iov, len);
    if (rc != 0)
        goto out_discard;
Trigger the send: This is the most critical step. DCCP doesn't simply "throw the packet to the IP layer" like UDP does. It has to defer to the CCID:
    if (!timer_pending(&dp->dccps_xmit_timer))
        dccp_write_xmit(sk);
dccp_write_xmit() does different things depending on the active CCID (whether it's CCID-2 or CCID-3):
- If it's window-based (CCID-2), it calculates how many packets can be sent right now and transmits them immediately.
- If it's rate-based (CCID-3), it uses a send timer to smooth out packet transmission.
Ultimately, all packets slated for transmission go through dccp_transmit_skb(). In this function, the kernel fills in the DCCP header (calculates the checksum, fills in the Seq), and then calls the IP layer's callback:
- For IPv4, it calls ip_queue_xmit().
- For IPv6, it calls inet6_csk_xmit().
11.6.6 DCCP and NAT: The Chicken-and-Egg Dilemma
DCCP's design is highly academic and elegant, but when faced with the real-world internet, it hit a massive roadblock: NAT.
Many home routers (NAT devices) don't recognize DCCP at all. In the eyes of NAT, only TCP (protocol number 6) and UDP (protocol number 17) are legitimate citizens; DCCP (protocol number 33) is an undocumented alien and gets dropped immediately.
To solve this problem, RFC 5596 patched DCCP in 2009 by introducing Near Simultaneous Open.
This is somewhat similar to TCP's Simultaneous Open, but in DCCP, it's designed to facilitate NAT traversal (hole punching) techniques. It introduced a new packet type, DCCP-LISTEN, and modified the state machine.
However, this creates a vicious cycle:
- NAT vendors don't support DCCP because nobody uses it.
- Users don't use DCCP because NAT doesn't support it.
This results in a classic chicken-and-egg problem.
To work around this hurdle, DCCP-UDP (RFC 6773) was proposed — essentially stuffing DCCP packets inside UDP packets for transport. Since UDP can go anywhere, the idea is to disguise DCCP as UDP. But this strips DCCP of its purity as an independent transport layer protocol.
This is also why, even today, you almost never see DCCP on the public internet. It exists in the Linux kernel primarily as an experimental, academic protocol, waiting for a revival of "end-to-end IP connectivity" that may never come.
(The remainder of this chapter consists of exercises and a summary, omitted here.)