In October 1986, something alarming happened on the Internet. Data throughput between Lawrence Berkeley Laboratory and UC Berkeley—sites separated by just 400 yards and three network hops—dropped from 32 Kbps to 40 bps. That is not a typo. The throughput collapsed by a factor of nearly a thousand. The Internet was experiencing its first “congestion collapse,” and nobody knew how to fix it.
Van Jacobson, then at Lawrence Berkeley Laboratory, became fascinated by this catastrophic failure. His investigation led to a landmark 1988 paper titled “Congestion Avoidance and Control,” which introduced the fundamental algorithms that still govern how data flows through the Internet today. The story of TCP congestion control—from those desperate early fixes to modern algorithms like CUBIC and BBR—is really a story about how we learned to share a finite resource without a central coordinator.
The Conservation Principle
Jacobson’s key insight was deceptively simple: a TCP connection should obey a “conservation of packets” principle. If a connection is running stably with a full window of data in transit, a new packet should not enter the network until an old one leaves. Think of it like a pipe filled with water. If you pour more water in one end than can flow out the other, pressure builds until something bursts.
In TCP, that “bursting” manifests as dropped packets. When routers receive more packets than they can forward, their buffers fill up and eventually overflow. The dropped packets trigger retransmissions, which add even more traffic to an already congested network. This positive feedback loop is what caused the 1986 collapse.
Jacobson identified three ways this conservation principle could fail:
- The connection never reaches equilibrium (slow start fixes this)
- The sender injects a new packet before an old one has exited (a better retransmit timer fixes this)
- The equilibrium cannot be reached due to resource limits (congestion avoidance fixes this)
Slow Start: Learning to Walk Before Running
When a TCP connection first starts, it has no idea how much bandwidth is available. Sending a full window of data immediately would likely overwhelm the network. The solution is “slow start”—a somewhat misleading name, since the window actually grows exponentially.
Here’s how it works: the sender begins with a congestion window (cwnd) of one segment. Each time an acknowledgment arrives, cwnd increases by one segment. After one round-trip time (RTT), the window has doubled to two segments. After two RTTs, it is four segments. This exponential growth continues until a packet loss occurs or the sender reaches the slow start threshold (ssthresh).
The beauty of this approach is that it is self-clocking, as Jacobson called it: acknowledgments can only return as fast as the network can deliver them, so the sender automatically adapts to the bottleneck bandwidth. The network itself provides the rhythm for packet transmission.
sequenceDiagram
participant S as Sender
participant N as Network
participant R as Receiver
Note over S,R: RTT 1: cwnd = 1
S->>N: Send 1 packet
N->>R: Deliver packet
R->>S: ACK
Note over S,R: RTT 2: cwnd = 2
S->>N: Send 2 packets
N->>R: Deliver packets
R->>S: 2 ACKs
Note over S,R: RTT 3: cwnd = 4
S->>N: Send 4 packets
N->>R: Deliver packets
R->>S: 4 ACKs
Note over S,R: Exponential growth continues...
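The doubling pattern above takes only a few lines to simulate. This is a sketch, not real kernel code: the window is counted in whole segments, and the ssthresh value in the example is an arbitrary assumption.

```python
# Sketch of slow-start growth: cwnd doubles each RTT until it reaches
# ssthresh (real TCP counts bytes and also respects the receiver window).
def slow_start(ssthresh, max_rtts):
    """Return the congestion window (in segments) at the start of each RTT."""
    cwnd = 1
    history = []
    for _ in range(max_rtts):
        history.append(cwnd)
        if cwnd >= ssthresh:
            break  # hand off to congestion avoidance (AIMD)
        # Each returning ACK adds one segment, so cwnd doubles per RTT.
        cwnd = min(cwnd * 2, ssthresh)
    return history

print(slow_start(ssthresh=64, max_rtts=10))  # [1, 2, 4, 8, 16, 32, 64]
```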
AIMD: The Mathematics of Fairness
Once slow start gets the connection up to speed, a different algorithm takes over: Additive Increase, Multiplicative Decrease (AIMD). This is the congestion avoidance phase, and its mathematical properties are remarkable.
During normal operation, AIMD increases the congestion window by roughly one segment per RTT. This is additive increase—a steady, conservative probing for more bandwidth. But when packet loss signals congestion, the window is cut in half. This is multiplicative decrease—a drastic response that quickly relieves pressure on the network.
Why this particular combination? Chiu and Jain proved in 1989 that AIMD converges to fair bandwidth allocation. Consider two flows sharing a bottleneck link. When both increase additively, they creep toward their fair share. When congestion occurs and both halve their windows, they maintain their relative proportions. Over many cycles of increase and decrease, they converge to equal shares of the available bandwidth.
The mathematics works out beautifully: AIMD(α, β) with parameters α=1 and β=0.5 (the original TCP values) guarantees convergence to fairness regardless of the starting window sizes. No central coordinator required. No explicit signaling from routers. Just two simple rules that every TCP endpoint follows.
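Chiu and Jain's result is easy to check numerically. The sketch below runs two AIMD(1, 0.5) flows against a shared bottleneck; the capacity and the deliberately unequal starting windows are arbitrary assumptions.

```python
# Two AIMD flows sharing one bottleneck: each adds alpha segments per RTT,
# and both multiply by beta whenever their combined load exceeds capacity.
def aimd_two_flows(w1, w2, capacity, rounds, alpha=1.0, beta=0.5):
    for _ in range(rounds):
        if w1 + w2 > capacity:            # loss event: both back off
            w1, w2 = w1 * beta, w2 * beta
        else:                             # no loss: both probe upward
            w1, w2 = w1 + alpha, w2 + alpha
    return w1, w2

# Start far from fair: 100 vs 10 segments on a 100-segment bottleneck.
w1, w2 = aimd_two_flows(100.0, 10.0, capacity=100, rounds=500)
print(round(w1, 3), round(w2, 3))  # the two windows end up nearly identical
```

Each multiplicative decrease halves the gap between the flows while additive increase leaves it unchanged, so the difference decays geometrically toward zero.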
From Tahoe to Reno: Handling Multiple Losses
The original TCP congestion control, retroactively named “Tahoe” after the 4.3BSD Tahoe release, had a problem with multiple packet losses. When three duplicate acknowledgments indicated a lost packet, Tahoe would retransmit the missing segment, but then it collapsed the congestion window to one segment and re-entered slow start, just as if a timeout had occurred, discarding everything the connection had learned about the available bandwidth.
“Reno,” introduced in 4.3BSD Reno, added “fast recovery.” Instead of waiting for a timeout after a fast retransmit, Reno keeps the pipe full by maintaining a reduced window. This works well for a single packet loss but breaks down when multiple packets from the same window are lost.
“NewReno” fixed this by recognizing “partial acknowledgments”—ACKs that acknowledge some but not all of the data outstanding when fast recovery began. NewReno stays in fast recovery until all outstanding data is acknowledged, retransmitting one lost packet per RTT.
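A minimal sketch of the partial-ACK logic (heavily simplified; real NewReno, specified in RFC 6582, also inflates and deflates the congestion window during recovery):

```python
# Simplified NewReno recovery: stay in fast recovery until everything that
# was outstanding when recovery began (the "recover" mark) is acknowledged.
def handle_ack(ack, state):
    if not state["in_recovery"]:
        return
    if ack >= state["recover"]:
        state["in_recovery"] = False       # full ACK: recovery is done
    else:
        # Partial ACK: the segment starting at `ack` was also lost, so
        # retransmit it now rather than waiting for a timeout. This repairs
        # one loss per RTT. (Window deflation is omitted in this sketch.)
        state["retransmitted"].append(ack)

state = {"in_recovery": True, "recover": 3000, "retransmitted": []}
for ack in (1000, 2000, 3000):  # two partial ACKs, then the full ACK
    handle_ack(ack, state)
print(state["retransmitted"], state["in_recovery"])  # [1000, 2000] False
```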
CUBIC: The Cubic Revolution
By the mid-2000s, the Internet had changed dramatically. High-speed, long-distance networks with large bandwidth-delay products (BDP) were common. A 10 Gbps link with 100 ms RTT has a BDP of about 83,000 packets. Standard TCP, increasing the window by one packet per RTT, would take hours to fully utilize such a link.
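The arithmetic behind that figure, assuming 1500-byte packets:

```python
# Bandwidth-delay product of a 10 Gbps path with 100 ms RTT.
bandwidth_bps = 10e9     # 10 Gbps
rtt_s = 0.100            # 100 ms round-trip time
packet_bytes = 1500      # one MTU-sized segment

bdp_bits = bandwidth_bps * rtt_s            # bits in flight at full capacity
bdp_packets = bdp_bits / (packet_bytes * 8)
print(f"BDP: {bdp_packets:,.0f} packets")   # BDP: 83,333 packets
```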
CUBIC, developed by Injong Rhee and colleagues at North Carolina State University, replaced the linear increase of TCP Reno with a cubic function:
$$W(t) = C(t-K)^3 + W_{max}$$

where:
- $C$ is a scaling constant (typically 0.4)
- $t$ is the elapsed time since the last congestion event
- $K$ is the time to reach $W_{max}$ without further losses
- $W_{max}$ is the window size before the last reduction
The cubic function has a fascinating property: it grows slowly near $W_{max}$, plateaus briefly, then accelerates. This “concave then convex” profile makes CUBIC stable around the saturation point. Most samples cluster near $W_{max}$, promoting high utilization without the aggressive probing that caused bursty losses in earlier algorithms.
CUBIC also changed the multiplicative decrease factor from 0.5 to 0.7. Instead of halving the window on loss, CUBIC reduces it by 30%. This improves throughput on high-BDP networks at the cost of slightly slower convergence when new flows arrive.
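Putting the pieces together, here is a sketch of the window function with the constants from RFC 8312; the $W_{max}$ value and sample times are made-up inputs.

```python
# CUBIC window function W(t) = C*(t - K)^3 + W_max, per RFC 8312.
C_CUBIC = 0.4   # scaling constant C
BETA = 0.7      # multiplicative decrease: the window drops to 0.7 * W_max

def cubic_window(t, w_max):
    """Congestion window (segments) t seconds after the last loss event."""
    # K is the time needed to climb back to w_max with no further losses;
    # it follows from solving W(0) = BETA * w_max for K.
    k = (w_max * (1 - BETA) / C_CUBIC) ** (1.0 / 3.0)
    return C_CUBIC * (t - k) ** 3 + w_max

w_max = 100.0
print(round(cubic_window(0.0, w_max), 6))  # 70.0: just after the reduction
# Concave below w_max, nearly flat around it, convex above it:
for t in (1.0, 2.0, 4.0, 6.0):
    print(round(cubic_window(t, w_max), 1))
```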
Since Linux kernel 2.6.19 (released in 2006), CUBIC has been the default congestion control algorithm. It is also the default in modern QUIC implementations.
BBR: A Paradigm Shift
All the algorithms discussed so far share a fundamental assumption: packet loss equals congestion. But this assumption, reasonable in 1988 when buffers were small, became problematic as memory prices plummeted and routers grew massive buffers.
Google’s BBR (Bottleneck Bandwidth and Round-trip propagation time), introduced in 2016, takes a completely different approach. Instead of reacting to loss, BBR directly measures two parameters:
- BtlBw: The bottleneck bandwidth—the maximum rate at which data can traverse the path
- RTprop: The round-trip propagation time—the minimum RTT when queues are empty
These two parameters define the “bandwidth-delay product” (BDP = BtlBw × RTprop), which is the optimal operating point identified by Leonard Kleinrock in 1979. At this point, throughput is maximized and delay is minimized.
The challenge is that BtlBw and RTprop cannot be measured simultaneously. To measure bandwidth, you need to fill the pipe, which creates a queue that obscures the true RTT. To measure RTT, you need an empty queue, which means underutilizing the bandwidth.
BBR solves this with a state machine that alternates between probing for bandwidth and probing for RTT:
stateDiagram-v2
[*] --> Startup
Startup --> Drain : BtlBw estimated
Drain --> ProbeBW : Queue drained
ProbeBW --> ProbeRTT : RTprop stale
ProbeRTT --> ProbeBW : RTprop measured
ProbeRTT --> Startup : Pipe not yet full
- Startup: Exponential increase to discover BtlBw (using gain 2/ln2 ≈ 2.89)
- Drain: Clear the queue created during startup
- ProbeBW: Cycle through pacing gains (1.25×, 0.75×, 1×) to measure bandwidth while maintaining high utilization
- ProbeRTT: Briefly reduce window to 4 packets to measure RTprop
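The ProbeBW gain cycle can be sketched directly from those numbers (the bandwidth estimate below is an invented example; real BBR also paces each phase for roughly one RTprop and adapts its estimates from delivery-rate samples):

```python
# BBR ProbeBW pacing-gain cycle: probe up at 1.25x, drain any queue the
# probe created at 0.75x, then cruise at 1.0x for six phases.
PROBE_BW_GAINS = (1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)

def pacing_rate(btl_bw_bps, phase):
    """Pacing rate (bits/s) for a given phase of the gain cycle."""
    return PROBE_BW_GAINS[phase % len(PROBE_BW_GAINS)] * btl_bw_bps

btl_bw = 100e6  # assumed bottleneck-bandwidth estimate: 100 Mbps
for phase in range(8):
    print(f"phase {phase}: pace at {pacing_rate(btl_bw, phase) / 1e6:.0f} Mbps")
```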
The results are striking. In Google’s B4 WAN deployment, BBR achieved 2-25× higher throughput than CUBIC. On YouTube’s edge servers, BBR reduced median RTT by 53% globally and over 80% in developing regions.
The Bufferbloat Problem
To understand why BBR matters, consider “bufferbloat”—the latency explosion caused by oversized router buffers. Modern routers often have megabytes of buffer space. When loss-based congestion control fills these buffers before backing off, RTTs can balloon from milliseconds to seconds.
This is not theoretical. If you have ever noticed your video call freezing when someone starts a large download, you have experienced bufferbloat. The download’s TCP flow has filled the buffer, delaying all other traffic.
Loss-based algorithms like CUBIC operate at the right edge of the bandwidth-limited region—full utilization, but with a large standing queue. BBR operates at the left edge, near the BDP line. Same throughput, minimal queue, dramatically lower latency.
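The scale of the problem is easy to quantify: a full buffer adds delay equal to its size divided by the link's drain rate. The buffer and link sizes below are illustrative assumptions.

```python
# Standing-queue delay added by a full router buffer.
buffer_bytes = 16 * 1024 * 1024  # 16 MiB of buffer (assumed)
link_bps = 10e6                  # 10 Mbps bottleneck link (assumed)

delay_s = buffer_bytes * 8 / link_bps
print(f"added queuing delay: {delay_s:.1f} s")  # added queuing delay: 13.4 s
```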
Fairness and Coexistence
One concern with any new congestion control algorithm is fairness. Will BBR flows steal bandwidth from CUBIC flows? Will the Internet descend into congestion collapse again?
BBR was designed to coexist fairly with loss-based algorithms. During ProbeBW, BBR’s gain cycling naturally yields bandwidth to competing flows. The ProbeRTT state helps establish a fair share through a distributed coordination mechanism: when multiple flows enter ProbeRTT simultaneously (triggered by RTprop expiration), they collectively drain the queue, allowing each to measure a new minimum RTT.
The convergence is not perfect—BBR flows competing with CUBIC on networks with large buffers may grab more than their fair share. BBRv2, which Google announced in 2019, addresses this with improved ECN (Explicit Congestion Notification) support and better policer detection.
What This Means for You
Every time you stream a video, load a web page, or download a file, TCP congestion control is working behind the scenes. The algorithm choice affects your experience in ways you might not expect:
- High latency networks (satellite, intercontinental): BBR can achieve throughput 100× higher than loss-based algorithms
- Lossy networks (Wi-Fi, cellular): Loss-based algorithms treat random loss as congestion; BBR ignores it
- Interactive applications (video calls, gaming): BBR’s low-queue operation reduces latency spikes
Most operating systems now allow you to change the congestion control algorithm. On Linux, check /proc/sys/net/ipv4/tcp_congestion_control. On modern systems, you might see cubic as the default, with bbr as an option.
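A small snippet to read the active algorithm on Linux (it falls back gracefully on other platforms; changing the value requires root and sysctl, which is not shown here):

```python
# Read the active TCP congestion control algorithm from procfs on Linux.
from pathlib import Path

def current_cc(proc_path="/proc/sys/net/ipv4/tcp_congestion_control"):
    p = Path(proc_path)
    if p.exists():
        return p.read_text().strip()  # e.g. "cubic" or "bbr"
    return "unavailable"              # not Linux, or procfs not mounted

print(current_cc())
```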
References
- Jacobson, V. (1988). Congestion Avoidance and Control. ACM SIGCOMM ‘88.
- Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., & Scheffenegger, R. (2018). CUBIC for Fast Long-Distance Networks. RFC 8312.
- Cardwell, N., Cheng, Y., Gunn, C., Yeganeh, S. H., & Jacobson, V. (2017). BBR: Congestion-Based Congestion Control. Communications of the ACM, 60(2), 58-66.
- Chiu, D. M., & Jain, R. (1989). Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks. Computer Networks and ISDN Systems, 17, 1-14.
- Floyd, S. (2003). HighSpeed TCP for Large Congestion Windows. RFC 3649.
- Gettys, J., & Nichols, K. (2012). Bufferbloat: Dark Buffers in the Internet. Communications of the ACM, 55(1), 57-65.
- Mathis, M., Semke, J., Mahdavi, J., & Ott, T. (1997). The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm. ACM SIGCOMM Computer Communication Review, 27(3), 67-82.