Power of Two Choices

Imagine you’re running a service with 10 servers, each capable of handling 1,000 requests per second. You set up a round-robin load balancer—simple, elegant, fair. Every server gets its turn in sequence. Traffic flows smoothly until suddenly, at 2 AM, your monitoring alerts start screaming. Half your servers are overwhelmed, queues are growing, latencies are spiking, and the other half of your servers are nearly idle. What went wrong? The servers weren’t identical. Three of them were newer machines with faster CPUs and more memory. Three were legacy boxes running older hardware. The round-robin algorithm, in its mechanical fairness, sent exactly the same number of requests to a struggling legacy server as it did to a powerful new one. The legacy servers couldn’t keep up, requests piled up in their queues, and eventually they started timing out—cascading into a partial outage that woke up half your engineering team. ...