When 30% of CPU Time Disappears Into JSON.parse(): The Hidden Cost of Human-Readable Serialization

A Go service at a high-traffic company began experiencing mysterious CPU spikes. The flamegraphs revealed something unexpected: 30-40% of CPU time was spent inside json.Marshal and json.Unmarshal. No database queries were slow. No algorithms were inefficient. The serialization layer alone was consuming a third or more of the computational budget. This isn’t an anomaly. At scale, the choice of serialization format becomes a first-order performance concern. The difference between JSON and binary formats isn’t a few percentage points; it’s often 5-7x in throughput and 2-3x in payload size. ...

8 min · 1551 words

When One Slow Service Took Down an Entire Region: The Circuit Breaker Pattern Explained

On September 20, 2015, Amazon DynamoDB in US-East-1 went dark for over four hours. The root cause wasn’t a hardware failure or a cyberattack—it was a feedback loop. Storage servers couldn’t retrieve their partition assignments from a metadata service, so they retried. The metadata service became overwhelmed. More timeouts. More retries. More overload. Engineers eventually had to firewall the metadata service from storage servers entirely, effectively taking DynamoDB offline to break the cycle. ...

14 min · 2971 words