When One Second Crashed the Internet: The Hidden Complexity of Timekeeping

At 23:59:60 UTC on June 30, 2012, a second was added to the world’s clocks. Within minutes, Reddit, LinkedIn, Mozilla, Gawker, and dozens of other major websites had crashed. Their servers were running at 100% CPU, locked in tight loops that made them completely unresponsive. The culprit wasn’t a cyberattack or a hardware failure—it was the handling of a single extra second. The Linux kernel’s high-resolution timer subsystem, called hrtimer, had gotten confused by the leap second. When the system clock stepped backward by one second, sleeping processes were awakened prematurely, flooding the CPU with activity. Java-based applications like Cassandra—the database powering Reddit—were particularly affected. The site was offline for over an hour. ...

12 min · 2481 words

The Hidden Memory Tax: Why Your 80GB GPU Still Can't Handle Long-Context LLMs

In March 2024, a team of researchers attempted to deploy a 70-billion parameter language model on a single NVIDIA H100 GPU with 80GB of VRAM. The model weights alone consumed approximately 140GB in FP16—already exceeding their hardware capacity. But even after applying 4-bit quantization to squeeze the weights down to ~40GB, the system still ran out of memory when processing contexts beyond 8,000 tokens. The culprit wasn’t the model size. It was something far more insidious: the KV cache. ...

9 min · 1846 words