When Serializable Is Not Serializable: The Hidden World of Transaction Isolation Levels

In 2012, a team of database researchers published a paper that would reshape how engineers think about transaction isolation. The paper, titled “Serializable Snapshot Isolation in PostgreSQL,” described a subtle anomaly that had been hiding in plain sight for decades: two transactions could both execute correctly in isolation, yet produce an incorrect result when run concurrently. The anomaly wasn’t a dirty read or a phantom—it was something called write skew, and it exposed a fundamental truth about the ANSI SQL isolation levels: the names don’t always mean what developers think they mean. ...
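Write skew is easy to reproduce in miniature. The sketch below is plain Python with no real database; the doctors-on-call scenario is a standard textbook illustration of the anomaly, not an example taken from the paper. Two "transactions" each read the same consistent snapshot, each preserve the invariant individually, and jointly break it:

```python
# Conceptual sketch of write skew (no real database involved):
# a rule says at least one doctor must remain on call. Each
# "transaction" reads a consistent snapshot, sees two on call,
# and removes only itself -- individually valid, jointly rule-breaking.
on_call = {"alice": True, "bob": True}

def go_off_call(snapshot, me):
    # Read phase: check the invariant against the snapshot.
    if sum(snapshot.values()) >= 2:
        return {me: False}               # write only my own row
    return {}

# Both transactions start from the same snapshot (snapshot isolation).
snapshot = dict(on_call)
writes_a = go_off_call(snapshot, "alice")
writes_b = go_off_call(snapshot, "bob")

# The write sets don't overlap, so first-committer-wins detects no
# conflict; both commit.
on_call.update(writes_a)
on_call.update(writes_b)
print(sum(on_call.values()))  # 0 -- the invariant is violated
```

Because neither transaction writes a row the other wrote, ordinary snapshot isolation lets both commit; detecting the dangerous read-write dependency between them is exactly what the paper's serializable snapshot isolation adds.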

13 min · 2711 words

When 0.1 + 0.2 ≠ 0.3: The IEEE 754 Standard That Broke Your Calculations

Type 0.1 + 0.2 into any browser console, Python REPL, or JavaScript runtime. The answer comes back as 0.30000000000000004. This isn’t a bug. It’s not an error in your programming language. It’s the inevitable consequence of a fundamental tension: humans count in base 10, but computers count in base 2. The IEEE 754 floating-point standard, adopted in 1985, unified how computers represent fractional numbers in binary. Before this standard, different machines handled floating-point arithmetic differently—code that worked on one system could produce completely different results on another. William Kahan, the primary architect of IEEE 754, designed a system that traded perfect precision for predictability. Every programmer would get the same answer, even if that answer wasn’t mathematically exact. ...
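The claim takes seconds to verify with Python's standard library; `Decimal` and `Fraction` are used here only to inspect the value a float actually stores and to do exact arithmetic for comparison:

```python
from decimal import Decimal
from fractions import Fraction

# The literal 0.1 is stored as the nearest binary64 value, not as 1/10.
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Decimal reveals the exact value the float actually holds.
print(Decimal(0.1))

# Exact rational arithmetic recovers the base-10 answer.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```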

10 min · 2028 words

How Virtual Memory Actually Works: The Invisible Layer That Makes Every Program Think It Has All the RAM to Itself

In 1962, the Atlas computer at the University of Manchester faced an impossible problem. Programs were growing larger than available memory, and programmers spent countless hours manually shuffling data between main memory and drum storage. The solution they invented—virtual memory—would become one of the most consequential abstractions in computing history. Today, every program you run believes it has access to a massive, contiguous block of memory starting at address zero. None of this is real. ...
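The page-table indirection behind that illusion can be sketched in a few lines. Everything here is an illustrative assumption—4 KiB pages and a toy `page_table` dictionary—not a description of any particular operating system:

```python
# Sketch of the translation every memory access goes through: split a
# virtual address into a page number and an offset, look the page number
# up in a (toy) page table, and attach the offset to the resulting
# physical frame. Assumes 4 KiB pages, i.e. 12 offset bits.
PAGE_BITS = 12
PAGE_SIZE = 1 << PAGE_BITS

page_table = {0: 7, 1: 3, 2: 42}        # virtual page -> physical frame

def translate(vaddr: int) -> int:
    vpn = vaddr >> PAGE_BITS             # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # byte within the page
    if vpn not in page_table:
        raise RuntimeError("page fault: the OS would load the page here")
    return (page_table[vpn] << PAGE_BITS) | offset

print(hex(translate(0x1234)))  # virtual page 1 -> frame 3 => 0x3234
```

In hardware this lookup is done by the MMU with a multi-level table and a TLB cache, but the arithmetic is exactly this split-lookup-recombine.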

12 min · 2372 words

When One Letter Changes Everything: The Algorithms Behind Every Spell Checker

In 1961, Les Earnest at MIT built the first spell checker as part of a cursive handwriting recognition system. His program used a list of just 10,000 common words, comparing each word recognized from the handwriting against the dictionary. The system was rudimentary, but it established a pattern that would repeat for decades: spell checking is fundamentally a string matching problem, and the challenge lies in making it fast enough to be useful. ...
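The string-matching primitive most spell checkers build on is edit distance. Below is a standard dynamic-programming sketch (one common technique, not necessarily the exact algorithm the article covers): the distance between two words is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other.

```python
# Levenshtein edit distance via dynamic programming. Spell checkers
# rank dictionary words by their distance from the misspelled input.
def edit_distance(a: str, b: str) -> int:
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("recieve", "receive"))  # 2 (a transposition costs two edits)
```

Keeping only the previous row makes the memory cost linear in the shorter word, which matters when one misspelling is compared against a large dictionary.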

13 min · 2714 words

How JPEG Compression Actually Works: The Mathematics Behind Every Photo

In September 1992, a committee called the Joint Photographic Experts Group published a standard that would fundamentally change how humanity stores and shares images. The JPEG format, based on the discrete cosine transform (DCT), made digital photography practical by reducing file sizes by a factor of 10 while maintaining acceptable visual quality. Three decades later, JPEG remains the most widely used image format in the world, with billions of images created daily. ...
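The transform at the heart of the format can be shown in miniature. This is an illustrative pure-Python 1-D DCT-II—the transform JPEG applies along the rows and then the columns of each 8×8 pixel block—not an optimized codec implementation:

```python
import math

# 1-D DCT-II on 8 samples. Energy concentrates in the low-frequency
# coefficients, which is why aggressively quantizing the high-frequency
# ones is visually tolerable -- the basis of JPEG's compression.
def dct_1d(samples):
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(samples))
        out.append(scale * s)
    return out

# A flat run of pixels puts all its energy in the DC (k = 0) coefficient;
# every higher-frequency coefficient is (numerically) zero.
coeffs = dct_1d([128] * 8)
print([round(c, 3) for c in coeffs])
```

A real encoder applies this in two dimensions, divides each coefficient by a quantization table entry, and entropy-codes the mostly-zero result.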

8 min · 1560 words

How VPNs Actually Work: From Tunneling Protocols to the Hidden Latency Costs

In 2019, a network engineer at a major financial institution noticed something odd. Their newly deployed VPN, configured with OpenVPN over TCP, was causing a 40% drop in throughput for database replication traffic. The latency between their New York and London data centers had jumped from 75ms to over 200ms. After weeks of troubleshooting, they discovered the culprit wasn’t bandwidth or hardware—it was TCP-over-TCP meltdown, a fundamental interaction between the VPN protocol and the underlying transport layer. ...

11 min · 2218 words

How Bloom Filters Store 100 Million Items in 120 MB While Never Missing a Match

In 1970, Burton Howard Bloom faced a problem that would feel familiar to any modern software engineer working with large datasets. He needed to check whether words required special hyphenation rules, but storing 500,000 dictionary entries in memory was prohibitively expensive. His solution—a data structure that uses dramatically less space than any traditional approach—became one of the most widely deployed probabilistic data structures in computing history. The insight was radical: what if you could trade certainty for space? A Bloom filter will never tell you an item is absent when it’s actually present (no false negatives), but it might occasionally claim an item exists when it doesn’t (false positives). For many applications, this trade-off is not just acceptable—it’s transformative. ...
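The trade-off can be sketched in a few dozen lines. This is a minimal illustrative filter, not a production one—real deployments use fast non-cryptographic hashes and carefully tuned bit-array and hash-count parameters, where this sketch derives its k indices from a single SHA-256 digest for simplicity:

```python
import hashlib

# Minimal Bloom filter: k hash positions set k bits per item. Lookups
# may return false positives, but never false negatives.
class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k indices from one digest (double-hashing style).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter(m_bits=8 * 1024, k_hashes=7)
bf.add("hyphenation")
print(bf.might_contain("hyphenation"))  # True -- no false negatives, ever
```

An added item can never test negative, because adding it set exactly the bits the lookup checks; a false positive happens only when other items collectively set all k of a stranger's bits.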

6 min · 1225 words

When One Bit Can Kill: How Error Correction Codes Save Your Data Every Day

In 1947, a mathematician at Bell Labs faced a frustrating problem. Richard Hamming was using the Model V relay computer to perform calculations, and every weekend the machine would grind to a halt when it encountered an error. The computer would simply stop, flashing its error lights, and Hamming would have to wait until Monday for the operators to reload his program. One Friday evening, staring at the silent machine, he asked himself a question that would change computing forever: “Why can’t the computer correct its own mistakes?” ...
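The answer he arrived at can be demonstrated with the classic Hamming(7,4) code: 4 data bits protected by 3 parity bits, enough to locate and flip any single corrupted bit. This is an educational sketch; real ECC memory uses wider SECDED codes built on the same principle:

```python
# Hamming(7,4): positions 1,2,4 hold parity bits; 3,5,6,7 hold data.
def hamming74_encode(d):                 # d = [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                    # checks positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                    # checks positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                    # checks positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7

def hamming74_correct(code):
    c = list(code)
    # Recompute each parity check; the failing checks spell out the
    # 1-based index of the corrupted position (the "syndrome").
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 * 1 + s2 * 2 + s3 * 4
    if syndrome:
        c[syndrome - 1] ^= 1             # flip the bad bit
    return c

word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[5] ^= 1                        # flip one bit "in transit"
print(hamming74_correct(corrupted) == word)  # True
```

The elegance is in the positioning: because each parity bit covers the positions whose binary index contains its power of two, the failed checks read out the error location directly as a binary number.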

14 min · 2877 words

What Makes ZIP Files Shrink: The Mathematics Behind Lossless Compression

In 1952, a graduate student at MIT named David Huffman faced a choice: write a term paper or take a final exam. His professor, Robert Fano, had assigned a paper on finding the most efficient binary code—a problem that had stumped both Fano and Claude Shannon, the father of information theory. Huffman, unable to prove any existing codes were optimal, was about to give up and start studying for the final. Then, in a flash of insight, he thought of building the code tree from the bottom up rather than the top down. The result was optimal, elegant, and would become one of the most widely used algorithms in computing history. ...
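That bottom-up insight fits in a short function: repeatedly merge the two least frequent nodes until one tree remains, so the rarest symbols end up deepest and receive the longest codes. A compact illustrative sketch using the standard library's heap:

```python
import heapq
from collections import Counter

# Huffman's bottom-up construction. Heap entries are
# (frequency, tiebreaker, {symbol: code-so-far}); the tiebreaker keeps
# tuple comparison from ever reaching the dict.
def huffman_codes(text: str) -> dict:
    freq = Counter(text)
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
# More frequent symbols get shorter codes: 'a' (5x) beats 'd' (1x).
print(len(codes["a"]) <= len(codes["d"]))  # True
```

The resulting code is prefix-free—no codeword is a prefix of another—so the compressed bitstream can be decoded without separators, which is what ZIP's DEFLATE stage relies on.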

12 min · 2379 words

Why Unicode Has Three Encoding Schemes: The Engineering Trade-offs Behind UTF-8, UTF-16, and UTF-32

On September 2, 1992, Ken Thompson sat in a New Jersey diner with Rob Pike and sketched an encoding scheme on a placemat. That placemat sketch became UTF-8—the encoding that now powers 99% of the web. But UTF-8 is just one of three encoding schemes for Unicode, alongside UTF-16 and UTF-32. Why does Unicode need three different ways to represent the same characters? The answer reveals fundamental trade-offs in computer systems design: space efficiency versus processing simplicity, backward compatibility versus clean architecture, and the messy reality of historical decisions that cannot be undone. ...
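The space side of the trade-off is easy to see with Python's built-in codecs. The snippet prints the byte cost of the same characters under each scheme; UTF-16 and UTF-32 are shown in their little-endian forms without the optional byte-order mark:

```python
# One ASCII letter, one Latin-1 letter, one BMP symbol, one emoji
# outside the Basic Multilingual Plane.
for ch in ["A", "é", "€", "🎉"]:
    print(f"U+{ord(ch):04X}  "
          f"utf-8: {len(ch.encode('utf-8'))} B  "
          f"utf-16: {len(ch.encode('utf-16-le'))} B  "
          f"utf-32: {len(ch.encode('utf-32-le'))} B")
```

ASCII text costs one byte per character in UTF-8 and four in UTF-32, while the emoji costs four bytes everywhere—UTF-16 needing a surrogate pair. That spread is the whole argument: which encoding "wins" depends entirely on what text you store.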

11 min · 2341 words

From HTML to Pixels: The 100-Millisecond Journey Through the Browser Rendering Pipeline

In 1993, when the first graphical web browser displayed a simple HTML document, the rendering process was straightforward: parse markup, apply basic styles, display text. Today’s browsers execute a far more complex sequence involving multiple intermediate representations, GPU acceleration, and sophisticated optimization strategies. Understanding this pipeline explains why some pages render in under 100 milliseconds while others struggle to maintain 60 frames per second during animations. The browser rendering pipeline consists of five primary stages: constructing the Document Object Model (DOM), building the CSS Object Model (CSSOM), creating the render tree, calculating layout, and painting pixels to the screen. Each stage transforms data from one representation to another, and bottlenecks in any stage cascade through the entire process. ...

8 min · 1552 words

How Search Engines Find a Needle in a 400-Billion-Document Haystack

When you type a query and hit enter, results appear in under half a second. Behind that instant response lies an engineering marvel: a system that must search through hundreds of billions of documents, score each one for relevance, and return the best matches—all before you can blink. The numbers are staggering. Google’s index contains approximately 400 billion documents according to testimony from their VP of Search during the 2023 antitrust trial. The index itself exceeds 100 million gigabytes. Yet the median response time for a search query remains under 200 milliseconds. ...
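The structure that makes this possible is the inverted index: a map from each term to the set of documents containing it, so a query touches only those postings instead of scanning every document. A toy sketch (real engines add term positions, compressed posting lists, sharding, and ranking signals on top):

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs run",
}

# Build the inverted index: term -> set of document IDs.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query: str) -> set:
    # AND semantics: intersect the posting sets of every query term.
    postings = [index.get(t, set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()

print(sorted(search("quick brown")))  # [1, 3]
```

Intersecting a handful of posting lists is how a query over hundreds of billions of documents stays under 200 milliseconds: the work scales with the number of matching documents per term, not with the size of the index.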

9 min · 1860 words