Computer Architecture

When Two Cores See Different Realities: The Cache Coherence Problem MESI Was Built to Solve

In 1984, researchers at the University of Illinois published a paper that would quietly shape every multicore processor built since. Mark Papamarcos and Janak Patel proposed a solution to a problem that didn’t seem urgent at the time—how to keep data consistent when multiple processors each have their own cache. Today, with CPUs packing dozens of cores, their invention runs on billions of devices, silently orchestrating a dance of state transitions every time one core writes to memory and another needs to read it. ...

How RAID Actually Survives Disk Failures: The Mathematics Behind Your Data's Safety Net

In 1987, three researchers at the University of California, Berkeley published a paper that would fundamentally change how we think about data storage. David Patterson, Garth Gibson, and Randy Katz proposed something counterintuitive: instead of buying one expensive, reliable disk drive, why not combine many cheap, unreliable ones into a system more reliable than any single drive could ever be? They called it RAID—Redundant Arrays of Inexpensive Disks. The insight was mathematical, not magical. By distributing data across multiple drives with carefully calculated redundancy, you could achieve both performance and reliability that would be impossible with a single disk. The key was a simple operation that most programmers learn in their first computer science course: XOR. ...
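
The XOR idea mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a single parity block computed over three equal-sized data blocks; the helper name `xor_blocks` and the sample data are hypothetical and are not taken from the RAID paper or from any real RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "drives", each holding one 4-byte block.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity drive stores the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # the lost block is recovered
```

Recovery works because XOR is its own inverse: XORing a value in twice cancels it out, so combining the parity block with every surviving block leaves only the missing one.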

Why malloc Is Not Just malloc: The Hidden Architecture of Memory Allocators

When a C program calls malloc(1024), what actually happens? The programmer might assume the operating system finds 1024 bytes of free memory and returns a pointer. The reality is far more complex. Modern memory allocators are sophisticated pieces of software that manage virtual memory, minimize fragmentation, optimize for multi-core CPUs, and make trade-offs between speed and memory efficiency that can affect application performance by orders of magnitude. The default allocator on Linux systems—ptmalloc, part of glibc—has evolved over decades. Facebook replaced it with jemalloc. Google developed tcmalloc. Microsoft created mimalloc. Each makes different architectural choices that matter for different workloads. Understanding these choices explains why switching allocators can speed up a database by 30% or reduce memory consumption by half. ...

What Your CPU Does When It Doesn't Know What Comes Next: The Hidden Science of Branch Prediction

The most famous question on Stack Overflow isn’t about JavaScript frameworks or Git commands. It’s about why sorting an array makes code run faster. The answer, branch prediction, revealed something most programmers never consider: your CPU spends considerable effort guessing what your code will do next. In 2012, a user named GManNickG asked why processing unsorted data took 11.777 seconds while the same operation on a sorted array took only 2.352 seconds: a 5x difference for identical computation. The accepted answer, written by user Mysticial, became the highest-voted answer in Stack Overflow history. It wasn’t about algorithms. It was about how processors handle uncertainty. ...
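
The effect can be illustrated without timing anything by simulating a predictor. The sketch below is a toy model, assuming a single 2-bit saturating counter guarding one branch of the form `value >= threshold`; real CPUs use far more elaborate predictors, and the function name, threshold, array size, and seed are all illustrative choices rather than details from the excerpt.

```python
import random

def count_mispredictions(values, threshold=128):
    """Count wrong guesses of a 2-bit saturating counter on `v >= threshold`."""
    counter = 0  # states 0-1 predict "not taken", states 2-3 predict "taken"
    misses = 0
    for v in values:
        taken = v >= threshold
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

rng = random.Random(0)  # fixed seed so runs are repeatable
data = [rng.randrange(256) for _ in range(32768)]

unsorted_misses = count_mispredictions(data)
sorted_misses = count_mispredictions(sorted(data))

print("unsorted:", unsorted_misses)
print("sorted:  ", sorted_misses)
```

On sorted input the branch outcome flips only once, so the toy predictor is wrong only around that transition. On random input it is wrong roughly half the time, which is the pattern that makes the unsorted loop slow on real hardware.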

Why Your SSD Will Outlive Your Hard Drive: The Engineering Behind Flash Memory

When you save a file to a solid-state drive, something happens at the atomic scale that has no counterpart in a hard drive. Electrons tunnel through an insulating barrier and become trapped in a microscopic cage, where they can remain for years without power. This is the fundamental mechanism of flash memory, and understanding it explains everything from why SSDs slow down when full to why they eventually wear out. The first commercial flash memory chip appeared in 1988, but the technology traces back to a 1967 paper by Dawon Kahng and Simon Sze at Bell Labs. They proposed storing charge in a transistor’s floating gate, a conductive layer completely surrounded by insulator. Nearly six decades later, NAND flash cells still store data as trapped charge, even as manufacturers have stacked cells hundreds of layers high and squeezed multiple bits into each one. ...