From 8-Second Pauses to Sub-Millisecond: The 60-Year Evolution of Garbage Collection

In 1959, John McCarthy was building Lisp at MIT when he encountered a problem that would define decades of programming language design. Lisp programs created and destroyed linked structures constantly—lists within lists, functions returning functions, recursive structures that no programmer could feasibly track by hand. McCarthy’s solution was to make memory management automatic. He called it “garbage collection,” dedicating just over a page of his seminal paper to a mark-and-sweep algorithm that would free programmers from the burden of explicit deallocation. ...

13 min · 2766 words

How Virtual Memory Actually Works: The Invisible Layer That Makes Every Program Think It Has All the RAM

In 1962, the Atlas computer at the University of Manchester faced an impossible problem. Programs were growing larger than available memory, and programmers spent countless hours manually shuffling data between main memory and drum storage. The solution they invented—virtual memory—would become one of the most consequential abstractions in computing history. Today, every program you run believes it has access to a massive, contiguous block of memory starting at address zero. None of this is real. ...

12 min · 2372 words

When Zero-Copy Isn’t Zero: The Hidden Copies in Your “Efficient” Code

A file sits on disk. Your application reads it and sends it over the network. Simple enough—but behind this mundane operation hides one of computing’s most persistent performance bottlenecks. On the traditional I/O path, that file’s data is copied four times before it reaches the network interface. The kernel reads it from disk into a kernel buffer via DMA. The read() system call copies it to user space. The write() system call copies it back into a kernel socket buffer. Finally, DMA transfers it to the NIC. Each copy consumes CPU cycles, memory bandwidth, and cache space. ...

8 min · 1585 words

Why malloc Is Not Just malloc: The Hidden Architecture of Memory Allocators

When a C program calls malloc(1024), what actually happens? The programmer might assume the operating system finds 1024 bytes of free memory and returns a pointer. The reality is far more complex. Modern memory allocators are sophisticated pieces of software that manage virtual memory, minimize fragmentation, optimize for multi-core CPUs, and make trade-offs between speed and memory efficiency that can affect application performance by orders of magnitude. The default allocator on Linux systems—ptmalloc, part of glibc—has evolved over decades. Facebook replaced it with jemalloc. Google developed tcmalloc. Microsoft created mimalloc. Each makes different architectural choices that matter for different workloads. Understanding these choices explains why switching allocators can speed up a database by 30% or reduce memory consumption by half. ...

11 min · 2232 words