When the Power Fails: How WAL Guarantees Your Data Survives Every Crash

In the late 1970s, Jim Gray and his colleagues at IBM Research were building transaction processing systems that had to guarantee data integrity even when power failed mid-operation. The rule at the heart of their approach was elegant in its simplicity: never write a change to the main store until it has first been written to a log. Gray discussed this discipline in his 1981 paper “The Transaction Concept: Virtues and Limitations.” It became known as Write-Ahead Logging, and decades later it remains the foundation of nearly every major database system. ...

11 min · 2257 words
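
The "log first, then apply" rule fits in a few lines of code. The sketch below is a toy, not how any particular database implements WAL. `TinyWAL` and its newline-terminated JSON record format are hypothetical, a Python dict stands in for the main store, and a record is assumed complete only once its trailing newline reached disk (real systems use checksums and page-level logging).

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key-value store that logs every change durably before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # stands in for the "main store"
        self._recover()

    def _recover(self):
        # Rebuild state by replaying every complete record in the log.
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as f:
            for line in f:
                # Assumption: a record counts as complete only once its trailing
                # newline reached disk; real systems use checksums instead.
                if not line.endswith(b"\n"):
                    break
                record = json.loads(line)
                self.data[record["key"]] = record["value"]
                good_bytes += len(line)
        # Cut off any torn tail so later appends start on a clean record boundary.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        record = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        # Step 1: append the change to the log and force it to stable storage.
        with open(self.log_path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())
        # Step 2: only after the log write is durable, update the main store.
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyWAL(path)
db.put("balance", 100)
db.put("balance", 75)
del db  # simulate a crash: the in-memory state is gone
print(TinyWAL(path).data)  # {'balance': 75}
```

Because the log write is forced to disk before the store is touched, a crash between the two steps loses nothing: recovery simply replays the log.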

How LSM Trees Write 10x Faster Than B-Trees: The Hidden Architecture Behind Modern Databases

In 1996, Patrick O’Neil and his colleagues at the University of Massachusetts Boston published a paper describing a data structure that would take nearly a decade to find widespread adoption. The Log-Structured Merge-Tree (LSM-Tree) was designed to solve a problem that was still uncommon at the time: how to efficiently index data when writes vastly outnumber reads. Today, LSM-Trees and their close variants power the storage engines of Cassandra, RocksDB, LevelDB, HBase, InfluxDB, and many other systems that handle massive write throughput. Yet the core insight is often misunderstood: LSM-Trees don’t just “write faster”; they restructure how data moves from memory to disk. ...

10 min · 2020 words
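
The memory-to-disk flow that teaser refers to can be sketched in miniature. The class below is a hypothetical, in-memory illustration, not the design of any of the engines named above: writes land in a mutable buffer (the memtable), and when it fills, it is sorted once and emitted as an immutable sorted run. The `memtable_limit` threshold is an arbitrary assumption, and compaction (merging runs) is omitted entirely.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: writes go to an in-memory buffer that is flushed
    as an immutable sorted run once it fills up. No compaction is modeled."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.runs = []  # oldest run first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes never modify existing runs; they only touch the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sort once, then emit the whole buffer as one sequential run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Freshest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full: flushed as the first sorted run
db.put("a", 10)  # the newer version lives in the memtable
print(db.get("a"), db.get("b"), len(db.runs))  # 10 2 1
```

The write path never rewrites data in place; updates simply shadow older versions, which is why reads must consult several places and why real engines need compaction.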

When Serializable Is Not Serializable: The Hidden World of Transaction Isolation Levels

In 2012, a team of database researchers published a paper that would change how many engineers think about transaction isolation. The paper, titled “Serializable Snapshot Isolation in PostgreSQL,” described how PostgreSQL catches a subtle anomaly that snapshot isolation had long allowed: two transactions could each be correct when run alone, yet produce an incorrect result when run concurrently. The anomaly wasn’t a dirty read or a phantom; it was write skew, and it exposed a fundamental truth about the ANSI SQL isolation levels: the names don’t always mean what developers think they mean. ...

13 min · 2711 words
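
Write skew is easiest to see in a concrete run. The simulation below uses the common textbook "doctors on call" scenario (an illustration chosen here, not taken from the article) against a hypothetical toy store, `SnapshotDB`, that implements snapshot isolation with a first-committer-wins check on written keys only. Each transaction checks the invariant "at least one doctor stays on call" against its own snapshot, and both commit because they write different rows.

```python
class SnapshotDB:
    """Toy snapshot-isolation store. Each transaction reads a private snapshot;
    at commit it aborts only if another transaction already committed a write
    to one of the SAME keys (first-committer-wins). Reads are not tracked."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit counter per key

    def begin(self):
        return {"snapshot": dict(self.data), "start": dict(self.version), "writes": {}}

    def commit(self, txn):
        # Write-write conflict check: has any key we wrote changed since we began?
        for key in txn["writes"]:
            if self.version.get(key, 0) != txn["start"].get(key, 0):
                raise RuntimeError(f"write-write conflict on {key!r}")
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1


def go_off_call(txn, doctor):
    # Application invariant: at least one doctor must remain on call.
    on_call = [d for d, on in txn["snapshot"].items() if on]
    if len(on_call) >= 2:
        txn["writes"][doctor] = False


db = SnapshotDB({"alice": True, "bob": True})
t1, t2 = db.begin(), db.begin()  # both start from the same snapshot
go_off_call(t1, "alice")
go_off_call(t2, "bob")
db.commit(t1)
db.commit(t2)  # no conflict detected: the transactions wrote different keys
print(db.data)  # {'alice': False, 'bob': False}: nobody is on call
```

Run one after the other, the second transaction would have seen only one doctor on call and done nothing. A serializable scheme such as the one the paper describes must abort one of the two; plain snapshot isolation, which only checks write-write overlaps, cannot.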

Why Your Database Writes Are Slow: The B+ Tree Problem LSM Trees Were Built to Solve

When you insert a row into a database, what actually happens to that data? If you’re using a traditional relational database, the answer involves random disk I/O, page splits, and a fundamental mismatch between how applications write data and how storage media work best. In 1996, Patrick O’Neil and his colleagues at UMass Boston and Digital Equipment Corporation identified this problem and proposed a solution that would eventually power some of the world’s largest databases. ...

13 min · 2715 words

Why Databases Choose B+ Trees Over Hash Tables and B-Trees

When you create an index on a database table, have you ever wondered what data structure actually powers it? The answer is almost always a B+ tree. Not a hash table. Not a regular B-tree. Not a binary search tree. B+ trees, or close variants of them, have been the default index structure in nearly every major relational database for roughly five decades: MySQL, PostgreSQL, Oracle, SQL Server, and SQLite all build on them. This isn’t coincidence or legacy inertia. It’s the result of fundamental trade-offs between disk I/O patterns, range query efficiency, and storage utilization. ...

12 min · 2498 words
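
The range-query side of that trade-off shows up even in memory. In the sketch below, a sorted Python list stands in for the ordered, linked leaf level of a B+ tree, and a dict stands in for a hash index. Both function names are hypothetical, and the example ignores pages and disk I/O entirely: it only shows that sorted order lets a range scan jump to its start and read forward, while a hash structure offers no key order to exploit.

```python
import bisect


def range_scan_sorted(keys, lo, hi):
    """Keys kept in sorted order (like B+ tree leaves): binary-search to the
    first key >= lo, then read forward until passing hi."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return keys[start:end]


def range_scan_hashed(table, lo, hi):
    """Hash index: no usable key order, so every entry must be examined and
    the matches sorted afterwards."""
    return sorted(k for k in table if lo <= k <= hi)


ids = [3, 8, 15, 21, 42, 57, 64]
print(range_scan_sorted(ids, 10, 50))                 # [15, 21, 42]
print(range_scan_hashed(dict.fromkeys(ids), 10, 50))  # [15, 21, 42]
```

The sorted scan costs a logarithmic search plus the size of the result; the hashed scan touches every key no matter how narrow the range, which is why hash indexes are usually reserved for exact-match lookups.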