How LSM Trees Write 10x Faster Than B-Trees: The Hidden Architecture Behind Modern Databases
In 1996, Patrick O’Neil and his colleagues at the University of Massachusetts Boston published a paper describing a data structure that would take nearly a decade to find widespread adoption. The Log-Structured Merge-Tree (LSM-Tree) was designed to solve a problem that barely existed at the time: how to efficiently index data when writes vastly outnumber reads. Today, LSM-Trees power the storage engines of Cassandra, RocksDB, LevelDB, HBase, InfluxDB, and countless other systems that handle massive write throughput. Yet the fundamental insight remains surprisingly misunderstood: LSM-Trees don’t just “write faster”—they fundamentally restructure how data moves from memory to disk. ...