B-Trees: Why Every Database Uses Them

https://news.ycombinator.com/rss Hits: 1
Summary

Your database has 10 million user records. You query for one user by ID. The database returns the result in 3 milliseconds. How?If the database scanned all 10 million records sequentially, it would take seconds, maybe minutes. But databases don’t scan. They use an index—and that index is almost certainly a B-Tree.Every major database system uses B-Trees: MySQL InnoDB, PostgreSQL, SQLite, MongoDB’s WiredTiger storage engine, Oracle Database, Microsoft SQL Server. It’s not a coincidence. B-Trees solve a fundamental problem: how to efficiently find data on disk when disk access is thousands of times slower than memory access.This is the story of why binary search trees fail on disk, how B-Trees fix that problem, and why after 50+ years, we’re still using them.Let’s start with what doesn’t work: binary search trees (BSTs) on disk.In memory, binary search trees are excellent. Each node stores one key and has two children (left and right). Keys in the left subtree are smaller, keys in the right subtree are larger. Finding a key takes O(log₂ n) comparisons.Figure 1: Binary search tree with 7 nodes. Finding key 11 takes 3 comparisons: 15 → 7 → 11.For 1 million records, a balanced BST has height log₂(1,000,000) ≈ 20. That’s 20 comparisons to find any record.In memory, this is fast. Each comparison is a pointer dereference (~0.0001 milliseconds on modern CPUs). Total lookup: 0.002 ms.On disk, this is catastrophic. Here’s why:The smallest unit of disk access is a block (typically 4 KB to 16 KB). To read a single byte from disk, you must read the entire block containing it.Disk access times:Disk is 100-100,000x slower than RAM.With a BST on disk, each node is stored in a separate disk block. Traversing from parent to child requires a disk seek.For 1 million records:That’s acceptable for SSDs, but terrible for HDDs. And it gets worse as the tree grows.For 1 billion records:The fundamental problem: BST fanout is too low (only 2 children per node). We need more children per node t...

First seen: 2025-11-24 04:20

Last seen: 2025-11-24 04:20