We made Postgres writes faster, but it broke replication

Summary

By Stu Hood, Ming Ying, Mathew Pregasen, and Olive Ratliff on June 30, 2025

When we built pg_search, the Postgres extension for search and analytics, write throughput was a priority. To be an effective alternative to Elasticsearch we needed to support high ingest workloads in real time. This is because many Elasticsearch use cases (like real-time dashboards, e-commerce search, and recommendation systems) involve continuous writes that must be indexed and made searchable immediately.

In a vanilla Postgres implementation, full-text search is backed by a B-tree or GIN (Generalized Inverted Index) structure. These indexes are good for relatively fast lookups, but they aren’t so fast for writes.

We opted for a data structure optimized for writes: a Log-Structured Merge (LSM) tree. That was great for write throughput, but it broke Postgres replication! Specifically, it broke physical replication, one of the two mechanisms that allow Postgres to replicate data from a primary node across one or more read replicas. The other mode is logical replication, which sends individual row changes to replicas instead of copying the database byte-for-byte.

It turned out that Postgres's out-of-the-box support for physical replication, built on Write-Ahead Log (WAL) shipping, isn't quite enough for an advanced data structure like an LSM tree to be replication-safe. We were surprised to learn this, so we decided to write up our experience and describe how we fixed the problem.

In this post, we'll do a deep dive into:

- What is an LSM tree?
- What it means for an LSM tree to be replication-safe
- How Postgres's WAL shipping guarantees physical consistency
- Why atomic logging was necessary for logical consistency
- How we leveraged a little-known but powerful Postgres setting called hot_standby_feedback

What is an LSM Tree?
A Log-Structured Merge Tree (LSM tree) is a write-optimized data structure commonly used in systems like RocksDB and Cas...
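
To make "write-optimized" concrete, here is a minimal Python sketch of the general LSM idea: writes land in an in-memory table, which is periodically frozen into immutable sorted segments that are later merged. This is a toy illustration only, not pg_search's implementation; the class name `TinyLSM`, the `memtable_limit` threshold, and the method names are all hypothetical.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # newest writes, unsorted
        self.segments = []                    # immutable sorted runs, oldest first

    def put(self, key, value):
        # A write only touches the memtable, which is what keeps it cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new sorted, immutable segment.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key, default=None):
        # Newest data wins: check the memtable, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return default

    def compact(self):
        # Merge all segments into one; later segments override earlier ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

In this sketch a read may have to consult several segments, while a write never rewrites existing data. That is the trade-off the term "write-optimized" refers to, and compaction is the background step that keeps the number of segments in check.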

First seen: 2025-07-21 14:37

Last seen: 2025-07-22 00:41