Achieving lower latencies with S3 object storage

https://news.ycombinator.com/rss Hits: 2
Summary

Over the past 19 years (S3 was launched on March 14th, 2006, as the first public AWS service), object storage has become the gold standard for storing large amounts of data in the cloud. It's reliable, reasonably cheap, reasonably fast, and requires no special incantations to deploy. Best of all, it offers a straightforward HTTP-based interface with clear semantics (see NFS horrors).

Reading this, you might come up with an intriguing idea: why not run your entire database on object storage? You're in good company: Snowflake, Warpstream, Aurora, Neon, SlateDB, Pinecone, TurboPuffer, Turso, RocksDB's tiered storage, Delta Lake, Iceberg, and many others already do that in some form or another.

If you start building a system that reads a lot of data from object storage, you might find yourself spending a lot of time just reading data, with costs going up and users complaining about high latency, all while your monitoring tells you you're actually getting great latency from your cloud provider. So what is this about?

In this post I'll try to highlight some common issues I've run into, in the hope of helping you understand how to avoid them by designing and building better systems around object storage.

Tail latencies will eat you up

Roughly speaking, the latency of systems like object storage tends to follow a lognormal distribution (please come and teach me on BlueSky, always happy to learn). While it mostly behaves nicely and stays within a reasonable range, it tends to have a very long tail. These are large and complex systems: one request might touch dozens of machines, network switches and disks, and any one of them might have a transient issue that affects overall latency.

The latency of each individual action compounds, and as an almost unavoidable fact of distributed systems, sometimes we'll get slow operations. It might be a faulty network switch, a garbage collection pause in a process, or just a busy disk.
That fact is also true of our client systems -...
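
To make the compounding effect concrete, here is a minimal simulation sketch. It assumes latencies are lognormal, as suggested above, and that a request is only as fast as the slowest of the sub-requests it waits on. The median of 30 ms, the sigma of 0.5, and all function names are illustrative assumptions, not measurements of S3 or any other provider.

```python
import math
import random


def sample_latency_ms(rng, median_ms=30.0, sigma=0.5):
    """Draw one latency from a lognormal distribution (assumed parameters)."""
    # lognormvariate takes the mu and sigma of the underlying normal;
    # exp(mu) is the median of the resulting lognormal.
    return rng.lognormvariate(math.log(median_ms), sigma)


def percentile(values, p):
    """Nearest-rank style percentile, good enough for a rough comparison."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]


def simulate(fanout, trials=5000, seed=42):
    """Latency of requests that wait on `fanout` parallel sub-requests."""
    rng = random.Random(seed)
    # The request finishes when its slowest sub-request finishes.
    return [
        max(sample_latency_ms(rng) for _ in range(fanout))
        for _ in range(trials)
    ]


def prob_any_slow(fanout, slow_fraction=0.01):
    """Chance that at least one of `fanout` independent operations is slow."""
    return 1 - (1 - slow_fraction) ** fanout


if __name__ == "__main__":
    for fanout in (1, 10, 50):
        latencies = simulate(fanout)
        print(
            f"fanout={fanout:3d} "
            f"p50={percentile(latencies, 50):7.1f} ms "
            f"p99={percentile(latencies, 99):7.1f} ms "
            f"P(any sub-request above its p99)={prob_any_slow(fanout):.1%}"
        )
```

Under the independence assumption, a request that waits on 50 operations has about a 39% chance of hitting at least one operation slower than that operation's own p99, so the rare tail of a single operation becomes a common case for the whole request.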

First seen: 2025-04-19 13:20

Last seen: 2025-04-19 14:20