Garbage Collection of Object Storage at Scale

https://news.ycombinator.com/rss Hits: 9
Summary

Over the last 10 years, I’ve built several distributed systems on top of object storage, with WarpStream being the most recent. One consistent factor across all of these systems is how much time we spent solving what seems like a relatively straightforward problem: removing files from object storage that had been logically deleted either due to data expiry or compaction.I discussed this in more detail in “The Case for Shared Storage” blog post, but to briefly recap: every shared storage system I’ve ever built has looked something like this:Clients interact with stateless nodes (that are perhaps split into different “roles”). The stateless nodes abstract over a shared storage backend (like object storage) and a strongly-consistent metadata store to create some kind of logical abstraction, in WarpStream’s case: the Apache Kafka protocol.There are a few ways in which a WarpStream file can end up logically deleted in the metadata store, and therefore needs to be physically deleted from the object store:All the data in the file has expired due to the configured topic TTLs: ↴All of the data in the file is deleted due to explicit topic deletions: ↴The file was logically deleted by a compaction in which this particular file participated as an input: ↴In the rest of this post, I’ll go over a few different ways to solve this problem by using a delayed queue, async reconciliation, or both. But before I introduce what I think the best ways to solve this problem are, let’s first go over a few approaches that seem obvious, but don’t work well in practice like bucket policies and synchronous deletion.Why Not Just Use a Bucket Policy?The easiest way to handle object storage cleanup would be to use a bucket policy with a configurable TTL. For example, we could configure an object storage policy that automatically deletes files that are more than 7 days old. For simple or time-series oriented systems, this is often a good solution.However, for more complex systems like WarpStream, wh...

First seen: 2025-05-13 21:32

Last seen: 2025-05-14 05:33