The case for an iceberg-native database

https://news.ycombinator.com/rss

Summary

TL;DR: We launched a new product called WarpStream Tableflow that is the easiest, cheapest, and most flexible way to convert Kafka topic data into Iceberg tables with low latency and keep them compacted. If you’re already familiar with the challenges of converting Kafka topics into Iceberg tables, feel free to skip ahead to our solution in the “What if we had a magic box?” section.

Apache Iceberg and Delta Lake are table formats that provide the illusion of a traditional database table on top of object storage, including schema evolution, concurrency control, and partitioning that is transparent to the user. These table formats allow many open-source and proprietary query engines and data warehouse systems to operate on the same underlying data. This prevents vendor lock-in and lets teams use best-of-breed tools for different workloads without making additional copies of the data, which are expensive and hard to govern.

Table formats are really cool, but they're just that: formats. Something or someone has to actually build and maintain them. As a result, one of the most debated topics in the data infrastructure space right now is the best way to build Iceberg and Delta Lake tables from real-time data stored in Kafka.

The Problem With Apache Spark

The canonical solution to this problem is to use Spark batch jobs. This is how things have been done historically, and it’s not a terrible solution, but there are a few problems with it:

- You have to write a lot of finicky code to do the transformation, handle schema migrations, etc.
- Latency between data landing in Kafka and the Iceberg table being updated is very high, usually hours or days depending on how frequently the batch job runs if compaction is not enabled (more on that shortly). This is annoying if we’ve already gone through all the effort of setting up real-time infrastructure like Kafka.
- Apache Spark is an incredibly powerful but complex piece of technology. For companies that are already heavy users of Spark, this is n...
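To make the latency point about scheduled batch jobs concrete, here is a toy Python model. It is not Spark or Iceberg code. `Record`, `ToyTable`, `run_batch_job`, and `visibility_latencies` are hypothetical names chosen for illustration. The "table" is an append-only list of immutable snapshots, and each commit is guarded by a compare-and-swap check, loosely mirroring how table formats provide concurrency control. Each job run reads everything after the last committed Kafka offset and publishes it as one new data file.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int        # position in the (simulated) Kafka partition
    arrival_time: int  # seconds since the start of the simulation
    value: str


@dataclass
class ToyTable:
    # Hypothetical stand-in for an Iceberg-style table: an append-only list of
    # immutable snapshots, each listing every data file visible at that point.
    snapshots: list = field(default_factory=lambda: [[]])

    def current_id(self) -> int:
        return len(self.snapshots) - 1

    def commit(self, expected_id: int, new_files: list) -> bool:
        # Optimistic concurrency: succeed only if no other writer committed
        # since `expected_id` was read (a compare-and-swap on the current
        # snapshot pointer). The caller is expected to retry on failure.
        if expected_id != self.current_id():
            return False
        self.snapshots.append(self.snapshots[-1] + new_files)
        return True


def run_batch_job(topic, table, committed_offset, now):
    # One scheduled run: read everything after the last committed offset that
    # has arrived by `now`, write it as a single data file, commit a snapshot.
    batch = [r for r in topic
             if r.offset > committed_offset and r.arrival_time <= now]
    if not batch:
        return committed_offset
    expected_id = table.current_id()
    data_file = {"records": batch, "committed_at": now}
    if not table.commit(expected_id, [data_file]):
        raise RuntimeError("concurrent commit detected; a real job would retry")
    return batch[-1].offset


def visibility_latencies(topic, interval, horizon):
    # Run the job every `interval` seconds up to `horizon` and report, per
    # record, how long it waited between arriving and becoming queryable.
    table = ToyTable()
    committed = -1
    for k in range(1, horizon // interval + 1):
        committed = run_batch_job(topic, table, committed, k * interval)
    return [f["committed_at"] - r.arrival_time
            for f in table.snapshots[-1] for r in f["records"]]


# One record per second for 100 seconds, batch job every 60 seconds.
topic = [Record(i, i, f"event-{i}") for i in range(100)]
lat = visibility_latencies(topic, interval=60, horizon=120)
print(len(lat), max(lat), min(lat))  # prints "100 60 0"
```

In this model the worst-case delay before a record becomes queryable equals the job interval. Running the job hourly instead of every minute pushes the worst case to 3600 seconds. Tightening the schedule reduces latency but produces many more small data files, which is why compaction matters.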

First seen: 2025-10-08 08:13

Last seen: 2025-10-08 12:13


