Why was Apache Kafka created?

https://news.ycombinator.com/rss
Summary

Reading Time: 13 minutes

We talk all the time about what Kafka is, but not so much about why it is the way it is. What better way to find out than to dive into the original motivation for creating Kafka?

Circa 2012, LinkedIn's original intention with Kafka was to solve a data integration problem.

LinkedIn used site activity data (e.g. someone liked this, someone posted this) for many things: tracking fraud/abuse, matching jobs to users, training ML models, powering basic features of the website (e.g. who viewed your profile, the newsfeed), warehouse ingestion for offline analysis/reporting, etc.

The big takeaway is that many of these activity data feeds are not used simply for reporting; they're a dependency of the website's core functionality. As such, they require very robust infrastructure.

Their old infrastructure was not robust. It mainly consisted of two pipelines:

One was an hourly batch-oriented system designed purely to load data into a data warehouse. Applications would publish XML messages of the events (e.g. a profile view) directly to an HTTP server. The system would then write these to aggregate files, copy them to ETL servers, parse and transform the XML, and finally load it into the warehouse infrastructure, which consisted of a relational Oracle database and Hadoop clusters.

The other was a separate, more real-time pipeline for observability. It carried regular server metrics (CPU, errors, etc.), structured logs, and distributed tracing events, all flowing to Zenoss. Adding new metrics there was a cumbersome manual process. This pipeline's data was not available anywhere besides Zenoss, so it couldn't be freely processed or joined with other data.

There were a few commonalities between these two pipelines:

- manual work: both systems required a lot of manual maintenance, both to keep the lights on and to add new data.
- large backlogs: both systems had large work backlogs of missing data that needed to be added by the resource-constrained central teams responsible for them.
- no integration...
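To make the shape of LinkedIn's old hourly batch pipeline more concrete, here is a minimal sketch of its aggregate, parse/transform and load steps. Everything in it is an assumption for illustration: the XML event schema, the function names (`aggregate`, `parse_and_transform`, `load`, `run_hourly_batch`), the in-memory lists standing in for aggregate files, and the SQLite table standing in for the Oracle/Hadoop warehouse. The HTTP collection step is omitted; the raw messages are simply a list.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical XML event messages, as applications might have sent them
# to the collecting HTTP server (this schema is an assumption).
RAW_MESSAGES = [
    '<event type="profile_view" ts="1325379600"><actor>101</actor><target>202</target></event>',
    '<event type="profile_view" ts="1325379700"><actor>103</actor><target>202</target></event>',
    '<event type="job_click" ts="1325383300"><actor>101</actor><target>9001</target></event>',
]


def hour_bucket(raw_xml: str) -> str:
    """Return the UTC hour an event belongs to (used to group aggregate files)."""
    ts = int(ET.fromstring(raw_xml).get("ts"))
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d-%H")


def aggregate(messages):
    """Step 1: group raw messages into hourly 'aggregate files' (here, lists)."""
    files = defaultdict(list)
    for msg in messages:
        files[hour_bucket(msg)].append(msg)
    return files


def parse_and_transform(raw_xml: str) -> tuple:
    """Step 2 (on the ETL servers): parse the XML and flatten it into a row."""
    root = ET.fromstring(raw_xml)
    return (
        root.get("type"),
        int(root.get("ts")),
        int(root.findtext("actor")),
        int(root.findtext("target")),
    )


def load(conn, rows):
    """Step 3: bulk-load rows into the warehouse (SQLite stands in for Oracle)."""
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_hourly_batch(messages):
    """Run one batch: aggregate by hour, then parse, transform and load each file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (event_type TEXT, ts INTEGER, actor INTEGER, target INTEGER)"
    )
    for _hour, batch in sorted(aggregate(messages).items()):
        load(conn, [parse_and_transform(m) for m in batch])
    return conn


if __name__ == "__main__":
    conn = run_hourly_batch(RAW_MESSAGES)
    query = "SELECT event_type, COUNT(*) FROM activity GROUP BY event_type ORDER BY event_type"
    for row in conn.execute(query):
        print(row)  # ('job_click', 1) then ('profile_view', 2)
```

Even in this toy form, the data only becomes queryable after a whole hour's file has been collected and processed, and adding an event type with different fields means editing both `parse_and_transform` and the table schema by hand, which mirrors the manual work described above.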

First seen: 2025-08-23 19:41

Last seen: 2025-08-24 16:10