How to ingest 1B rows/s in ClickHouse

Summary

A few weeks ago, I saw a talk from Tesla claiming they were ingesting 1B rows per second using ClickHouse. I'm a petrolhead, but I don't have any reason to think they are lying :). One (American) billion rows per second might feel like a lot, so let me try to explain how you can achieve that using ClickHouse. I'm not sure which ClickHouse flavor Tesla uses, but I don't think that's really important; I'm going to use the open-source ClickHouse version for these tests.

Let me first do a super quick intro to the ClickHouse architecture:

- ClickHouse clusters are made up of nodes, which can act as replicas and shards.
- Each shard stores a portion of the data. Sharding can be random or follow any kind of rule (e.g., splitting by customer).
- Each shard has N replicas, a "copy" of the data on each node.
- Coordination is done using ZooKeeper (the original ZooKeeper or ClickHouse Keeper).

So data is sent to any of the replicas in a shard, and ClickHouse replicates it to all the other replicas in that shard. When querying ClickHouse, the query is distributed across the shards using any replica in each one. If you are familiar with ClickHouse you most likely already know this. If not, you'll need to do some back and forth with ChatGPT, but the important part is that you have buckets and you put your data in any of them (there is a DDL sketch of this layout at the end of the post).

How do we ingest 1B rows per second?

It's actually "easy": check how many rows a single node can ingest, then divide 1B by that number and you know how many nodes you need.

Let's do a quick test. I'm not focusing on performance, that does not matter yet, but just so you know, I'm running all of this on my laptop (MacBook M4 Pro).

1. Create a simple table

I asked ChatGPT to create a sample schema:

CREATE TABLE otel_vehicle_metrics
(
    time DateTime64(9),
    resource LowCardinality(String),
    scope LowCardinality(String),
    metric_name LowCardinality(String),
    metric_type LowCardinality(String),
    value Float64,
    attributes Map(LowCardinality(String), String)
)
ENGINE = MergeTree
ORDER BY (metric_name, time)

2. Generate 10M rows with sample data

Ag...
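As a rough idea of what step 2 can look like, here is a minimal sketch that fills the table above with 10M synthetic rows using ClickHouse's numbers() table function. The metric names, vehicle IDs and attribute values are invented placeholders, not the data from the talk or from the original post:

-- 10M synthetic OTel-style metric rows; every value below is a made-up placeholder
INSERT INTO otel_vehicle_metrics
SELECT
    now64(9) - toIntervalSecond(number % 86400)                   AS time,        -- spread over the last 24h
    'vehicle-' || toString(number % 1000)                         AS resource,    -- 1,000 fake vehicles
    'otel-collector'                                              AS scope,
    ['battery_temp_c', 'speed_kmh', 'soc_pct'][(number % 3) + 1]  AS metric_name,
    'gauge'                                                       AS metric_type,
    rand() / 4294967296                                           AS value,       -- random float in [0, 1)
    map('region', ['eu', 'us', 'apac'][(number % 3) + 1])         AS attributes
FROM numbers(10000000);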

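To make the shard/replica picture from the intro concrete, here is a minimal sketch of what the same table could look like on a multi-node cluster. This is not the setup from the talk: the cluster name metrics_cluster, the ZooKeeper path and the {shard}/{replica} macros are assumptions that depend entirely on your own cluster configuration.

-- One local, replicated table per node; replicas inside a shard sync through (Zoo)Keeper
CREATE TABLE otel_vehicle_metrics_local ON CLUSTER metrics_cluster
(
    time DateTime64(9),
    resource LowCardinality(String),
    scope LowCardinality(String),
    metric_name LowCardinality(String),
    metric_type LowCardinality(String),
    value Float64,
    attributes Map(LowCardinality(String), String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/otel_vehicle_metrics', '{replica}')
ORDER BY (metric_name, time);

-- A Distributed table that spreads inserts and queries across all shards
CREATE TABLE otel_vehicle_metrics_dist ON CLUSTER metrics_cluster
AS otel_vehicle_metrics_local
ENGINE = Distributed(metrics_cluster, currentDatabase(), otel_vehicle_metrics_local, rand());

With a layout like this, the sizing logic from the post is plain arithmetic: measure how many rows per second one node can absorb, then divide 1B by that number. Purely as an illustration (the per-node figure is made up, not a measurement), a node sustaining 20M rows/s would mean 1,000,000,000 / 20,000,000 = 50 shards.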