From Millions to Billions

https://news.ycombinator.com/rss Hits: 16
Summary

How we solved request logging at scale by moving from MariaDB to ClickHouse, Kafka, and Vector after our deprecated database engine couldn't keep up with billions of monthly requests.This article is based on a talk that I gave at PHP[tek] in 2025. Scaling Request Logging from Millions to Billions with ClickHouse, Kafka, and Vector Why Track All These Requests? Geocodio offers a pay-as-you-go metered plan where users get 2,500 free geocoding lookups per day. This means we need to: Track the 2,500 free tier requests Continue tracking above that threshold for billing Let users view their usage in real-time on their dashboard Give admins the ability to query this data for support and debugging Store request details so we can replay customer requests when debugging issues This isn't just nice-to-have data. It's tied directly to our billing and customer support workflows. The Original Architecture Our initial setup was pretty straightforward: We were using MariaDB with the TokuDB storage engine, which was specifically designed for high-performance insert operations with incredibly good compression—often 5 to 10 times better than InnoDB. When you're dealing with billions of records, storage efficiency matters. Here's what our request tracking table looked like: CREATE TABLE `requests_2020_05` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `user_id` int(11) NOT NULL, `access_id` int(11) NOT NULL, `query` text, `count` int(11) DEFAULT 1, `fields` varchar(255) DEFAULT NULL, `fields_count` int(11) DEFAULT 0, `accuracy_score` decimal(5,4) DEFAULT NULL, `accuracy_type` varchar(50) DEFAULT NULL, `http_status` int(11) DEFAULT 200, `response_time_ms` int(11) DEFAULT NULL, `queue_time_ms` int(11) DEFAULT NULL, `user_agent` text, `source_ip_address` varchar(45) DEFAULT NULL, `request_type` varchar(20) DEFAULT NULL, `hostname` varchar(100) DEFAULT NULL, `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `deleted` tinyint(1) DEFAULT 0, PRIMARY KEY (`id`), KEY `user_id` (`user_id`),...

First seen: 2025-10-13 17:25

Last seen: 2025-10-14 09:31