How count-min sketches work – frequencies, but without the actual data

https://news.ycombinator.com/rss Hits: 5
Summary

Our teammate Daniel introduced Count-Min Sketches in Instant (a sync engine you can spin up in less than a minute). Sketches were so small and so fast that I got into a rabbit hole learning about them. The following post came out of the process. I have read and re-read just about every one of PG Wodehouse’s 71 novels. He’s one of my favorite authors. Wodehouse can take seemingly silly plots (quite a few involving stealing pigs) and twist them until you’re rapt with attention. And he’s a master of the English language. Wodehouse is known for eccentric diction. Instead of "Freddie walked over", he’ll say "Freddie (shimmied | beetled | ambled) over". You may wonder, how many times did he use the word 'beetle'? Well I could tell you approximately how many times Wodehouse used any word in his entire lexicon, just by loading the data structure embedded in this image: Compressed, it's 50 kilobytes and covers a 23 megabyte text file, or 3.7 million words. We can use it to answer count estimates with 0.05% error rate and 99% confidence. (If you aren't familiar with the probability terms here, no worries, we'll go over them in this post.) You can try it yourself right here: The Count-Min Sketch The magic needed to make this happen is called the Count-Min Sketch — a data structure that can give you frequency estimates over giant amounts of data without becoming a giant object itself. You could use it to make passwords safer: track all known passwords on the internet, and detect whenever someone chooses a common password. [1] Or you could use it estimate the popularity of links: update a sketch whenever a user looks at a tweet, and you can query for approximate views. [2] Or, use it to make databases faster: track the values of different columns, so you can estimate how many rows a filter would return. This is how we use them in Instant: our query planner decides which indexes to use based on estimates from sketches. [3] So how do Count-Min Sketches work? In this post we'll fin...

First seen: 2025-10-23 19:32

Last seen: 2025-10-23 23:33