The Probability of a Hash Collision

https://news.ycombinator.com/rss Hits: 10
Summary

The probability of a hash collision 2022-01-15 Exploring the math behind hash collisions. Tags: probability << previousnext >> A hash function takes arbitrarily complex input - a word, a website, an image, a human being - and maps it to a single number. This is useful for various computer science stuffs, such as data storage and cryptography. For example, let's say you want to store a book in one of NNN boxes. If you put the book in a random box, it's quite likely that you'll forget which box you picked, especially as NNN gets bigger. What you can do instead is apply a hash function to the title of the book (probably The Notebook, knowing you), which will spit out a number, iii. You then put your book in the iii-th box. When you want to retrieve The Notebook again, you recompute the hash function and it tells you to look in the iii-th box, without you needing to remember where you stored anything! You won't lose a book from your Nicholas Sparks collection ever again. We won't get into the specifics of what a hash function looks like. This post is instead concerned with hash collisions, which happen when your hash function assigns the same value to two different items. This might be a Bad Thing. Our hash function wouldn't be very useful, for instance, if it told us to put all our books in the 1st box, because we'd have to spend a lot of time rooting around in that box to find a specific book. For this reason, a good hash function should have evenly distributed outputs. But even then, it's only a matter of time before a hash collision occurs as we add more and more books to our collection. By the time we reach N+1N+1N+1 books, there won't enough boxes to store the books individually and there will definitely be at least 1 hash collision. Given NNN boxes and kkk books, how do you figure out the probability of a hash collision? Hash collisions can be a Bad Thing, but rather than trying to eliminate them entirely (an impossible task), you might instead buy enough boxes t...

First seen: 2025-06-25 06:17

Last seen: 2025-06-25 15:18