Bloom Filters

https://news.ycombinator.com/rss Hits: 13
Summary

May 01, 2025 at 18:35 Tags Go The original motivation for the creation of Bloom filters is efficient set membership, using a probabilistic approach to significantly reduce the time and space required to reject items that are not members in a certain set. The data structure was proposed by Burton Bloom in a 1970 paper titled "Space/Time Trade-offs in Hash Coding with Allowable Errors". It's a good paper that's worth reading. Why Bloom filters? Suppose that we store some information on disk and want to check if a certain file contains a certain entry. Reading from disk is time consuming, so we want to minimize it as much as possible. A Bloom filter is a data structure that implements a cache with probabilistic properties: If the cache says the key is not present in a specific file, then it's 100% certain we should not be reading the file. If the cache says the key is present in the file, there's a small chance this is a false positive (and in fact the key isn't there). In this case we just read the file as usual. In a scenario where the majority of queries "is this key in that file?" have a negative answer, a Bloom filter can significantly speed up the system . Moreover, the probabilistic nature (the existence of false positives) allows Bloom filters to be extremely fast and occupy very little space. Here's a quote from the Bloom paper: The new hash-coding methods to be introduced are suggested for applications in which the great majority of messages to be tested will not belong to the given set. For these applications, it is appropriate to consider as a unit of time (called reject time) the average time required to classify a test message as a nonmember of the given set. How a Bloom filter works A Bloom filter is a special kind of a hash table with open addressing. It's an array of bits (the length is typically denoted m), and some fixed number (k) of hash functions. We'll assume each hash function can take an arbitrary sequence of bytes and hash it into an integer i...

First seen: 2025-05-02 06:39

Last seen: 2025-05-02 18:41