Bulk Operations in Boost.Bloom

https://news.ycombinator.com/rss Hits: 3
Summary

Starting in Boost 1.90, Boost.Bloom will provide so-called bulk operations, which, in general, can speed up insertion and lookup by a sizable factor. The key idea behind this optimization is to separate in time the calculation of a position in the Bloom filter's array from its actual access. For instance, if this is the the algorithm for regular insertion into a Bloom filter with k = 1 (all the snippets in this article are simplified, illustrative versions of the actual source code): void insert(const value_type& x) { auto h = hash(x); auto p = position(h); set(position, 1); } then the bulk-mode variant for insertion of a range of values would look like: void insert(const std::array<value_type, N>& x) { std::size_t positions[N]; // pipeline position calculation and memory access for(std::size_t i = 0; i < N; ++i) { auto h = hash(x[i]); positions[i] = position(h); prefetch(positions[i]); } for(std::size_t i = 0; i < N; ++i) { set(positions[i], 1); } } By prefetching the address of positions[i] way in advance of its actual usage in set(positions[i], 1), we make sure that the latter is accessing a cached value and avoid (or minimize) the CPU stalling that would result from reaching out to cold memory. We have studied bulk optimization in more detail in the context of boost::concurrent_flat_map. You can see actual measurements of the performance gains achieved in a dedicated repo; as expected, gains are higher for larger bit arrays not fitting in the lower levels of the CPU's cache hierarchy.From an algorithmic point of view, the most interesting case is that of lookup operations for k > 1, since the baseline non-bulk procedure is not easily amenable to pipelining: bool may_contain(const value_type& x) { auto h = hash(x); for(int i = 0; i < k; ++i) { auto p = position(h); if(check(position) == false) return false; if(i < k - 1) h = next_hash(h); } return true; } This algorithm is branchful and can take anywhere from 1 to k iterations, the latter being the case for eleme...

First seen: 2025-10-08 17:14

Last seen: 2025-10-08 19:15