Hashed sorting is typically faster than hash tables

https://news.ycombinator.com/rss Hits: 9
Summary

Hashed sorting is typically faster than hash tables Problem statement: count the unique values in a large array of mostly-unique uint64s. Two standard approaches are: Insert into a hash table and return the number of entries. Sort the array, then count positions that differ from their predecessor. Hash tables win the interview (O(n)O(n)O(n) vs O(nlog⁡n)O(n \log n)O(nlogn)), but sorting is typically faster in a well-tuned implementation. This problem and its variants are the inner loop of some of the world’s biggest CPU workloads. Benchmark highlights Here is performance on M2 Pro, comparing our best-tuned hash table and our best-tuned sorting implementation. We also include Rust’s default implementations (sort_unstable(), and HashSet with foldhash::fast) as a reference point for high-quality general-purpose implementations: Data size Baseline hash table Baseline sorting Tuned hash table Tuned sorting 8 KiB 3.8 µs 5.1 µs 1.6 µs 6.5 µs 256 KiB 219 µs 264 µs 193 µs 92 µs 8 MiB 8.1 ms 12.0 ms 7.1 ms 3.9 ms 256 MiB 875 ms 464 ms 269 ms 185 ms 2 GiB 7.6 s 4.3 s 2.6 s 1.9 s Tuned sorting beats our best hash table by ~1.5× on non-tiny sizes, and is up to 4× faster than the excellent “Swiss Table” hash tables that ship with Rust’s standard library. Benchmark code is available. Why does sorting win? Memory bandwidth: even though sorting makes multiple passes through memory, each pass uses bandwidth far more efficiently than a hash table’s single pass. Once the dataset outgrows CPU caches (often around ~1 MiB on a single core, CPU-dependent), both hashing and sorting become limited by cache-line fetch bandwidth to main memory. Cache lines are typically 64 bytes, and the CPU fetches the entire line if you touch even a single byte. Hash tables waste bandwidth: each 8-byte key access pulls a full 64-byte cache line. So a hash table must incur at least 128 bytes of traffic per uint64 processed: 64 bytes read and 64 bytes written. For sorting we use a radix sort that splits into 10...

First seen: 2025-09-11 09:14

Last seen: 2025-09-11 17:16