Wavelet Trees: An Introduction (2011)

https://news.ycombinator.com/rss Hits: 11
Summary

Today I will talk about an elegant way of answering rank queries on sequences over larger alphabets – a structure called the Wavelet Tree. In my last post I introduced a data structure called RRR, which is used to quickly answer rank queries on binary sequences, and provide implicit compression. A Wavelet Tree organises a string into a hierarchy of bit vectors. A rank query has time complexity is $\mathcal{O}(\log_2{A})$, where $A$ is the size of the alphabet. It was introduced by Grossi, Gupta and Vitter in their 2003 paper High-order entropy-compressed text indexes [4] (see the Further Reading section for more papers). It has since been featured in many papers [1, 2, 3, 5, 6]. If you store the bit vectors in RRR sequences, it may take less space than the original sequence. Alternatively, you could store the bit vectors in the rank indexes proposed by Sadakane and Okonohara [7]. It has a different approach to compression. I will talk about it another time ;) – fortunately, I will be studying under Sadakane-sensei at a later date (update: now I’m doing my Ph.D. under him in Tokyo). In a different future post, I will show how Suffix Arrays can be used to find arbitrary patterns of length $P$, by issuing $2P$ rank queries. If using a Wavelet Tree, this means a pattern search has $\mathcal{O}(P \log_2{A})$ time complexity, that is, the size of size of the ‘haystack’ doesn’t matter, it instead depends on the size of the ‘needle’ and size of the alphabet. Constructing a Wavelet Tree A Wavelet Tree converts a string into a balanced binary-tree of bit vectors, where a $0$ replaces half of the symbols, and a $1$ replaces the other half. This creates ambiguity, but at each level this alphabet is filtered and re-encoded, so the ambiguity lessens, until there is no ambiguity at all. The tree is defined recursively as follows: Take the alphabet of the string, and encode the first half as $0$, the second half as $1$: $\{ a, b, c, d \}$ would become $\{ 0, 0, 1, 1 \}$; Group each...

First seen: 2025-05-15 15:39

Last seen: 2025-05-16 01:41